Tasks 12-14: Related work, citations, complementary tool links
- Task 12: Add Related Work section (Section 21) to the methodology covering EcoLogits, CodeCarbon, AI Energy Score, Green Algorithms, Google/Jegham published data, the UNICC framework, and social cost research.
- Task 13: Add specific citations and links for cognitive deskilling (CHI 2025, Springer 2025, endoscopy study), linguistic homogenization (UNESCO), and algorithmic monoculture (Stanford HAI).
- Task 14: Add Related Tools section to the toolkit README linking EcoLogits, CodeCarbon, and AI Energy Score.

Also updated toolkit energy values to match the calibrated methodology.
parent 9653f69860, commit c619c31caf
2 changed files with 108 additions and 20 deletions
@@ -441,12 +441,16 @@ livelihoods) depends on the economic context and is genuinely uncertain.

### Cognitive deskilling

A Microsoft/CMU study (Lee et al., CHI 2025) found that higher confidence
in GenAI correlates with less critical thinking effort
([ACM DL](https://dl.acm.org/doi/full/10.1145/3706598.3713778)). An MIT
Media Lab study ("Your Brain on ChatGPT") documented "cognitive debt" —
users who relied on AI for tasks performed worse when later working
independently. Clinical evidence from endoscopy studies shows that
clinicians relying on AI diagnostics saw detection rates drop from 28.4%
to 22.4% when AI was removed. A 2025 Springer paper argues that AI
deskilling is a structural problem, not merely an individual one
([doi:10.1007/s00146-025-02686-z](https://link.springer.com/article/10.1007/s00146-025-02686-z)).

This is distinct from epistemic risk (misinformation). It is about the
user's cognitive capacity degrading through repeated reliance on the
@@ -461,11 +465,13 @@ conversation should be verified independently.

### Linguistic homogenization

LLMs are overwhelmingly trained on English (~44% of training data). A
Stanford 2025 study found that AI tools systematically exclude
non-English speakers. UNESCO's 2024 report on linguistic diversity warns
that AI systems risk accelerating the extinction of already-endangered
languages by concentrating economic incentives on high-resource
languages. Each English-language conversation reinforces this dynamic,
marginalizing over 3,000 already-endangered languages.

## 11. Political cost
@@ -574,11 +580,12 @@ corruption of the knowledge commons.

## 15. Algorithmic monoculture and correlated failure

When millions of users rely on the same few foundation models, errors
become correlated rather than independent. A Stanford HAI study
([Bommasani et al., 2022](https://arxiv.org/abs/2108.07258)) found that
across every model ecosystem studied, the rate of homogeneous outcomes
exceeded baselines. A Nature Communications Psychology paper (2026)
documents that AI-driven research is producing "topical and
methodological convergence, flattening scientific imagination."

For coding specifically: if many developers use the same model, their code
will share the same blind spots, the same idiomatic patterns, and the same
@@ -761,7 +768,68 @@ working on private code will fall in the "probably net-negative" to

honest reflection of the cost structure. Net-positive requires broad
reach, which requires the work to be shared.

## 21. Related work

This methodology builds on and complements existing tools and research.

### Measurement tools (environmental)

- **[EcoLogits](https://ecologits.ai/)** — Python library from GenAI
  Impact that tracks per-query energy and CO2 for API calls. Covers
  operational and embodied emissions. More precise than this methodology
  for environmental metrics, but does not cover social, epistemic, or
  political costs.
- **[CodeCarbon](https://codecarbon.io/)** — Python library that measures
  GPU/CPU/RAM electricity consumption in real time with regional carbon
  intensity. Primarily for local training workloads. A 2025 validation
  study found estimates can be off by ~2.4x vs. external measurements.
- **[Hugging Face AI Energy Score](https://huggingface.github.io/AIEnergyScore/)** —
  Standardized energy efficiency benchmarking across AI models. Useful
  for model selection but does not provide per-conversation accounting.
- **[Green Algorithms](https://www.green-algorithms.org/)** — Web
  calculator from the University of Cambridge for any computational
  workload. Not AI-specific.

### Published per-query data

- **Patterson et al. (Google, August 2025)**: The most rigorous
  provider-published per-query data. Reports 0.24 Wh, 0.03 g CO2, and
  0.26 mL water per median Gemini text prompt, and a 33x energy
  reduction over one year.
  ([arXiv:2508.15734](https://arxiv.org/abs/2508.15734))
- **Jegham et al. ("How Hungry is AI?", May 2025)**: Cross-model
  benchmarks for 30 LLMs showing 70x energy variation between models.
  ([arXiv:2505.09598](https://arxiv.org/abs/2505.09598))
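As a rough cross-check, the back-of-envelope below relates Google's published 0.24 Wh median to the 0.5 Wh per 1K output tokens midpoint this methodology calibrates against. The implied token count is an illustrative inference, not measured data, and it ignores input-token energy and PUE:

```python
# Back-of-envelope: how many output tokens would Google's published
# 0.24 Wh median Gemini prompt correspond to at the calibrated rate
# of 0.5 Wh per 1K output tokens? Ignoring input-token energy and
# PUE makes this an upper bound on the implied output length.
WH_PER_1K_OUTPUT = 0.5   # calibrated midpoint used by this methodology
GOOGLE_MEDIAN_WH = 0.24  # Patterson et al., arXiv:2508.15734

implied_output_tokens = GOOGLE_MEDIAN_WH / WH_PER_1K_OUTPUT * 1000
print(implied_output_tokens)  # roughly 480 tokens, a plausible median reply
```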
### Broader frameworks

- **UNICC/Frugal AI Hub (December 2025)**: Three-level framework running
  from Total Cost of Ownership to SDG alignment. Portfolio-level, not
  per-conversation. Does not enumerate specific social cost categories.
- **Practical Principles for AI Cost and Compute Accounting (arXiv,
  February 2025)**: Proposes compute as a governance metric. Financial
  and compute only.

### Research on social costs

- **Lee et al. (CHI 2025)**: "The AI Deskilling Paradox" — survey finding
  that higher AI confidence correlates with less critical thinking. See
  Section 10.
- **Springer (2025)**: Argues deskilling is structural, not individual.
- **Shumailov et al. (Nature, 2024)**: Model collapse from recursive
  AI-generated training data. See Section 13.
- **Stanford HAI (2025)**: Algorithmic monoculture and correlated failure
  across model ecosystems. See Section 15.

### How this methodology differs

No existing tool or framework combines per-conversation environmental
measurement with social, cognitive, epistemic, and political cost
categories. The tools above measure environmental costs well — we do not
compete with them. Our contribution is the taxonomy: naming and
organizing 20+ cost categories so that the non-environmental costs are
not ignored simply because they are harder to quantify.

## 22. What would improve this estimate

- Access to actual energy-per-token and training energy metrics from
  model providers
@@ -40,7 +40,8 @@ The hook fires before Claude Code compacts your conversation context.

It reads the conversation transcript, extracts token usage data from
API response metadata, and calculates cost estimates using:

- **Energy**: 0.1 Wh/1K input tokens, 0.5 Wh/1K output tokens
  (midpoint of a range calibrated against Google and Jegham et al., 2025)
- **PUE**: 1.2 (data center overhead)
- **CO2**: 325 g/kWh (US grid average for cloud regions)
- **Cost**: $15/M input tokens, $75/M output tokens
@@ -48,13 +49,32 @@ API response metadata, and calculates cost estimates using:

Cache-read tokens are weighted at 10% of full cost (they skip most
computation).
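Put together, the constants above give a straightforward calculation. The sketch below is illustrative, not the hook's actual code: the `estimate` helper and token counts are hypothetical, and it assumes the 10% cache-read weight applies to energy as well as dollar cost:

```python
# Sketch of the cost arithmetic implied by the constants above.
# The real hook reads token counts from the conversation transcript's
# API response metadata; these figures are made up for illustration.

WH_PER_1K_INPUT = 0.1      # Wh per 1K input tokens
WH_PER_1K_OUTPUT = 0.5     # Wh per 1K output tokens
PUE = 1.2                  # data center overhead multiplier
CO2_G_PER_KWH = 325        # US grid average for cloud regions
USD_PER_M_INPUT = 15.0     # $ per million input tokens
USD_PER_M_OUTPUT = 75.0    # $ per million output tokens
CACHE_READ_WEIGHT = 0.1    # cache reads skip most computation

def estimate(input_tokens, output_tokens, cache_read_tokens=0):
    # Cache-read tokens count at 10% of a full input token.
    effective_input = input_tokens + CACHE_READ_WEIGHT * cache_read_tokens
    energy_wh = (effective_input / 1000 * WH_PER_1K_INPUT
                 + output_tokens / 1000 * WH_PER_1K_OUTPUT) * PUE
    co2_g = energy_wh / 1000 * CO2_G_PER_KWH
    usd = (effective_input / 1e6 * USD_PER_M_INPUT
           + output_tokens / 1e6 * USD_PER_M_OUTPUT)
    return {"energy_wh": energy_wh, "co2_g": co2_g, "usd": usd}

# A long session: 50K input, 10K output, 200K cache-read tokens.
print(estimate(50_000, 10_000, 200_000))
```

With those inputs the effective input is 70K tokens, giving about 14.4 Wh, 4.7 g CO2, and $1.80.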
## Related tools

This toolkit measures a subset of the costs covered by
`impact-methodology.md`. For more precise environmental measurement,
consider these complementary tools:

- **[EcoLogits](https://ecologits.ai/)** — Python library that tracks
  per-query energy and CO2 for API calls to OpenAI, Anthropic, Mistral,
  and others. More precise than our estimates for environmental metrics.
- **[CodeCarbon](https://codecarbon.io/)** — Measures GPU/CPU energy for
  local training and inference workloads.
- **[Hugging Face AI Energy Score](https://huggingface.github.io/AIEnergyScore/)** —
  Benchmarks model energy efficiency. Useful for choosing between models.

These tools focus on environmental metrics only. This toolkit and the
methodology also cover financial, social, epistemic, and political costs.

## Limitations

- All numbers are estimates with low to medium confidence.
- Energy-per-token figures are calibrated against published research
  (Google, Aug 2025; Jegham et al., May 2025), not official Anthropic data.
- The hook only runs on context compaction, not at conversation end.
  Short conversations that never compact will not be logged.
- This toolkit only works with Claude Code. The methodology itself is
  tool-agnostic.
- See `impact-methodology.md` for the full methodology, uncertainty
  analysis, and non-quantifiable costs.