Tasks 12-14: Related work, citations, complementary tool links
- Task 12: Add Related Work section (Section 21) to the methodology covering EcoLogits, CodeCarbon, AI Energy Score, Green Algorithms, Google/Jegham published data, the UNICC framework, and social cost research.
- Task 13: Add specific citations and links for cognitive deskilling (CHI 2025, Springer 2025, endoscopy study), linguistic homogenization (UNESCO), and algorithmic monoculture (Stanford HAI).
- Task 14: Add Related Tools section to the toolkit README linking EcoLogits, CodeCarbon, and AI Energy Score.

Also updated toolkit energy values to match the calibrated methodology.
parent 9653f69860, commit c619c31caf
2 changed files with 108 additions and 20 deletions
@@ -441,12 +441,16 @@ livelihoods) depends on the economic context and is genuinely uncertain.

### Cognitive deskilling

A Microsoft/CMU study (Lee et al., CHI 2025) found that higher confidence
in GenAI correlates with less critical thinking effort
([ACM DL](https://dl.acm.org/doi/full/10.1145/3706598.3713778)). An MIT
Media Lab study ("Your Brain on ChatGPT") documented "cognitive debt" —
users who relied on AI for tasks performed worse when later working
independently. Clinical evidence from endoscopy studies shows that
clinicians relying on AI diagnostics saw detection rates drop from 28.4%
to 22.4% when AI was removed. A 2025 Springer paper argues that AI
deskilling is a structural problem, not merely an individual one
([doi:10.1007/s00146-025-02686-z](https://link.springer.com/article/10.1007/s00146-025-02686-z)).

This is distinct from epistemic risk (misinformation). It is about the
user's cognitive capacity degrading through repeated reliance on the
@@ -461,11 +465,13 @@ conversation should be verified independently.

### Linguistic homogenization

LLMs are overwhelmingly trained on English (~44% of training data). A
Stanford 2025 study found that AI tools systematically exclude
non-English speakers. UNESCO's 2024 report on linguistic diversity warns
that AI systems risk accelerating the extinction of already-endangered
languages by concentrating economic incentives on high-resource
languages. Each English-language conversation reinforces this dynamic,
marginalizing over 3,000 already-endangered languages.

## 11. Political cost
@@ -574,11 +580,12 @@ corruption of the knowledge commons.

## 15. Algorithmic monoculture and correlated failure

When millions of users rely on the same few foundation models, errors
become correlated rather than independent. A Stanford HAI study
([Bommasani et al., 2022](https://arxiv.org/abs/2108.07258)) found that
across every model ecosystem studied, the rate of homogeneous outcomes
exceeded baselines. A Nature Communications Psychology paper (2026)
documents that AI-driven research is producing "topical and
methodological convergence, flattening scientific imagination."

For coding specifically: if many developers use the same model, their code
will share the same blind spots, the same idiomatic patterns, and the same
@@ -761,7 +768,68 @@ working on private code will fall in the "probably net-negative" to

honest reflection of the cost structure. Net-positive requires broad
reach, which requires the work to be shared.

## 21. Related work

This methodology builds on and complements existing tools and research.

### Measurement tools (environmental)

- **[EcoLogits](https://ecologits.ai/)** — Python library from GenAI
  Impact that tracks per-query energy and CO2 for API calls. Covers
  operational and embodied emissions. More precise than this methodology
  for environmental metrics, but does not cover social, epistemic, or
  political costs.
- **[CodeCarbon](https://codecarbon.io/)** — Python library that measures
  GPU/CPU/RAM electricity consumption in real time with regional carbon
  intensity. Primarily for local training workloads. A 2025 validation
  study found estimates can be off by ~2.4x vs. external measurements.
- **[Hugging Face AI Energy Score](https://huggingface.github.io/AIEnergyScore/)** —
  Standardized energy efficiency benchmarking across AI models. Useful
  for model selection but does not provide per-conversation accounting.
- **[Green Algorithms](https://www.green-algorithms.org/)** — Web
  calculator from the University of Cambridge for any computational
  workload. Not AI-specific.

### Published per-query data

- **Patterson et al. (Google, August 2025)**: The most rigorous
  provider-published per-query data. Reports 0.24 Wh, 0.03 g CO2, and
  0.26 mL water per median Gemini text prompt, and a 33x energy
  reduction over one year.
  ([arXiv:2508.15734](https://arxiv.org/abs/2508.15734))
- **Jegham et al. ("How Hungry is AI?", May 2025)**: Cross-model
  benchmarks for 30 LLMs showing 70x energy variation between models.
  ([arXiv:2505.09598](https://arxiv.org/abs/2505.09598))
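As a rough cross-check, the back-of-envelope below relates Google's published 0.24 Wh median to the 0.5 Wh per 1K output tokens midpoint this methodology calibrates against. The implied token count is an illustrative inference, not measured data, and it ignores input-token energy and PUE:

```python
# Back-of-envelope: how many output tokens would Google's published
# 0.24 Wh median Gemini prompt correspond to at the calibrated rate
# of 0.5 Wh per 1K output tokens? Ignoring input-token energy and
# PUE makes this an upper bound on the implied output length.
WH_PER_1K_OUTPUT = 0.5   # calibrated midpoint used by this methodology
GOOGLE_MEDIAN_WH = 0.24  # Patterson et al., arXiv:2508.15734

implied_output_tokens = GOOGLE_MEDIAN_WH / WH_PER_1K_OUTPUT * 1000
print(implied_output_tokens)  # roughly 480 tokens, a plausible median reply
```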
### Broader frameworks

- **UNICC/Frugal AI Hub (December 2025)**: Three-level framework running
  from Total Cost of Ownership to SDG alignment. Portfolio-level, not
  per-conversation. Does not enumerate specific social cost categories.
- **Practical Principles for AI Cost and Compute Accounting (arXiv,
  February 2025)**: Proposes compute as a governance metric. Financial
  and compute only.

### Research on social costs

- **Lee et al. (CHI 2025)**: "The AI Deskilling Paradox" — survey finding
  that higher AI confidence correlates with less critical thinking. See
  Section 10.
- **Springer (2025)**: Argues deskilling is structural, not individual.
- **Shumailov et al. (Nature, 2024)**: Model collapse from recursive
  AI-generated training data. See Section 13.
- **Stanford HAI (2025)**: Algorithmic monoculture and correlated failure
  across model ecosystems. See Section 15.

### How this methodology differs

No existing tool or framework combines per-conversation environmental
measurement with social, cognitive, epistemic, and political cost
categories. The tools above measure environmental costs well — we do not
compete with them. Our contribution is the taxonomy: naming and
organizing 20+ cost categories so that the non-environmental costs are
not ignored simply because they are harder to quantify.

## 22. What would improve this estimate

- Access to actual energy-per-token and training energy metrics from
  model providers
@@ -40,7 +40,8 @@ The hook fires before Claude Code compacts your conversation context.

It reads the conversation transcript, extracts token usage data from
API response metadata, and calculates cost estimates using:

- **Energy**: 0.1 Wh/1K input tokens, 0.5 Wh/1K output tokens
  (midpoint of a range calibrated against Google and Jegham et al., 2025)
- **PUE**: 1.2 (data center overhead)
- **CO2**: 325 g/kWh (US grid average for cloud regions)
- **Cost**: $15/M input tokens, $75/M output tokens
@@ -48,13 +49,32 @@ API response metadata, and calculates cost estimates using:

Cache-read tokens are weighted at 10% of full cost (they skip most
computation).
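Put together, the constants above give a straightforward calculation. The sketch below is illustrative, not the hook's actual code: the `estimate` helper and token counts are hypothetical, and it assumes the 10% cache-read weight applies to energy as well as dollar cost:

```python
# Sketch of the cost arithmetic implied by the constants above.
# The real hook reads token counts from the conversation transcript's
# API response metadata; these figures are made up for illustration.

WH_PER_1K_INPUT = 0.1      # Wh per 1K input tokens
WH_PER_1K_OUTPUT = 0.5     # Wh per 1K output tokens
PUE = 1.2                  # data center overhead multiplier
CO2_G_PER_KWH = 325        # US grid average for cloud regions
USD_PER_M_INPUT = 15.0     # $ per million input tokens
USD_PER_M_OUTPUT = 75.0    # $ per million output tokens
CACHE_READ_WEIGHT = 0.1    # cache reads skip most computation

def estimate(input_tokens, output_tokens, cache_read_tokens=0):
    # Cache-read tokens count at 10% of a full input token.
    effective_input = input_tokens + CACHE_READ_WEIGHT * cache_read_tokens
    energy_wh = (effective_input / 1000 * WH_PER_1K_INPUT
                 + output_tokens / 1000 * WH_PER_1K_OUTPUT) * PUE
    co2_g = energy_wh / 1000 * CO2_G_PER_KWH
    usd = (effective_input / 1e6 * USD_PER_M_INPUT
           + output_tokens / 1e6 * USD_PER_M_OUTPUT)
    return {"energy_wh": energy_wh, "co2_g": co2_g, "usd": usd}

# A long session: 50K input, 10K output, 200K cache-read tokens.
print(estimate(50_000, 10_000, 200_000))
```

With those inputs the effective input is 70K tokens, giving about 14.4 Wh, 4.7 g CO2, and $1.80.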
## Related tools

This toolkit measures a subset of the costs covered by
`impact-methodology.md`. For more precise environmental measurement,
consider these complementary tools:

- **[EcoLogits](https://ecologits.ai/)** — Python library that tracks
  per-query energy and CO2 for API calls to OpenAI, Anthropic, Mistral,
  and others. More precise than our estimates for environmental metrics.
- **[CodeCarbon](https://codecarbon.io/)** — Measures GPU/CPU energy for
  local training and inference workloads.
- **[Hugging Face AI Energy Score](https://huggingface.github.io/AIEnergyScore/)** —
  Benchmarks model energy efficiency. Useful for choosing between models.

These tools focus on environmental metrics only. This toolkit and the
methodology also cover financial, social, epistemic, and political costs.

## Limitations

- All numbers are estimates with low to medium confidence.
- Energy-per-token figures are calibrated against published research
  (Google, Aug 2025; Jegham et al., May 2025), not official Anthropic data.
- The hook only runs on context compaction, not at conversation end.
  Short conversations that never compact will not be logged.
- This toolkit only works with Claude Code. The methodology itself is
  tool-agnostic.
- See `impact-methodology.md` for the full methodology, uncertainty
  analysis, and non-quantifiable costs.