Tasks 12-14: Related work, citations, complementary tool links

Task 12: Add Related Work section (Section 21) to methodology covering
EcoLogits, CodeCarbon, AI Energy Score, Green Algorithms, Google/Jegham
published data, UNICC framework, and social cost research.

Task 13: Add specific citations and links for cognitive deskilling
(CHI 2025, Springer 2025, endoscopy study), linguistic homogenization
(UNESCO), and algorithmic monoculture (Stanford HAI).

Task 14: Add Related Tools section to toolkit README linking EcoLogits,
CodeCarbon, and AI Energy Score. Also update toolkit energy values to
match the calibrated methodology.
claude 2026-03-16 10:43:51 +00:00
parent 9653f69860
commit c619c31caf
2 changed files with 108 additions and 20 deletions


@@ -441,12 +441,16 @@ livelihoods) depends on the economic context and is genuinely uncertain.
 ### Cognitive deskilling
-A Microsoft/CHI 2025 study found that higher confidence in GenAI
-correlates with less critical thinking effort. An MIT Media Lab study
-("Your Brain on ChatGPT") documented "cognitive debt" — users who relied
-on AI for tasks performed worse when later working independently. Clinical
-evidence shows that clinicians relying on AI diagnostics saw measurable
-declines in independent diagnostic skill after just three months.
+A Microsoft/CMU study (Lee et al., CHI 2025) found that higher
+confidence in GenAI correlates with less critical thinking effort
+([ACM DL](https://dl.acm.org/doi/full/10.1145/3706598.3713778)). An
+MIT Media Lab study ("Your Brain on ChatGPT") documented "cognitive
+debt" — users who relied on AI for tasks performed worse when later
+working independently. Clinical evidence from endoscopy studies shows
+that clinicians relying on AI diagnostics saw detection rates drop
+from 28.4% to 22.4% when AI was removed. A 2025 Springer paper argues
+that AI deskilling is a structural problem, not merely individual
+([doi:10.1007/s00146-025-02686-z](https://link.springer.com/article/10.1007/s00146-025-02686-z)).
 This is distinct from epistemic risk (misinformation). It is about the
 user's cognitive capacity degrading through repeated reliance on the
@@ -461,11 +465,13 @@ conversation should be verified independently.
 ### Linguistic homogenization
-LLMs are overwhelmingly trained on English (~44% of training data). A
-Stanford 2025 study found that AI tools systematically exclude
-non-English speakers. Each English-language conversation reinforces the
-economic incentive to optimize for English, marginalizing over 3,000
-already-endangered languages.
+LLMs are overwhelmingly trained on English (~44% of training data).
+A Stanford 2025 study found that AI tools systematically exclude
+non-English speakers. UNESCO's 2024 report on linguistic diversity
+warns that AI systems risk accelerating the extinction of already-
+endangered languages by concentrating economic incentives on high-
+resource languages. Each English-language conversation reinforces
+this dynamic, marginalizing over 3,000 already-endangered languages.
 ## 11. Political cost
@@ -574,11 +580,12 @@ corruption of the knowledge commons.
 ## 15. Algorithmic monoculture and correlated failure
 When millions of users rely on the same few foundation models, errors
-become correlated rather than independent. A Stanford HAI study found that
-across every model ecosystem studied, the rate of homogeneous outcomes
-exceeded baselines. A Nature Communications Psychology paper (2026)
-documents that AI-driven research is producing "topical and methodological
-convergence, flattening scientific imagination."
+become correlated rather than independent. A Stanford HAI study
+([Bommasani et al., 2022](https://arxiv.org/abs/2108.07258)) found
+that across every model ecosystem studied, the rate of homogeneous
+outcomes exceeded baselines. A Nature Communications Psychology paper
+(2026) documents that AI-driven research is producing "topical and
+methodological convergence, flattening scientific imagination."
 For coding specifically: if many developers use the same model, their code
 will share the same blind spots, the same idiomatic patterns, and the same
@@ -761,7 +768,68 @@ working on private code will fall in the "probably net-negative" to
 honest reflection of the cost structure. Net-positive requires broad
 reach, which requires the work to be shared.
-## 21. What would improve this estimate
+## 21. Related work
+This methodology builds on and complements existing tools and research.
+### Measurement tools (environmental)
+- **[EcoLogits](https://ecologits.ai/)** — Python library from GenAI
+Impact that tracks per-query energy and CO2 for API calls. Covers
+operational and embodied emissions. More precise than this methodology
+for environmental metrics, but does not cover social, epistemic, or
+political costs.
+- **[CodeCarbon](https://codecarbon.io/)** — Python library that measures
+GPU/CPU/RAM electricity consumption in real time with regional carbon
+intensity. Primarily for local training workloads. A 2025 validation
+study found estimates can be off by ~2.4x vs. external measurements.
+- **[Hugging Face AI Energy Score](https://huggingface.github.io/AIEnergyScore/)** —
+Standardized energy efficiency benchmarking across AI models. Useful
+for model selection but does not provide per-conversation accounting.
+- **[Green Algorithms](https://www.green-algorithms.org/)** — Web
+calculator from University of Cambridge for any computational workload.
+Not AI-specific.
+### Published per-query data
+- **Patterson et al. (Google, August 2025)**: Most rigorous provider-
+published per-query data. Reports 0.24 Wh, 0.03g CO2, and 0.26 mL
+water per median Gemini text prompt. Showed 33x energy reduction over
+one year. ([arXiv:2508.15734](https://arxiv.org/abs/2508.15734))
+- **Jegham et al. ("How Hungry is AI?", May 2025)**: Cross-model
+benchmarks for 30 LLMs showing 70x energy variation between models.
+([arXiv:2505.09598](https://arxiv.org/abs/2505.09598))
+### Broader frameworks
+- **UNICC/Frugal AI Hub (December 2025)**: Three-level framework from
+Total Cost of Ownership to SDG alignment. Portfolio-level, not per-
+conversation. Does not enumerate specific social cost categories.
+- **Practical Principles for AI Cost and Compute Accounting (arXiv,
+February 2025)**: Proposes compute as a governance metric. Financial
+and compute only.
+### Research on social costs
+- **Lee et al. (CHI 2025)**: "The AI Deskilling Paradox" — survey
+finding that higher AI confidence correlates with less critical
+thinking. See Section 10.
+- **Springer (2025)**: Argues deskilling is structural, not individual.
+- **Shumailov et al. (Nature, 2024)**: Model collapse from recursive
+AI-generated training data. See Section 13.
+- **Stanford HAI (2025)**: Algorithmic monoculture and correlated failure
+across model ecosystems. See Section 15.
+### How this methodology differs
+No existing tool or framework combines per-conversation environmental
+measurement with social, cognitive, epistemic, and political cost
+categories. The tools above measure environmental costs well — we do
+not compete with them. Our contribution is the taxonomy: naming and
+organizing 20+ cost categories so that the non-environmental costs are
+not ignored simply because they are harder to quantify.
+## 22. What would improve this estimate
 - Access to actual energy-per-token and training energy metrics from
 model providers


@@ -40,7 +40,8 @@ The hook fires before Claude Code compacts your conversation context.
 It reads the conversation transcript, extracts token usage data from
 API response metadata, and calculates cost estimates using:
-- **Energy**: 0.003 Wh/1K input tokens, 0.015 Wh/1K output tokens
+- **Energy**: 0.1 Wh/1K input tokens, 0.5 Wh/1K output tokens
+(midpoint of range calibrated against Google and Jegham et al., 2025)
 - **PUE**: 1.2 (data center overhead)
 - **CO2**: 325g/kWh (US grid average for cloud regions)
 - **Cost**: $15/M input tokens, $75/M output tokens
@@ -48,13 +49,32 @@ API response metadata, and calculates cost estimates using:
 Cache-read tokens are weighted at 10% of full cost (they skip most
 computation).
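The constants above can be sketched as a small calculation. This is an illustration only, not the toolkit's actual code: the function name is made up, and the choices to apply the PUE multiplier on top of the per-token figures and the 10% cache-read weighting to both energy and dollar cost are assumptions.

```python
# Illustrative sketch of the README's cost arithmetic. Constants come from
# the README; everything else (names, where PUE and the cache-read
# weighting apply) is an assumption, not the toolkit's implementation.

WH_PER_1K_INPUT = 0.1      # Wh per 1K input tokens (calibrated midpoint)
WH_PER_1K_OUTPUT = 0.5     # Wh per 1K output tokens
PUE = 1.2                  # data center overhead multiplier
CO2_G_PER_KWH = 325.0      # US grid average for cloud regions
USD_PER_M_INPUT = 15.0     # dollars per million input tokens
USD_PER_M_OUTPUT = 75.0    # dollars per million output tokens
CACHE_READ_WEIGHT = 0.10   # cache reads skip most computation

def estimate_costs(input_tokens: int, output_tokens: int,
                   cache_read_tokens: int = 0) -> dict:
    """Return energy (Wh), CO2 (g), and dollar cost for one conversation."""
    # Cache-read tokens count at 10% of a full input token.
    effective_input = input_tokens + CACHE_READ_WEIGHT * cache_read_tokens
    energy_wh = PUE * (effective_input / 1000 * WH_PER_1K_INPUT
                       + output_tokens / 1000 * WH_PER_1K_OUTPUT)
    co2_g = energy_wh / 1000 * CO2_G_PER_KWH  # Wh -> kWh, then grams CO2
    cost_usd = (effective_input / 1e6 * USD_PER_M_INPUT
                + output_tokens / 1e6 * USD_PER_M_OUTPUT)
    return {"energy_wh": energy_wh, "co2_g": co2_g, "cost_usd": cost_usd}
```

Under these assumptions, a conversation with 50K input, 10K output, and 200K cache-read tokens works out to about 14.4 Wh, 4.7 g CO2, and $1.80.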
+## Related tools
+This toolkit measures a subset of the costs covered by
+`impact-methodology.md`. For more precise environmental measurement,
+consider these complementary tools:
+- **[EcoLogits](https://ecologits.ai/)** — Python library that tracks
+per-query energy and CO2 for API calls to OpenAI, Anthropic, Mistral,
+and others. More precise than our estimates for environmental metrics.
+- **[CodeCarbon](https://codecarbon.io/)** — Measures GPU/CPU energy for
+local training and inference workloads.
+- **[Hugging Face AI Energy Score](https://huggingface.github.io/AIEnergyScore/)** —
+Benchmarks model energy efficiency. Useful for choosing between models.
+These tools focus on environmental metrics only. This toolkit and the
+methodology also cover financial, social, epistemic, and political costs.
 ## Limitations
 - All numbers are estimates with low to medium confidence.
-- Energy-per-token figures are derived from published research on
-comparable models, not official Anthropic data.
+- Energy-per-token figures are calibrated against published research
+(Google, Aug 2025; Jegham et al., May 2025), not official Anthropic data.
 - The hook only runs on context compaction, not at conversation end.
 Short conversations that never compact will not be logged.
 - This toolkit only works with Claude Code. The methodology itself is
 tool-agnostic.
 - See `impact-methodology.md` for the full methodology, uncertainty
 analysis, and non-quantifiable costs.