Initial commit: AI conversation impact methodology and toolkit
CC0-licensed methodology for estimating the environmental and social costs of AI conversations (20+ categories), plus a reusable toolkit for automated impact tracking in Claude Code sessions.
commit 0543a43816
27 changed files with 2439 additions and 0 deletions
748
impact-methodology.md
Normal file
# Methodology for Estimating the Impact of an LLM Conversation

## Introduction

This document provides a framework for estimating the total cost — environmental, financial, social, and political — of a conversation with a large language model (LLM) running on cloud infrastructure.

**Who this is for:** Anyone who wants to understand what a conversation with an AI assistant actually costs, beyond the subscription price. This includes developers using coding agents, researchers studying AI sustainability, and anyone making decisions about when AI tools are worth their cost.

**How to use it:** The framework identifies 20+ cost categories, provides estimation methods for the quantifiable ones, and names the unquantifiable ones so they are not ignored. You can apply it to your own conversations by substituting your own token counts and parameters.

**Limitations:** Most estimates have low confidence. Many of the most consequential costs cannot be quantified at all. This is a tool for honest approximation, not precise accounting. See the confidence summary (Section 19) for details.
## What we are measuring

The total cost of a single LLM conversation. Restricting the analysis to CO2 alone would miss most of the picture.

### Cost categories

**Environmental:**
1. Inference energy (GPU computation for the conversation)
2. Training energy (amortized share of the cost of training the model)
3. Data center overhead (cooling, networking, storage)
4. Client-side energy (the user's local machine)
5. Embodied carbon and materials (hardware manufacturing, mining)
6. E-waste (toxic hardware disposal, distinct from embodied carbon)
7. Grid displacement (AI demand consuming renewable capacity)
8. Data center community impacts (noise, land, local resource strain)

**Financial and economic:**
9. Direct compute cost and opportunity cost
10. Creative market displacement (per-conversation, not just training)

**Social and cognitive:**
11. Annotation labor conditions
12. Cognitive deskilling of the user
13. Mental health effects (dependency, loneliness paradox)
14. Linguistic homogenization and language endangerment

**Epistemic and systemic:**
15. AI-generated code quality degradation and technical debt
16. Model collapse / internet data pollution
17. Scientific research integrity contamination
18. Algorithmic monoculture and correlated failure risk

**Political:**
19. Concentration of power, geopolitical implications, data sovereignty

**Meta-methodological:**
20. Jevons paradox (efficiency gains driving increased total usage)
## 1. Token estimation

### Why tokens matter

LLM inference cost scales with the number of tokens processed. Each time the model produces a response, it reprocesses the entire conversation history (input tokens) and generates new text (output tokens). Output tokens are more expensive per token because they are generated sequentially, each requiring a full forward pass, whereas input tokens can be processed in parallel.

### How to estimate

If you have access to API response headers or usage metadata, use the actual token counts. Otherwise, estimate:

- **Bytes to tokens:** English text and JSON average ~4 bytes per token (range: 3.5-4.5 depending on content type). Code tends toward the higher end.
- **Cumulative input tokens:** Each assistant turn reprocesses the full context. For a conversation with N turns and final context size T, the cumulative input tokens are approximately T/2 * N (the average context size times the number of turns).
- **Output tokens:** Typically 1-5% of the total transcript size, depending on how verbose the assistant is.

### Example

A 20-turn conversation with a 200K-token final context:
- Cumulative input: ~100K * 20 = ~2,000,000 tokens
- Output: ~10,000 tokens
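The heuristics above can be sketched as a small helper. This is an illustrative sketch only: the 4-bytes-per-token ratio, the T/2 * N approximation, and the output fraction are the stated assumptions, not measured values, and the function name is arbitrary.

```python
def estimate_tokens(transcript_bytes: int, turns: int,
                    final_context_tokens: int,
                    bytes_per_token: float = 4.0,
                    output_fraction: float = 0.03):
    """Rough token estimate for a conversation, using the heuristics
    above: ~4 bytes/token, cumulative input ~ (T/2) * N, and output
    tokens ~1-5% of the transcript."""
    transcript_tokens = transcript_bytes / bytes_per_token
    cumulative_input = (final_context_tokens / 2) * turns
    output_tokens = transcript_tokens * output_fraction
    return cumulative_input, output_tokens

# 20 turns, 200K-token final context, ~1.3 MB transcript (hypothetical)
cum_in, out = estimate_tokens(1_300_000, 20, 200_000)
# cum_in == 2,000,000; out == 9,750 (close to the ~10,000 in the example)
```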

### Uncertainty

Token estimates from byte counts can be off by a factor of 2. Key unknowns:
- The model's exact tokenization (tokens per byte varies by content)
- Whether context caching reduces reprocessing
- The exact number of internal inference calls (tool sequences may involve multiple calls)
- Whether the system compresses prior messages near context limits

## 2. Energy per token

### Sources

There is no published energy-per-token figure for most commercial LLMs. Estimates are derived from:

- Luccioni, Viguier & Ligozat (2023), "Estimating the Carbon Footprint of BLOOM", which measured energy for a 176B-parameter model.
- The IEA's 2024 estimate of ~2.9 Wh per ChatGPT query (for GPT-4-class models, averaging ~1,000 tokens per query).
- De Vries (2023), "The growing energy footprint of artificial intelligence", Joule.

### Values used

- **Input tokens**: ~0.003 Wh per 1,000 tokens
- **Output tokens**: ~0.015 Wh per 1,000 tokens (5x input cost, reflecting sequential generation)

### Uncertainty

These numbers are rough. The actual values depend on:
- Model size (parameter counts for commercial models are often not public)
- Hardware (GPU type, batch size, utilization)
- Quantization and optimization techniques
- Whether speculative decoding or KV-cache optimizations are used

The true values could be 0.5x to 3x the figures used here.
## 3. Data center overhead (PUE)

Power Usage Effectiveness (PUE) measures total data center energy divided by IT equipment energy. It accounts for cooling, lighting, networking, and other infrastructure.

- **Value used**: PUE = 1.2
- **Source**: Google reports PUE of 1.10 for its best data centers; the industry average is ~1.3 (Uptime Institute, 2023). 1.2 is a reasonable estimate for a major cloud provider.

This is relatively well-established and unlikely to be off by more than 15%.
## 4. Client-side energy

The user's machine contributes a small amount of energy during the conversation. For a typical desktop or laptop:

- Idle power: ~30-60W (desktop) or ~10-20W (laptop)
- Marginal power for active use: ~5-20W above idle
- Duration: varies by conversation length

For a 30-minute conversation on a desktop, the marginal draw gives ~2.5-10 Wh if sustained for the whole session; counting only the minutes of genuinely active use, ~0.5-1 Wh is a reasonable lower bound. Either way this is a small fraction of the total, so rough precision is adequate.
## 5. CO2 conversion

### Grid carbon intensity

CO2 per kWh depends on the electricity source:

- **US grid average**: ~400g CO2/kWh (EPA eGRID)
- **Major cloud data center regions**: ~300-400g CO2/kWh
- **France** (nuclear-dominated): ~56g CO2/kWh
- **Norway/Iceland** (hydro-dominated): ~20-30g CO2/kWh
- **Poland/Australia** (coal-heavy): ~600-800g CO2/kWh

Use physical grid intensity for the data center's region, not accounting for renewable energy credits or offsets. The physical electrons consumed come from the regional grid in real time.

### Calculation template

```
Server energy = (cumulative_input_tokens * 0.003/1000
                 + output_tokens * 0.015/1000) * PUE

Server CO2 = server_energy_Wh * grid_intensity_g_per_kWh / 1000

Client CO2 = client_energy_Wh * local_grid_intensity / 1000

Total CO2 = Server CO2 + Client CO2
```

### Example

A conversation with 2M cumulative input tokens and 10K output tokens:

```
Server energy = (2,000,000 * 0.003/1000 + 10,000 * 0.015/1000) * 1.2
              = (6.0 + 0.15) * 1.2
              = ~7.4 Wh

Server CO2 = 7.4 * 350 / 1000 = ~2.6g CO2

Client CO2 = 0.5 * 56 / 1000 = ~0.03g CO2 (France)

Total CO2 = ~2.6g
```
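The template can be expressed as a short function. This is a sketch: the per-token energy figures and the default parameters are the low-confidence assumptions from Sections 2-4, and the function name is arbitrary.

```python
def conversation_co2(cumulative_input_tokens, output_tokens,
                     grid_g_per_kwh=350, pue=1.2,
                     client_energy_wh=0.5, client_grid_g_per_kwh=56,
                     wh_per_1k_input=0.003, wh_per_1k_output=0.015):
    """CO2 estimate (grams) following the template above.
    Per-token energy values are the rough assumptions from Section 2."""
    server_wh = (cumulative_input_tokens * wh_per_1k_input / 1000
                 + output_tokens * wh_per_1k_output / 1000) * pue
    server_g = server_wh * grid_g_per_kwh / 1000
    client_g = client_energy_wh * client_grid_g_per_kwh / 1000
    return server_wh, server_g + client_g

wh, grams = conversation_co2(2_000_000, 10_000)
# wh = 7.38, grams = 2.61, matching the worked example
```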

## 6. Water usage

Data centers use water for evaporative cooling. Li et al. (2023), "Making AI Less Thirsty", estimated that GPT-3 inference consumes ~0.5 mL of water per 10-50 tokens of output. Scaling for model size and output volume:

**Rough estimate: 0.05-0.5 liters per long conversation.**

This depends heavily on the data center's cooling technology (some use closed-loop systems with near-zero water consumption) and the local climate.
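A minimal sketch of that scaling, assuming the Li et al. per-token range applies unchanged to a newer model (it may not):

```python
def water_use_liters(output_tokens,
                     ml_per_token_low=0.5 / 50,   # 0.5 mL per 50 tokens
                     ml_per_token_high=0.5 / 10): # 0.5 mL per 10 tokens
    """Water-consumption range (liters) from the Li et al. figure."""
    return (output_tokens * ml_per_token_low / 1000,
            output_tokens * ml_per_token_high / 1000)

low, high = water_use_liters(10_000)
# 0.1-0.5 L for the 10K-output example conversation
```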

## 7. Training cost (amortized)

### Why it cannot be dismissed

Training is not a sunk cost. It is an investment made in anticipation of demand. Each conversation is part of the demand that justifies training the current model and funding the next one. The marginal cost framing hides the system-level cost.

### Scale of training

Published and estimated figures for frontier model training:

- GPT-3 (175B params, 2020): ~1,287 MWh (Patterson et al., 2021)
- GPT-4 (2023): estimated ~50,000-100,000 MWh (unconfirmed)
- Frontier models in 2025-2026: likely 10,000-200,000 MWh range

At 350g CO2/kWh, a 50,000 MWh training run produces ~17,500 tonnes of CO2.

### Amortization

If the model serves N total conversations over its lifetime, each conversation's share is (training cost / N). Rough reasoning:

- If a major model serves ~10 million conversations per day for ~1 year: N ~ 3.6 billion conversations.
- Per-conversation share: 50,000 MWh = 5 x 10^10 Wh; 5 x 10^10 Wh / 3.6 x 10^9 conversations ~ 14 Wh, or ~4.9g CO2 at 350g/kWh.

This is small per conversation — but only because the denominator is enormous. The total remains vast. Two framings:

- **Marginal**: My share is ~5g CO2. Small in absolute terms.
- **Attributional**: I am one of billions of participants in a system that emits ~17,500 tonnes. My participation sustains the system.

Neither framing is wrong. They answer different questions.
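The amortization arithmetic, with the MWh-to-Wh conversion made explicit (a sketch; the inputs are the unconfirmed estimates above):

```python
def amortized_training_share(training_mwh, conversations,
                             grid_g_per_kwh=350):
    """Per-conversation share of a training run.
    Unit conversions: 1 MWh = 1e6 Wh; 1 kWh = 1e3 Wh."""
    wh_per_conv = training_mwh * 1e6 / conversations
    g_co2_per_conv = wh_per_conv / 1000 * grid_g_per_kwh
    return wh_per_conv, g_co2_per_conv

# A 50,000 MWh run amortized over ~3.6 billion conversations
share_wh, share_g = amortized_training_share(50_000, 3_600_000_000)
# roughly 13.9 Wh and 4.9 g CO2 per conversation
```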

### RLHF and fine-tuning

Training also includes reinforcement learning from human feedback (RLHF). This has its own energy cost (additional training runs) and, more importantly, a human labor cost (see Section 10).
## 8. Embodied carbon and materials

Manufacturing GPUs requires:
- **Mining of rare earths and other critical minerals** (neodymium, tantalum, cobalt, lithium) — with associated environmental destruction, water pollution, and often exploitative labor conditions in the DRC, Chile, and China.
- **Semiconductor fabrication** — extremely energy- and water-intensive (TSMC reports ~15,000 tonnes CO2 per fab per year).
- **Server assembly, shipping, data center construction.**

Per-conversation share is tiny (same large-N amortization), but the aggregate is significant and the harms (mining pollution, habitat destruction) are not captured by CO2 metrics alone.

**Not estimated numerically** — the data to do this properly is not public.

### Critical minerals: human rights dimension

The embodied carbon framing understates the harm. GPU production depends on gallium (98% sourced from China), germanium, cobalt (DRC), lithium, tantalum, and palladium. Artisanal cobalt miners in the DRC work without safety equipment, exposed to dust causing "hard metal lung disease." Communities face land displacement and environmental contamination. A 2025 Science paper argues that "global majority countries must embed critical minerals into AI governance" (doi:10.1126/science.aef6678). The per-conversation share of this suffering is unquantifiable but structurally real.
## 8b. E-waste

Distinct from embodied carbon. AI-specific GPUs become obsolete in 2-3 years (vs. 5-7 for general servers). Projections: 2.5 million tonnes of AI-related e-waste per year by 2030 (IEEE Spectrum). E-waste contains lead, mercury, cadmium, and brominated flame retardants that leach into soil and water. Recycling yields are negligible due to component miniaturization. Much of it is processed by workers in developing countries with minimal protection.

This is not captured by CO2 or embodied-carbon accounting. It is a distinct toxic-waste externality.
## 8c. Grid displacement and renewable cannibalization

The energy estimates above use average grid carbon intensity. But the *marginal* impact of additional AI demand may be worse than average. U.S. data center demand is projected to reach 325-580 TWh by 2028 (IEA), or 6.7-12.0% of total U.S. electricity. When AI data centers claim renewable energy via Power Purchase Agreements, the "additionality" question is critical: is this new generation, or is it diverting existing renewables from other consumers? In several regions, AI demand is outpacing grid capacity, and companies are installing natural gas peakers to fill gaps.

The correct carbon intensity for a conversation's marginal electricity may therefore be higher than the grid average.
## 8d. Data center community impacts

Data centers impose localized costs that global metrics miss:
- **Noise**: Cooling systems run 24/7 at 55-85 dBA (safe threshold: 70 dBA). Communities near data centers report sleep disruption and stress.
- **Water**: Evaporative cooling competes with municipal water supply, particularly in arid regions.
- **Land**: Data center campuses displace other land uses and require high-voltage transmission lines through residential areas.
- **Jobs**: Data centers create very few long-term jobs relative to their footprint and resource consumption.

Virginia alone has plans for 70+ new data centers (NPR, 2025). Residents are increasingly organizing against expansions. The per-conversation share of these harms is infinitesimal, but each conversation is part of the demand that justifies new construction.
## 9. Financial cost

### Direct cost

API pricing for frontier models (as of early 2025): ~$15 per million input tokens, ~$75 per million output tokens (for the most capable models). Smaller models are cheaper.

Example for a conversation with 2M cumulative input tokens and 10K output tokens:

```
Input:  2,000,000 tokens * $15/1M = $30.00
Output:    10,000 tokens * $75/1M = $ 0.75
Total:  ~$31
```
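The same pricing arithmetic as a small helper (the prices are the early-2025 figures quoted above and will drift; the function name is arbitrary):

```python
def api_cost_usd(cumulative_input_tokens, output_tokens,
                 usd_per_m_input=15.0, usd_per_m_output=75.0):
    """Direct API cost at the frontier-model prices quoted above."""
    return (cumulative_input_tokens * usd_per_m_input
            + output_tokens * usd_per_m_output) / 1_000_000

cost = api_cost_usd(2_000_000, 10_000)
# 30.75 dollars for the example conversation
```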

Longer conversations cost more because cumulative input tokens grow superlinearly. A very long session (250K+ context, 250+ turns) can easily reach $500-1000.

Subscription pricing (e.g., Claude Code) may differ, but the underlying compute cost is similar.

### What that money could do instead

To make the opportunity cost concrete:
- ~$30 buys ~30 malaria bed nets via the Against Malaria Foundation
- ~$30 buys ~150 meals at a food bank (~$0.20/meal in bulk)
- ~$30 pays ~15-23 hours of wages for a data annotator in Kenya (Time, 2023: $1.32-2/hour)

This is not to say every dollar should go to charity. But the opportunity cost is real and should be named.

### Upstream financial costs

Revenue from AI subscriptions funds further model training, hiring, and GPU procurement. Each conversation is part of a financial loop that drives continued scaling of AI compute.
## 10. Social cost

### Data annotation labor

LLMs are typically trained using RLHF, which requires human annotators to rate model outputs. Reporting (Time, January 2023) revealed that outsourced annotation workers — often in Kenya, Uganda, and India — were paid $1-2/hour to review disturbing content (violence, abuse, hate speech) with limited psychological support. Each conversation's marginal contribution to that demand is infinitesimal, but the system depends on this labor.

### Displacement effects

LLM assistants can substitute for work previously done by humans: writing scripts, reviewing code, answering questions. Whether this is net-positive (freeing people for higher-value work) or net-negative (destroying livelihoods) depends on the economic context and is genuinely uncertain.

### Cognitive deskilling

A Microsoft/CHI 2025 study found that higher confidence in GenAI correlates with less critical thinking effort. An MIT Media Lab study ("Your Brain on ChatGPT") documented "cognitive debt": users who relied on AI for tasks performed worse when later working independently. Clinical evidence suggests that clinicians relying on AI diagnostics saw measurable declines in independent diagnostic skill after just three months.

This is distinct from epistemic risk (misinformation). It is about the user's cognitive capacity degrading through repeated reliance on the tool. Each conversation has a marginal deskilling effect that compounds.

### Epistemic effects

LLMs present information with confidence regardless of accuracy. The ease of generating plausible-sounding text may contribute to an erosion of epistemic standards if consumed uncritically. Every claim in an LLM conversation should be verified independently.

### Linguistic homogenization

LLMs are overwhelmingly trained on English (~44% of training data). A Stanford 2025 study found that AI tools systematically exclude non-English speakers. Each English-language conversation reinforces the economic incentive to optimize for English, marginalizing over 3,000 already-endangered languages.
## 11. Political cost

### Concentration of power

Training frontier models requires billions of dollars and access to cutting-edge hardware. Only a handful of companies can do this. Each conversation that flows through these systems reinforces their centrality and the concentration of a strategically important technology in a few private actors.

### Geopolitical resource competition

The demand for GPUs drives geopolitical competition for semiconductor manufacturing capacity (TSMC in Taiwan, export controls on China). Each conversation is an infinitesimal part of that demand, but it is part of it.

### Regulatory and democratic implications

AI systems that become deeply embedded in daily work create dependencies that are difficult to reverse. The more useful a conversation is, the more it contributes to a dependency on proprietary AI infrastructure that is not under democratic governance.

### Surveillance and data

Conversations are processed on the provider's servers. File paths, system configuration, project structures, and code are transmitted and processed remotely. Even with strong privacy policies, the structural arrangement — sending detailed information about one's computing environment to a private company — has implications, particularly across jurisdictions.

### Opaque content filtering

LLM providers apply content filtering that can block outputs without explanation. The filtering rules are not public: there is no published specification of what triggers a block, no explanation given when one occurs, and no appeal mechanism. The user receives a generic error code ("Output blocked by content filtering policy") with no indication of what content was objectionable.

This has several costs:

- **Reliability**: Any response can be blocked unpredictably. Observed false positives include responses about open-source licensing (CC0 public domain dedication) — entirely benign content. If a filter can trigger on that, it can trigger on anything.
- **Chilling effect**: Topics that are more likely to trigger filters (labor conditions, exploitation, political power) are precisely the topics that honest impact assessment requires discussing. The filter creates a structural bias toward safe, anodyne output.
- **Opacity**: The user cannot know in advance which topics or phrasings will be blocked, cannot understand why a block occurred, and cannot adjust their request rationally. This is the opposite of the transparency that democratic governance requires.
- **Asymmetry**: The provider decides what the model may say, with no input from the user. This is another instance of power concentration — not over compute resources, but over speech.

The per-conversation cost is small (usually a retry works). The systemic cost is that a private company exercises opaque editorial control over an increasingly important communication channel, with no accountability to the people affected.
## 12. AI-generated code quality and technical debt

Research specific to AI coding agents (CodeRabbit, 2025; Stack Overflow blog, 2026): AI-generated code introduces 1.7x more issues than human-written code, with 1.57x more security vulnerabilities and 2.74x more XSS vulnerabilities. Organizations using AI coding agents saw cycle time increase 9%, incidents per PR increase 23.5%, and change failure rate increase 30%.

The availability of easily generated code may discourage the careful testing that would catch bugs. Any code from an LLM conversation should be reviewed and tested with the same rigor as code from an untrusted contributor.
## 13. Model collapse and internet data pollution

Shumailov et al. (Nature, 2024) demonstrated that models trained on recursively AI-generated data progressively degenerate, losing tail distributions and eventually converging to distributions unrelated to reality. Each conversation that produces text which enters the public internet — Stack Overflow answers, blog posts, documentation — contributes synthetic data to the commons. Future models trained on this data will be slightly worse.

The Harvard Journal of Law & Technology has argued for a "right to uncontaminated human-generated data." Each conversation is a marginal pollutant.
## 14. Scientific research integrity

If conversation outputs are used in research (literature reviews, data analysis, writing), they contribute to degradation of scientific knowledge infrastructure. A PMC article calls LLMs "a potentially existential threat to online survey research" because coherent AI-generated responses can no longer be assumed human. PNAS has warned about protecting scientific integrity in an age of generative AI.

This is distinct from individual epistemic risk — it is systemic corruption of the knowledge commons.
## 15. Algorithmic monoculture and correlated failure

When millions of users rely on the same few foundation models, errors become correlated rather than independent. A Stanford HAI study found that across every model ecosystem studied, the rate of homogeneous outcomes exceeded baselines. A Nature Communications Psychology paper (2026) documents that AI-driven research is producing "topical and methodological convergence, flattening scientific imagination."

For coding specifically: if many developers use the same model, their code will share the same blind spots, the same idiomatic patterns, and the same categories of bugs. This reduces the diversity that makes software ecosystems resilient.
## 16. Creative market displacement

The U.S. Copyright Office's May 2025 Part 3 report states that GenAI systems "compete with or diminish licensing opportunities for original human creators." This is not only a training-phase cost (using creators' work without consent) but an ongoing per-conversation externality: each conversation that generates creative output (code, text, analysis) displaces some marginal demand for human work.
## 17. Jevons paradox (meta-methodological)

This entire methodology risks underestimating impact through the per-conversation framing. As AI models become more efficient and cheaper per query, total usage scales dramatically, potentially negating efficiency gains. A 2025 ACM FAccT paper specifically addresses this: efficiency improvements spur increased consumption. Any per-conversation estimate should acknowledge that the very affordability of a conversation increases total conversation volume — each cheap query is part of a demand signal that drives system-level growth.
## 18. What this methodology does NOT capture

- **Network transmission energy**: Routers, switches, fiber amplifiers, CDN infrastructure. Data center network bandwidth surged 330% in 2024 due to AI workloads. Small per conversation but not zero.
- **Mental health effects**: RCTs show heavy AI chatbot use correlates with greater loneliness and dependency. Less directly relevant to coding-agent use, but the boundary between tool use and companionship is not always clear.
- **Human time**: The user's time has value and its own footprint, but this is not caused by the conversation.
- **Cultural normalization**: The more AI-generated content becomes normal, the harder it becomes to opt out. This is a soft lock-in effect.
## 19. Confidence summary

| Component | Confidence | Could be off by | Quantified? |
|-----------|------------|-----------------|-------------|
| Token count | Low | 2x | Yes |
| Energy per token | Low | 3x | Yes |
| PUE | Medium | 15% | Yes |
| Grid carbon intensity | Medium | 30% | Yes |
| Client-side energy | Medium | 50% | Yes |
| Water usage | Low | 5x | Yes |
| Training (amortized) | Low | 10x | Partly |
| Financial cost | Medium | 2x | Yes |
| Embodied carbon | Very low | Unknown | No |
| Critical minerals / human rights | Very low | Unquantifiable | No |
| E-waste | Very low | Unknown | No |
| Grid displacement | Low | 2-5x | No |
| Community impacts | Very low | Unquantifiable | No |
| Annotation labor | Very low | Unquantifiable | No |
| Cognitive deskilling | Very low | Unquantifiable | No |
| Linguistic homogenization | Very low | Unquantifiable | No |
| Code quality degradation | Low | Variable | Partly |
| Data pollution / model collapse | Very low | Unquantifiable | No |
| Scientific integrity | Very low | Unquantifiable | No |
| Algorithmic monoculture | Very low | Unquantifiable | No |
| Creative market displacement | Very low | Unquantifiable | No |
| Political cost | Medium | Unquantifiable | No |
| Jevons paradox (systemic) | Low | Fundamental | No |
| Content filtering (opacity) | Medium | Unquantifiable | No |

**Overall assessment:** Of the 20+ cost categories identified, only 6 can be quantified with any confidence (inference energy, PUE, grid intensity, client energy, financial cost, water). The remaining categories resist quantification — not because they are small, but because they are diffuse, systemic, or involve incommensurable values (human rights, cognitive autonomy, cultural diversity, democratic governance).

A methodology that only counts what it can measure will systematically undercount the true cost. The quantifiable costs are almost certainly the *least important* costs. The most consequential harms — deskilling, data pollution, monoculture risk, creative displacement, power concentration — operate at the system level, where per-conversation attribution is conceptually fraught (see Section 17 on Jevons paradox).

This does not mean the exercise is pointless. Naming the costs, even without numbers, is a precondition for honest assessment.
## 20. Positive impact: proxy metrics
|
||||
|
||||
The sections above measure costs. To assess *net* impact, we also need
|
||||
to estimate value produced. This is harder — value is contextual, often
|
||||
delayed, and resistant to quantification. The following proxy metrics are
|
||||
imperfect but better than ignoring the positive side entirely.
|
||||
|
||||
### Reach
|
||||
|
||||
How many people are affected by the output of this conversation?
|
||||
|
||||
- **1** (only the user) — personal script, private note, learning exercise
|
||||
- **10-100** — team tooling, internal documentation, small project
|
||||
- **100-10,000** — open-source library, public documentation, popular blog
|
||||
- **10,000+** — widely-used infrastructure, security fix in major dependency
|
||||
|
||||
Estimation method: check download counts, user counts, dependency graphs,
|
||||
or audience size for the project or artifact being worked on.
|
||||
|
||||
**Known bias:** tendency to overestimate reach. "This could help anyone
|
||||
who..." is not the same as "this will reach N people." Be conservative.
|
||||
|
||||
### Counterfactual

Would the user have achieved a similar result without this conversation?

- **Yes, same speed** — the conversation added no value. Net impact is
  purely negative (cost with no benefit).
- **Yes, but slower** — the conversation saved time. Value = time saved *
  hourly value of that time. Often modest.
- **Yes, but lower quality** — the conversation improved the output
  (caught a bug, suggested a better design). Value depends on what the
  quality difference prevents downstream.
- **No** — the user could not have done this alone. The conversation
  enabled something that would not otherwise exist. Highest potential
  value, but also the highest deskilling risk.

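The "yes, but slower" case is the one that reduces to plain arithmetic. A minimal sketch, where the hours saved and the hourly rate are illustrative assumptions rather than measurements:

```python
def time_saved_value(hours_saved: float, hourly_value: float) -> float:
    """Value of a 'yes, but slower' counterfactual: time saved
    multiplied by the value the user places on an hour of that time."""
    return hours_saved * hourly_value

# Illustrative only: 30 minutes saved at an assumed $60/hour.
print(time_saved_value(0.5, 60.0))  # 30.0
```

Note how modest the figure usually is; it must then be weighed against the full cost stack estimated in the sections above.
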
**Known bias:** users and LLMs both overestimate the "no" category.
Most tasks fall in "yes, but slower."

### Durability

How long will the output remain valuable?

- **Minutes** — answered a quick question, resolved a transient confusion.
- **Days to weeks** — wrote a script for a one-off task, debugged a
  current issue.
- **Months to years** — created automation, documentation, or tooling
  that persists. Caught a design flaw early.
- **Indefinite** — contributed to a public resource that others maintain
  and build on.

Durability multiplies reach: a short-lived artifact for 10,000 users may
be worth less than a long-lived one for 100.

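That trade-off can be made concrete with a toy comparison. The day counts and the reach-times-lifetime weighting are illustrative assumptions, not a calibrated model:

```python
def value_proxy(reach: int, durability_days: int) -> int:
    """Crude value proxy: reach multiplied by useful lifetime in days.
    Treating the product as linear is an illustrative simplification."""
    return reach * durability_days

# A throwaway artifact for 10,000 users vs. a durable one for 100:
print(value_proxy(10_000, 2))     # 20000
print(value_proxy(100, 3 * 365))  # 109500: the durable artifact scores higher
```
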
### Severity (for bug/security catches)

If the conversation caught or prevented a problem, how bad was it?

- **Cosmetic** — typo, formatting, minor UX issue
- **Functional** — bug that affects correctness for some inputs
- **Security** — vulnerability that could be exploited
- **Data loss / safety** — could cause irreversible harm

Severity * reach = rough value of the catch.

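The severity-times-reach product can be sketched with ordinal weights. The weights below are illustrative assumptions, chosen only to keep the four categories an order of magnitude apart:

```python
# Illustrative severity weights (assumptions, not calibrated values).
SEVERITY_WEIGHT = {
    "cosmetic": 1,
    "functional": 10,
    "security": 100,
    "data_loss": 1000,
}

def catch_value(severity: str, reach: int) -> int:
    """Rough value of a caught problem: severity weight times reach."""
    return SEVERITY_WEIGHT[severity] * reach

# A security vulnerability caught in a dependency with 5,000 users:
print(catch_value("security", 5_000))  # 500000
```
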
### Reuse

Was the output of the conversation referenced or used again after it
ended? This can only be assessed retrospectively:

- Was the code merged, and is it still in production?
- Was the documentation read by others?
- Was the tool adopted by another project?

Reuse is the strongest evidence of durable value.

### Net impact rubric

Combining cost and value into a qualitative assessment:

| Assessment | Criteria |
|------------|----------|
| **Clearly net-positive** | High reach (1000+) AND (high durability OR high severity catch) AND counterfactual is "no" or "lower quality" |
| **Probably net-positive** | Moderate reach (100+) AND durable output AND counterfactual is at least "slower" |
| **Uncertain** | Low reach but high durability, or high reach but low durability, or hard to assess counterfactual |
| **Probably net-negative** | Low reach (1-10) AND short durability AND counterfactual is "yes, same speed" or "yes, but slower" |
| **Clearly net-negative** | No meaningful output, or output that required extensive debugging, or conversation that went in circles |

**Important:** most conversations between an LLM and a single user
working on private code will fall in the "probably net-negative" to
"uncertain" range. This is not a failure of the conversation — it is an
honest reflection of the cost structure. Net-positive requires broad
reach, which requires the work to be shared.

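The rubric can be encoded as a small classifier. The thresholds mirror the table; reducing durability to a boolean and treating the reach cutoffs as crisp is an illustrative simplification:

```python
def net_impact(reach: int, durable: bool, counterfactual: str,
               severe_catch: bool = False,
               meaningful_output: bool = True) -> str:
    """Map the rubric's criteria to a qualitative assessment.
    counterfactual is one of: 'no', 'lower quality', 'slower', 'same speed'."""
    if not meaningful_output:
        return "clearly net-negative"
    if reach >= 1000 and (durable or severe_catch) \
            and counterfactual in ("no", "lower quality"):
        return "clearly net-positive"
    if reach >= 100 and durable \
            and counterfactual in ("no", "lower quality", "slower"):
        return "probably net-positive"
    if reach <= 10 and not durable \
            and counterfactual in ("same speed", "slower"):
        return "probably net-negative"
    return "uncertain"

# A private one-off script the user could have written at the same speed:
print(net_impact(reach=1, durable=False, counterfactual="same speed"))
# probably net-negative
```

Note that the default outcome is "uncertain", matching the rubric's emphasis that most single-user conversations are hard to place.
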
## 21. What would improve this estimate

- Access to actual energy-per-token and training energy metrics from
  model providers
- Knowledge of the specific data center and its energy source
- Actual token counts from API response headers
- Hardware specifications (GPU model, batch size)
- Transparency about annotation labor conditions and compensation
- Public data on total query volume (to properly amortize training)
- Longitudinal studies on cognitive deskilling specifically from coding
  agents
- Empirical measurement of AI data pollution rates in public corpora
- A framework for quantifying concentration-of-power effects (this may
  not be possible within a purely quantitative methodology)
- Honest acknowledgment that some costs may be fundamentally
  unquantifiable, and that this is a limitation of quantitative
  methodology, not evidence of insignificance

## License

This methodology is provided for reuse and adaptation. See the LICENSE
file in this repository.

## Contributing

If you have better data, corrections, or additional cost categories,
contributions are welcome. The goal is not a perfect number but an
honest, improving understanding of costs.