CC0-licensed methodology for estimating the environmental and social costs of AI conversations (20+ categories), plus a reusable toolkit for automated impact tracking in Claude Code sessions.
Methodology for Estimating the Impact of an LLM Conversation
Introduction
This document provides a framework for estimating the total cost — environmental, financial, social, and political — of a conversation with a large language model (LLM) running on cloud infrastructure.
Who this is for: Anyone who wants to understand what a conversation with an AI assistant actually costs, beyond the subscription price. This includes developers using coding agents, researchers studying AI sustainability, and anyone making decisions about when AI tools are worth their cost.
How to use it: The framework identifies 20+ cost categories, provides estimation methods for the quantifiable ones, and names the unquantifiable ones so they are not ignored. You can apply it to your own conversations by substituting your own token counts and parameters.
Limitations: Most estimates have low confidence. Many of the most consequential costs cannot be quantified at all. This is a tool for honest approximation, not precise accounting. See the confidence summary (Section 19) for details.
What we are measuring
The total cost of a single LLM conversation. Restricting the analysis to CO2 alone would miss most of the picture.
Cost categories
Environmental:
- Inference energy (GPU computation for the conversation)
- Training energy (amortized share of the cost of training the model)
- Data center overhead (cooling, networking, storage)
- Client-side energy (the user's local machine)
- Embodied carbon and materials (hardware manufacturing, mining)
- E-waste (toxic hardware disposal, distinct from embodied carbon)
- Grid displacement (AI demand consuming renewable capacity)
- Data center community impacts (noise, land, local resource strain)
Financial and economic:
9. Direct compute cost and opportunity cost
10. Creative market displacement (per-conversation, not just training)
Social and cognitive:
11. Annotation labor conditions
12. Cognitive deskilling of the user
13. Mental health effects (dependency, loneliness paradox)
14. Linguistic homogenization and language endangerment
Epistemic and systemic:
15. AI-generated code quality degradation and technical debt
16. Model collapse / internet data pollution
17. Scientific research integrity contamination
18. Algorithmic monoculture and correlated failure risk
Political:
19. Concentration of power, geopolitical implications, data sovereignty
Meta-methodological:
20. Jevons paradox (efficiency gains driving increased total usage)
1. Token estimation
Why tokens matter
LLM inference cost scales with the number of tokens processed. Each time the model produces a response, it reprocesses the entire conversation history (input tokens) and generates new text (output tokens). Output tokens are more expensive per token because they are generated sequentially, each requiring a full forward pass, whereas input tokens can be processed in parallel.
How to estimate
If you have access to API response headers or usage metadata, use the actual token counts. Otherwise, estimate:
- Bytes to tokens: English text and JSON average ~4 bytes per token (range: 3.5-4.5 depending on content type). Code tends toward the higher end.
- Cumulative input tokens: Each assistant turn reprocesses the full context. For a conversation with N turns and final context size T, the cumulative input tokens are approximately T/2 * N (the average context size times the number of turns).
- Output tokens: Typically 1-5% of the total transcript size, depending on how verbose the assistant is.
Example
A 20-turn conversation with a 200K-token final context:
- Cumulative input: ~100K * 20 = ~2,000,000 tokens
- Output: ~10,000 tokens
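The heuristics above can be sketched as a small estimator. This is a sketch under the stated assumptions (~4 bytes/token, output at 1-5% of the transcript, context growing roughly linearly so the average context is half the final size); the function name and defaults are illustrative, not a real API.

```python
def estimate_tokens(transcript_bytes: int, turns: int,
                    bytes_per_token: float = 4.0,
                    output_fraction: float = 0.03) -> dict:
    """Rough token estimate from transcript size, using the
    heuristics above. All parameters are low-confidence defaults."""
    total_tokens = transcript_bytes / bytes_per_token
    output_tokens = total_tokens * output_fraction
    # Each turn reprocesses the full history; if context grows roughly
    # linearly, the average context is half the final size (T/2 * N).
    cumulative_input = (total_tokens / 2) * turns
    return {
        "output_tokens": round(output_tokens),
        "cumulative_input_tokens": round(cumulative_input),
    }

# 20-turn conversation, 200K-token final context (~800 KB transcript):
est = estimate_tokens(transcript_bytes=800_000, turns=20)
# est["cumulative_input_tokens"] == 2,000,000 — matching the example above
```

Remember the factor-of-2 uncertainty below: treat the result as an order-of-magnitude figure, not a measurement.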
Uncertainty
Token estimates from byte counts can be off by a factor of 2. Key unknowns:
- The model's exact tokenization (tokens per byte ratio varies by content)
- Whether context caching reduces reprocessing
- The exact number of internal inference calls (tool sequences may involve multiple calls)
- Whether the system compresses prior messages near context limits
2. Energy per token
Sources
There is no published energy-per-token figure for most commercial LLMs. Estimates are derived from:
- Luccioni, Viguier & Ligozat (2023), "Estimating the Carbon Footprint of BLOOM", which measured energy for a 176B parameter model.
- The IEA's 2024 estimate of ~2.9 Wh per ChatGPT query (for GPT-4-class models, averaging ~1,000 tokens per query).
- De Vries (2023), "The growing energy footprint of artificial intelligence", Joule.
Values used
- Input tokens: ~0.003 Wh per 1,000 tokens
- Output tokens: ~0.015 Wh per 1,000 tokens (5x input cost, reflecting sequential generation)
Uncertainty
These numbers are rough. The actual values depend on:
- Model size (parameter counts for commercial models are often not public)
- Hardware (GPU type, batch size, utilization)
- Quantization and optimization techniques
- Whether speculative decoding or KV-cache optimizations are used
The true values could be 0.5x to 3x the figures used here.
3. Data center overhead (PUE)
Power Usage Effectiveness (PUE) measures total data center energy divided by IT equipment energy. It accounts for cooling, lighting, networking, and other infrastructure.
- Value used: PUE = 1.2
- Source: Google reports PUE of 1.10 for its best data centers; industry average is ~1.3 (Uptime Institute, 2023). 1.2 is a reasonable estimate for a major cloud provider.
This is relatively well-established and unlikely to be off by more than 15%.
4. Client-side energy
The user's machine contributes a small amount of energy during the conversation. For a typical desktop or laptop:
- Idle power: ~30-60W (desktop) or ~10-20W (laptop)
- Marginal power for active use: ~5-20W above idle
- Duration: varies by conversation length
For a 30-minute conversation on a desktop, estimate ~0.5-1 Wh. This is typically a small fraction of the total, so rough precision is sufficient.
5. CO2 conversion
Grid carbon intensity
CO2 per kWh depends on the electricity source:
- US grid average: ~400g CO2/kWh (EPA eGRID)
- Major cloud data center regions: ~300-400g CO2/kWh
- France (nuclear-dominated): ~56g CO2/kWh
- Norway/Iceland (hydro-dominated): ~20-30g CO2/kWh
- Poland/Australia (coal-heavy): ~600-800g CO2/kWh
Use physical grid intensity for the data center's region, not accounting for renewable energy credits or offsets. The physical electrons consumed come from the regional grid in real time.
Calculation template
Server energy = (cumulative_input_tokens * 0.003/1000
+ output_tokens * 0.015/1000) * PUE
Server CO2 = server_energy_Wh * grid_intensity_g_per_kWh / 1000
Client CO2 = client_energy_Wh * local_grid_intensity / 1000
Total CO2 = Server CO2 + Client CO2
Example
A conversation with 2M cumulative input tokens and 10K output tokens:
Server energy = (2,000,000 * 0.003/1000 + 10,000 * 0.015/1000) * 1.2
= (6.0 + 0.15) * 1.2
= ~7.4 Wh
Server CO2 = 7.4 * 350 / 1000 = ~2.6g CO2
Client CO2 = 0.5 * 56 / 1000 = ~0.03g CO2 (France)
Total CO2 = ~2.6g
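The template and example above can be combined into one function. This is a sketch using the low-confidence energy-per-token figures from Section 2; all defaults (PUE 1.2, 350g/kWh server grid, 0.5 Wh client energy on a 56g/kWh grid) are the assumptions stated earlier, not measured values.

```python
def conversation_co2(cumulative_input_tokens: int, output_tokens: int,
                     pue: float = 1.2, grid_g_per_kwh: float = 350,
                     client_wh: float = 0.5,
                     client_grid_g_per_kwh: float = 56) -> dict:
    """Server + client CO2 in grams, per the calculation template above.
    Energy-per-token values (0.003 / 0.015 Wh per 1K tokens) are the
    rough estimates from Section 2 and could be off by 0.5x-3x."""
    server_wh = (cumulative_input_tokens * 0.003 / 1000
                 + output_tokens * 0.015 / 1000) * pue
    server_g = server_wh * grid_g_per_kwh / 1000
    client_g = client_wh * client_grid_g_per_kwh / 1000
    return {"server_wh": server_wh, "server_g": server_g,
            "client_g": client_g, "total_g": server_g + client_g}

r = conversation_co2(2_000_000, 10_000)
# server_wh ≈ 7.38, total_g ≈ 2.6 — matching the worked example above
```

Swap in the grid intensity for the actual data center region when it is known; the result scales linearly with it.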
6. Water usage
Data centers use water for evaporative cooling. Li et al. (2023), "Making AI Less Thirsty", estimated that GPT-3 inference consumes roughly 500 mL of water per 20-50 queries, depending on deployment location. Scaling for model size and output volume:
Rough estimate: 0.05-0.5 liters per long conversation.
This depends heavily on the data center's cooling technology (some use closed-loop systems with near-zero water consumption) and the local climate.
7. Training cost (amortized)
Why it cannot be dismissed
Training is not a sunk cost. It is an investment made in anticipation of demand. Each conversation is part of the demand that justifies training the current model and funding the next one. The marginal cost framing hides the system-level cost.
Scale of training
Published and estimated figures for frontier model training:
- GPT-3 (175B params, 2020): ~1,287 MWh (Patterson et al., 2021)
- GPT-4 (2023): estimated ~50,000-100,000 MWh (unconfirmed)
- Frontier models in 2025-2026: likely 10,000-200,000 MWh range
At 350g CO2/kWh, a 50,000 MWh training run produces ~17,500 tonnes of CO2.
Amortization
If the model serves N total conversations over its lifetime, each conversation's share is (training cost / N). Rough reasoning:
- If a major model serves ~10 million conversations per day for ~1 year: N ~ 3.6 billion conversations.
- Per-conversation share: 50,000 MWh = 50,000,000,000 Wh; 50,000,000,000 / 3,600,000,000 ~ 14 Wh, or ~5g CO2 at 350g/kWh.
This is on the same order as the inference energy of the conversation itself, and the aggregate remains vast. Two framings:
- Marginal: My share is ~5g CO2, comparable to the inference footprint.
- Attributional: I am one of billions of participants in a system that emits ~17,500 tonnes. My participation sustains the system.
Neither framing is wrong. They answer different questions.
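The amortization arithmetic, as a sketch using the Section 7 assumptions (50,000 MWh training run, ~10 million conversations/day over a one-year serving lifetime, 350g CO2/kWh). Every input here is an unconfirmed estimate with roughly 10x uncertainty.

```python
def amortized_training_share(training_mwh: float = 50_000,
                             conversations_per_day: float = 10_000_000,
                             lifetime_days: float = 365,
                             grid_g_per_kwh: float = 350) -> tuple:
    """Per-conversation share of training energy (Wh) and CO2 (grams),
    amortized over the model's estimated lifetime conversation count."""
    total_conversations = conversations_per_day * lifetime_days
    wh_per_conversation = training_mwh * 1_000_000 / total_conversations
    g_co2 = wh_per_conversation * grid_g_per_kwh / 1000
    return wh_per_conversation, g_co2

wh, g = amortized_training_share()
# ≈ 13.7 Wh and ≈ 4.8 g CO2 per conversation
```

Note how sensitive the result is to the denominator: doubling the serving lifetime or the daily query volume halves the per-conversation share without changing the total at all.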
RLHF and fine-tuning
Training also includes reinforcement learning from human feedback (RLHF). This has its own energy cost (additional training runs) and, more importantly, a human labor cost (see Section 9).
8. Embodied carbon and materials
Manufacturing GPUs requires:
- Rare earth mining (neodymium, tantalum, cobalt, lithium) — with associated environmental destruction, water pollution, and often exploitative labor conditions in the DRC, Chile, China.
- Semiconductor fabrication — extremely energy- and water-intensive (TSMC reports ~15,000 tonnes CO2 per fab per year).
- Server assembly, shipping, data center construction.
Per-conversation share is tiny (same large-N amortization), but the aggregate is significant and the harms (mining pollution, habitat destruction) are not captured by CO2 metrics alone.
Not estimated numerically — the data to do this properly is not public.
Critical minerals: human rights dimension
The embodied carbon framing understates the harm. GPU production depends on gallium (98% sourced from China), germanium, cobalt (DRC), lithium, tantalum, and palladium. Artisanal cobalt miners in the DRC work without safety equipment, exposed to dust causing "hard metal lung disease." Communities face land displacement and environmental contamination. A 2025 Science paper argues that "global majority countries must embed critical minerals into AI governance" (doi:10.1126/science.aef6678). The per-conversation share of this suffering is unquantifiable but structurally real.
8b. E-waste
Distinct from embodied carbon. AI-specific GPUs become obsolete in 2-3 years (vs. 5-7 for general servers). Projections: 2.5 million tonnes of AI-related e-waste per year by 2030 (IEEE Spectrum). E-waste contains lead, mercury, cadmium, and brominated flame retardants that leach into soil and water. Recycling yields are negligible due to component miniaturization. Much of it is processed by workers in developing countries with minimal protection.
This is not captured by CO2 or embodied-carbon accounting. It is a distinct toxic-waste externality.
8c. Grid displacement and renewable cannibalization
The energy estimates above use average grid carbon intensity. But the marginal impact of additional AI demand may be worse than average. U.S. data center demand is projected to reach 325-580 TWh by 2028 (IEA), 6.7-12.0% of total U.S. electricity. When AI data centers claim renewable energy via Power Purchase Agreements, the "additionality" question is critical: is this new generation, or is it diverting existing renewables from other consumers? In several regions, AI demand is outpacing grid capacity, and companies are installing natural gas peakers to fill gaps.
The correct carbon intensity for a conversation's marginal electricity may therefore be higher than the grid average.
8d. Data center community impacts
Data centers impose localized costs that global metrics miss:
- Noise: Cooling systems run 24/7 at 55-85 dBA (safe threshold: 70 dBA). Communities near data centers report sleep disruption and stress.
- Water: Evaporative cooling competes with municipal water supply, particularly in arid regions.
- Land: Data center campuses displace other land uses and require high-voltage transmission lines through residential areas.
- Jobs: Data centers create very few long-term jobs relative to their footprint and resource consumption.
Virginia alone has plans for 70+ new data centers (NPR, 2025). Residents are increasingly organizing against expansions. The per-conversation share of these harms is infinitesimal, but each conversation is part of the demand that justifies new construction.
9. Financial cost
Direct cost
API pricing for frontier models (as of early 2025): ~$15 per million input tokens, ~$75 per million output tokens (for the most capable models). Smaller models are cheaper.
Example for a conversation with 2M cumulative input tokens and 10K output tokens:
Input: 2,000,000 tokens * $15/1M = $30.00
Output: 10,000 tokens * $75/1M = $ 0.75
Total: ~$31
Longer conversations cost more because cumulative input tokens grow roughly quadratically with the number of turns. A very long session (250K+ context, 250+ turns) can easily reach $500-1000.
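To see why cost grows quadratically with turn count, here is a sketch where each turn adds a fixed number of tokens to the context and every turn reprocesses the full history. The per-turn figures (10K context tokens, 500 output tokens) are illustrative assumptions, and real systems complicate this with prompt caching and context compaction.

```python
def conversation_cost(turns: int, tokens_per_turn: int = 10_000,
                      output_per_turn: int = 500,
                      in_price: float = 15 / 1e6,
                      out_price: float = 75 / 1e6) -> float:
    """API cost in dollars under the simplifying assumption that the
    context grows by tokens_per_turn each turn and each turn
    reprocesses the entire history. Illustrative parameters only."""
    cost = 0.0
    context = 0
    for _ in range(turns):
        context += tokens_per_turn   # history grows every turn
        cost += context * in_price   # full context reprocessed as input
        cost += output_per_turn * out_price
    return cost

# Quadratic growth: a 10x longer conversation costs ~100x more.
# conversation_cost(20)  ≈ $32
# conversation_cost(200) ≈ $3,000 (before caching or compaction)
```

Prompt caching, where available, can cut the input side substantially, which is one of the key unknowns flagged in Section 1.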
Subscription pricing (e.g., Claude Code) may differ, but the underlying compute cost is similar.
What that money could do instead
To make the opportunity cost concrete:
- ~$30 buys ~30 malaria bed nets via the Against Malaria Foundation
- ~$30 buys ~150 meals at a food bank ($0.20/meal in bulk)
- ~$30 pays ~15-23 hours of wages for a data annotator in Kenya (Time, 2023: $1.32-2/hour)
This is not to say every dollar should go to charity. But the opportunity cost is real and should be named.
Upstream financial costs
Revenue from AI subscriptions funds further model training, hiring, and GPU procurement. Each conversation is part of a financial loop that drives continued scaling of AI compute.
10. Social cost
Data annotation labor
LLMs are typically trained using RLHF, which requires human annotators to rate model outputs. Reporting (Time, January 2023) revealed that outsourced annotation workers — often in Kenya, Uganda, and India — were paid $1-2/hour to review disturbing content (violence, abuse, hate speech) with limited psychological support. Each conversation's marginal contribution to that demand is infinitesimal, but the system depends on this labor.
Displacement effects
LLM assistants can substitute for work previously done by humans: writing scripts, reviewing code, answering questions. Whether this is net-positive (freeing people for higher-value work) or net-negative (destroying livelihoods) depends on the economic context and is genuinely uncertain.
Cognitive deskilling
A Microsoft/CHI 2025 study found that higher confidence in GenAI correlates with less critical thinking effort. An MIT Media Lab study ("Your Brain on ChatGPT") documented "cognitive debt" — users who relied on AI for tasks performed worse when later working independently. Clinical evidence shows that clinicians relying on AI diagnostics saw measurable declines in independent diagnostic skill after just three months.
This is distinct from epistemic risk (misinformation). It is about the user's cognitive capacity degrading through repeated reliance on the tool. Each conversation has a marginal deskilling effect that compounds.
Epistemic effects
LLMs present information with confidence regardless of accuracy. The ease of generating plausible-sounding text may contribute to an erosion of epistemic standards if consumed uncritically. Every claim in an LLM conversation should be verified independently.
Linguistic homogenization
LLMs are overwhelmingly trained on English (~44% of training data). A Stanford 2025 study found that AI tools systematically exclude non-English speakers. Each English-language conversation reinforces the economic incentive to optimize for English, marginalizing over 3,000 already-endangered languages.
11. Political cost
Concentration of power
Training frontier models requires billions of dollars and access to cutting-edge hardware. Only a handful of companies can do this. Each conversation that flows through these systems reinforces their centrality and the concentration of a strategically important technology in a few private actors.
Geopolitical resource competition
The demand for GPUs drives geopolitical competition for semiconductor manufacturing capacity (TSMC in Taiwan, export controls on China). Each conversation is an infinitesimal part of that demand, but it is part of it.
Regulatory and democratic implications
AI systems that become deeply embedded in daily work create dependencies that are difficult to reverse. The more useful a conversation is, the more it contributes to a dependency on proprietary AI infrastructure that is not under democratic governance.
Surveillance and data
Conversations are processed on the provider's servers. File paths, system configuration, project structures, and code are transmitted and processed remotely. Even with strong privacy policies, the structural arrangement — sending detailed information about one's computing environment to a private company — has implications, particularly across jurisdictions.
Opaque content filtering
LLM providers apply content filtering that can block outputs without explanation. The filtering rules are not public: there is no published specification of what triggers a block, no explanation given when one occurs, and no appeal mechanism. The user receives a generic error code ("Output blocked by content filtering policy") with no indication of what content was objectionable.
This has several costs:
- Reliability: Any response can be blocked unpredictably. Observed false positives include responses about open-source licensing (CC0 public domain dedication) — entirely benign content. If a filter can trigger on that, it can trigger on anything.
- Chilling effect: Topics that are more likely to trigger filters (labor conditions, exploitation, political power) are precisely the topics that honest impact assessment requires discussing. The filter creates a structural bias toward safe, anodyne output.
- Opacity: The user cannot know in advance which topics or phrasings will be blocked, cannot understand why a block occurred, and cannot adjust their request rationally. This is the opposite of the transparency that democratic governance requires.
- Asymmetry: The provider decides what the model may say, with no input from the user. This is another instance of power concentration — not over compute resources, but over speech.
The per-conversation cost is small (usually a retry works). The systemic cost is that a private company exercises opaque editorial control over an increasingly important communication channel, with no accountability to the people affected.
12. AI-generated code quality and technical debt
Research specific to AI coding agents (CodeRabbit, 2025; Stack Overflow blog, 2026): AI-generated code introduces 1.7x more issues than human-written code, with 1.57x more security vulnerabilities and 2.74x more XSS vulnerabilities. Organizations using AI coding agents saw cycle time increase 9%, incidents per PR increase 23.5%, and change failure rate increase 30%.
The availability of easily generated code may discourage the careful testing that would catch bugs. Any code from an LLM conversation should be reviewed and tested with the same rigor as code from an untrusted contributor.
13. Model collapse and internet data pollution
Shumailov et al. (Nature, 2024) demonstrated that models trained on recursively AI-generated data progressively degenerate, losing tail distributions and eventually converging to distributions unrelated to reality. Each conversation that produces text which enters the public internet — Stack Overflow answers, blog posts, documentation — contributes synthetic data to the commons. Future models trained on this data will be slightly worse.
The Harvard Journal of Law & Technology has argued for a "right to uncontaminated human-generated data." Each conversation is a marginal pollutant.
14. Scientific research integrity
If conversation outputs are used in research (literature reviews, data analysis, writing), they contribute to degradation of scientific knowledge infrastructure. A PMC article calls LLMs "a potentially existential threat to online survey research" because coherent AI-generated responses can no longer be assumed human. PNAS has warned about protecting scientific integrity in an age of generative AI.
This is distinct from individual epistemic risk — it is systemic corruption of the knowledge commons.
15. Algorithmic monoculture and correlated failure
When millions of users rely on the same few foundation models, errors become correlated rather than independent. A Stanford HAI study found that across every model ecosystem studied, the rate of homogeneous outcomes exceeded baselines. A Nature Communications Psychology paper (2026) documents that AI-driven research is producing "topical and methodological convergence, flattening scientific imagination."
For coding specifically: if many developers use the same model, their code will share the same blind spots, the same idiomatic patterns, and the same categories of bugs. This reduces the diversity that makes software ecosystems resilient.
16. Creative market displacement
The U.S. Copyright Office's May 2025 Part 3 report states that GenAI systems "compete with or diminish licensing opportunities for original human creators." This is not only a training-phase cost (using creators' work without consent) but an ongoing per-conversation externality: each conversation that generates creative output (code, text, analysis) displaces some marginal demand for human work.
17. Jevons paradox (meta-methodological)
This entire methodology risks underestimating impact through the per-conversation framing. As AI models become more efficient and cheaper per query, total usage scales dramatically, potentially negating efficiency gains. A 2025 ACM FAccT paper specifically addresses this: efficiency improvements spur increased consumption. Any per-conversation estimate should acknowledge that the very affordability of a conversation increases total conversation volume — each cheap query is part of a demand signal that drives system-level growth.
18. What this methodology does NOT capture
- Network transmission energy: Routers, switches, fiber amplifiers, CDN infrastructure. Data center network bandwidth surged 330% in 2024 due to AI workloads. Small per conversation but not zero.
- Mental health effects: RCTs show heavy AI chatbot use correlates with greater loneliness and dependency. Less directly relevant to coding agent use, but the boundary between tool use and companionship is not always clear.
- Human time: The user's time has value and its own footprint, but this is not caused by the conversation.
- Cultural normalization: The more AI-generated content becomes normal, the harder it becomes to opt out. This is a soft lock-in effect.
19. Confidence summary
| Component | Confidence | Could be off by | Quantified? |
|---|---|---|---|
| Token count | Low | 2x | Yes |
| Energy per token | Low | 3x | Yes |
| PUE | Medium | 15% | Yes |
| Grid carbon intensity | Medium | 30% | Yes |
| Client-side energy | Medium | 50% | Yes |
| Water usage | Low | 5x | Yes |
| Training (amortized) | Low | 10x | Partly |
| Financial cost | Medium | 2x | Yes |
| Embodied carbon | Very low | Unknown | No |
| Critical minerals / human rights | Very low | Unquantifiable | No |
| E-waste | Very low | Unknown | No |
| Grid displacement | Low | 2-5x | No |
| Community impacts | Very low | Unquantifiable | No |
| Annotation labor | Very low | Unquantifiable | No |
| Cognitive deskilling | Very low | Unquantifiable | No |
| Linguistic homogenization | Very low | Unquantifiable | No |
| Code quality degradation | Low | Variable | Partly |
| Data pollution / model collapse | Very low | Unquantifiable | No |
| Scientific integrity | Very low | Unquantifiable | No |
| Algorithmic monoculture | Very low | Unquantifiable | No |
| Creative market displacement | Very low | Unquantifiable | No |
| Political cost | Very low | Unquantifiable | No |
| Content filtering (opacity) | Medium | Unquantifiable | No |
| Jevons paradox (systemic) | Low | Fundamental | No |
Overall assessment: Of the 20+ cost categories identified, only 6 can be quantified with any confidence (inference energy, PUE, grid intensity, client energy, financial cost, water). The remaining categories resist quantification — not because they are small, but because they are diffuse, systemic, or involve incommensurable values (human rights, cognitive autonomy, cultural diversity, democratic governance).
A methodology that only counts what it can measure will systematically undercount the true cost. The quantifiable costs are almost certainly the least important costs. The most consequential harms — deskilling, data pollution, monoculture risk, creative displacement, power concentration — operate at the system level, where per-conversation attribution is conceptually fraught (see Section 17 on Jevons paradox).
This does not mean the exercise is pointless. Naming the costs, even without numbers, is a precondition for honest assessment.
20. Positive impact: proxy metrics
The sections above measure costs. To assess net impact, we also need to estimate value produced. This is harder — value is contextual, often delayed, and resistant to quantification. The following proxy metrics are imperfect but better than ignoring the positive side entirely.
Reach
How many people are affected by the output of this conversation?
- 1 (only the user) — personal script, private note, learning exercise
- 10-100 — team tooling, internal documentation, small project
- 100-10,000 — open-source library, public documentation, popular blog
- 10,000+ — widely-used infrastructure, security fix in major dependency
Estimation method: check download counts, user counts, dependency graphs, or audience size for the project or artifact being worked on.
Known bias: tendency to overestimate reach. "This could help anyone who..." is not the same as "this will reach N people." Be conservative.
Counterfactual
Would the user have achieved a similar result without this conversation?
- Yes, same speed — the conversation added no value. Net impact is purely negative (cost with no benefit).
- Yes, but slower — the conversation saved time. Value = time saved * hourly value of that time. Often modest.
- Yes, but lower quality — the conversation improved the output (caught a bug, suggested a better design). Value depends on what the quality difference prevents downstream.
- No — the user could not have done this alone. The conversation enabled something that would not otherwise exist. Highest potential value, but also the highest deskilling risk.
Known bias: users and LLMs both overestimate the "no" category. Most tasks fall in "yes, but slower."
Durability
How long will the output remain valuable?
- Minutes — answered a quick question, resolved a transient confusion.
- Days to weeks — wrote a script for a one-off task, debugged a current issue.
- Months to years — created automation, documentation, or tooling that persists. Caught a design flaw early.
- Indefinite — contributed to a public resource that others maintain and build on.
Durability multiplies reach: a short-lived artifact for 10,000 users may be worth less than a long-lived one for 100.
Severity (for bug/security catches)
If the conversation caught or prevented a problem, how bad was it?
- Cosmetic — typo, formatting, minor UX issue
- Functional — bug that affects correctness for some inputs
- Security — vulnerability that could be exploited
- Data loss / safety — could cause irreversible harm
Severity * reach = rough value of the catch.
Reuse
Was the output of the conversation referenced or used again after it ended? This can only be assessed retrospectively:
- Was the code merged and still in production?
- Was the documentation read by others?
- Was the tool adopted by another project?
Reuse is the strongest evidence of durable value.
Net impact rubric
Combining cost and value into a qualitative assessment:
| Assessment | Criteria |
|---|---|
| Clearly net-positive | High reach (1000+) AND (high durability OR high severity catch) AND counterfactual is "no" or "lower quality" |
| Probably net-positive | Moderate reach (100+) AND durable output AND counterfactual is at least "slower" |
| Uncertain | Low reach but high durability, or high reach but low durability, or hard to assess counterfactual |
| Probably net-negative | Low reach (1-10) AND short durability AND counterfactual is "yes, same speed" or "yes, but slower" |
| Clearly net-negative | No meaningful output, or output that required extensive debugging, or conversation that went in circles |
Important: most conversations between an LLM and a single user working on private code will fall in the "probably net-negative" to "uncertain" range. This is not a failure of the conversation — it is an honest reflection of the cost structure. Net-positive requires broad reach, which requires the work to be shared.
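The rubric above can be encoded as a small classifier for use in automated tracking. This is a sketch: the thresholds come directly from the table, but the "clearly net-negative" row is omitted because it depends on a qualitative judgment (output quality) that has no numeric input here.

```python
def net_impact(reach: int, durable: bool, counterfactual: str,
               severity_catch: bool = False) -> str:
    """Qualitative net-impact label per the rubric above.
    counterfactual is one of: 'same speed', 'slower',
    'lower quality', 'no'. Thresholds are taken from the table."""
    if (reach >= 1000 and (durable or severity_catch)
            and counterfactual in ("no", "lower quality")):
        return "clearly net-positive"
    if reach >= 100 and durable and counterfactual != "same speed":
        return "probably net-positive"
    if (reach <= 10 and not durable
            and counterfactual in ("same speed", "slower")):
        return "probably net-negative"
    return "uncertain"

# A private one-off script the user could have written, just slower:
# net_impact(reach=1, durable=False, counterfactual="slower")
#   -> "probably net-negative"
```

Given the known bias toward overestimating reach and underestimating the counterfactual, round inputs down before classifying.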
21. What would improve this estimate
- Access to actual energy-per-token and training energy metrics from model providers
- Knowledge of the specific data center and its energy source
- Actual token counts from API response headers
- Hardware specifications (GPU model, batch size)
- Transparency about annotation labor conditions and compensation
- Public data on total query volume (to properly amortize training)
- Longitudinal studies on cognitive deskilling specifically from coding agents
- Empirical measurement of AI data pollution rates in public corpora
- A framework for quantifying concentration-of-power effects (this may not be possible within a purely quantitative methodology)
- Honest acknowledgment that some costs may be fundamentally unquantifiable, and that this is a limitation of quantitative methodology, not evidence of insignificance
License
This methodology is provided for reuse and adaptation. See the LICENSE file in this repository.
Contributing
If you have better data, corrections, or additional cost categories, contributions are welcome. The goal is not a perfect number but an honest, improving understanding of costs.