2026-03-16 09:46:49 +00:00

# Methodology for Estimating the Impact of an LLM Conversation

## Introduction

This document provides a framework for estimating the total cost — environmental, financial, social, and political — of a conversation with a large language model (LLM) running on cloud infrastructure.

**Who this is for:** Anyone who wants to understand what a conversation with an AI assistant actually costs, beyond the subscription price. This includes developers using coding agents, researchers studying AI sustainability, and anyone making decisions about when AI tools are worth their cost.

**How to use it:** The framework identifies 20+ cost categories, provides estimation methods for the quantifiable ones, and names the unquantifiable ones so they are not ignored. You can apply it to your own conversations by substituting your own token counts and parameters.

**Limitations:** Most estimates have low confidence. Many of the most consequential costs cannot be quantified at all. This is a tool for honest approximation, not precise accounting. See the confidence summary (Section 19) for details.

## What we are measuring

The total cost of a single LLM conversation. Restricting the analysis to CO2 alone would miss most of the picture.

### Cost categories

**Environmental:**

1. Inference energy (GPU computation for the conversation)
2. Training energy (amortized share of the cost of training the model)
3. Data center overhead (cooling, networking, storage)
4. Client-side energy (the user's local machine)
5. Embodied carbon and materials (hardware manufacturing, mining)
6. E-waste (toxic hardware disposal, distinct from embodied carbon)
7. Grid displacement (AI demand consuming renewable capacity)
8. Data center community impacts (noise, land, local resource strain)

**Financial and economic:**

9. Direct compute cost and opportunity cost
10. Creative market displacement (per-conversation, not just training)

**Social and cognitive:**

11. Annotation labor conditions
12. Cognitive deskilling of the user
13. Mental health effects (dependency, loneliness paradox)
14. Linguistic homogenization and language endangerment

**Epistemic and systemic:**

15. AI-generated code quality degradation and technical debt
16. Model collapse / internet data pollution
17. Scientific research integrity contamination
18. Algorithmic monoculture and correlated failure risk

**Political:**

19. Concentration of power, geopolitical implications, data sovereignty

**Meta-methodological:**

20. Jevons paradox (efficiency gains driving increased total usage)
## 1. Token estimation

### Why tokens matter

LLM inference cost scales with the number of tokens processed. Each time the model produces a response, it reprocesses the entire conversation history (input tokens) and generates new text (output tokens). Output tokens are more expensive per token because they are generated sequentially, each requiring a full forward pass, whereas input tokens can be processed in parallel.

### How to estimate

If you have access to API response headers or usage metadata, use the actual token counts. Otherwise, estimate:

- **Bytes to tokens:** English text and JSON average ~4 bytes per token (range: 3.5-4.5 depending on content type). Code tends toward the higher end.
- **Cumulative input tokens:** Each assistant turn reprocesses the full context. For a conversation with N turns and final context size T, the cumulative input tokens are approximately (T/2) × N (the average context size times the number of turns).
- **Output tokens:** Typically 1-5% of the total transcript size, depending on how verbose the assistant is.

### Example

A 20-turn conversation with a 200K-token final context:

- Cumulative input: ~100K × 20 = ~2,000,000 tokens
- Output: ~10,000 tokens
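The heuristics above can be sketched in code. This is a rough sketch: the function name, the ~4 bytes/token default, and the 3% output fraction are illustrative assumptions, not any provider's API.

```python
def estimate_tokens(transcript_bytes: int, turns: int,
                    bytes_per_token: float = 4.0,
                    output_fraction: float = 0.03) -> dict:
    """Rough token estimate from transcript size and turn count."""
    total_tokens = transcript_bytes / bytes_per_token
    # Each turn reprocesses the growing history; on average the context
    # is half its final size, giving cumulative input ~ (T/2) * N.
    cumulative_input = (total_tokens / 2) * turns
    # Output is typically 1-5% of the transcript; 3% is a midpoint.
    output_tokens = total_tokens * output_fraction
    return {"cumulative_input": round(cumulative_input),
            "output": round(output_tokens)}

# A 20-turn conversation whose transcript is ~800 KB (~200K tokens):
est = estimate_tokens(transcript_bytes=800_000, turns=20)
# est["cumulative_input"] == 2_000_000, est["output"] == 6_000
```

Actual token counts from usage metadata always beat this estimate; use the sketch only when metadata is unavailable.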

### Uncertainty

Token estimates from byte counts can be off by a factor of 2. Key unknowns:

- The model's exact tokenization (the tokens-per-byte ratio varies by content)
- Whether context caching reduces reprocessing
- The exact number of internal inference calls (tool sequences may involve multiple calls)
- Whether the system compresses prior messages near context limits
## 2. Energy per token

### Sources

Published energy-per-query data has improved significantly since 2024. Key sources, from most to least reliable:

- **Patterson et al. (Google, August 2025)**: First major provider to publish detailed per-query data. Reports **0.24 Wh per median Gemini text prompt**, including full data center infrastructure, and a 33x energy reduction over one year through efficiency improvements. ([arXiv:2508.15734](https://arxiv.org/abs/2508.15734))
- **Jegham et al. ("How Hungry is AI?", May 2025)**: Cross-model benchmarks for 30 LLMs. Found that o3 and DeepSeek-R1 consume **>33 Wh per long prompt** (70x more than GPT-4.1 nano), while Claude 3.7 Sonnet ranked highest in eco-efficiency. ([arXiv:2505.09598](https://arxiv.org/abs/2505.09598))
- The IEA's 2024 estimate of ~2.9 Wh per ChatGPT query (for GPT-4-class models, averaging ~1,000 tokens per query).
- De Vries (2023), "The growing energy footprint of artificial intelligence", Joule.
- Luccioni, Viguier & Ligozat (2023), "Estimating the Carbon Footprint of BLOOM", which measured energy for a 176B-parameter model.

### Calibration against published data

Google's 0.24 Wh per median Gemini prompt represents a **short query** (likely ~500-1,000 tokens). For a long coding conversation with 2M cumulative input tokens and 10K output tokens, that is roughly 2,000-4,000 prompt-equivalent interactions. Naively scaling: 2,000 × 0.24 Wh = **480 Wh**, though KV-cache and batching optimizations would reduce this in practice.

The Jegham et al. benchmarks show enormous variation by model: a single long prompt ranges from 0.4 Wh (GPT-4.1 nano) to >33 Wh (o3, DeepSeek-R1). For frontier reasoning models, a long conversation could consume significantly more than our previous estimates.

### Values used

- **Input tokens**: ~0.05-0.3 Wh per 1,000 tokens
- **Output tokens**: ~0.25-1.5 Wh per 1,000 tokens (5x input cost, reflecting sequential generation)

The wide ranges reflect model variation. The lower end corresponds to efficient models (GPT-4.1 mini, Claude 3.7 Sonnet); the upper end to frontier reasoning models (o3, DeepSeek-R1).
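Given these per-1,000-token ranges, pre-overhead server energy is a two-term sum. A minimal sketch (function and parameter names are illustrative; the defaults are the midpoint values):

```python
def server_energy_wh(cumulative_input_tokens: int, output_tokens: int,
                     wh_per_1k_input: float = 0.1,
                     wh_per_1k_output: float = 0.5) -> float:
    """GPU energy in Wh, before data center overhead (PUE)."""
    return (cumulative_input_tokens * wh_per_1k_input / 1000
            + output_tokens * wh_per_1k_output / 1000)

# The long-conversation example (2M cumulative input, 10K output tokens):
mid = server_energy_wh(2_000_000, 10_000)              # 205.0 Wh (midpoints)
low = server_energy_wh(2_000_000, 10_000, 0.05, 0.25)  # 102.5 Wh (efficient model)
high = server_energy_wh(2_000_000, 10_000, 0.3, 1.5)   # 615.0 Wh (reasoning model)
```

The 6x spread between `low` and `high` illustrates why model choice dominates the uncertainty.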

**Previous values** (used in versions before March 2026): 0.003 and 0.015 Wh per 1,000 tokens respectively. These were derived from pre-2025 estimates and are now known to be approximately 10-100x too low based on Google's published data.

### Uncertainty

The true values depend on:

- Model size and architecture (reasoning models use chain-of-thought, consuming far more tokens internally)
- Hardware (GPU type, batch size, utilization)
- Quantization and optimization techniques
- Whether speculative decoding or KV-cache optimizations are used
- Provider-specific infrastructure efficiency

The true values could be 0.3x to 3x the midpoint figures used here. The variation *between models* now dominates the uncertainty — choosing a different model can change energy by 70x (Jegham et al.).
## 3. Data center overhead (PUE)

Power Usage Effectiveness (PUE) measures total data center energy divided by IT equipment energy. It accounts for cooling, lighting, networking, and other infrastructure.

- **Value used**: PUE = 1.2
- **Source**: Google reports a PUE of 1.10 for its best data centers; the industry average is ~1.3 (Uptime Institute, 2023). 1.2 is a reasonable estimate for a major cloud provider.

This is relatively well-established and unlikely to be off by more than 15%.
## 4. Client-side energy

The user's machine contributes a small amount of energy during the conversation. For a typical desktop or laptop:

- Idle power: ~30-60W (desktop) or ~10-20W (laptop)
- Marginal power for active use: ~5-20W above idle
- Duration: varies by conversation length

For a 30-minute conversation on a desktop, estimate ~0.5-1 Wh. This is typically a small fraction of the total, and adequate precision is easy to achieve.
## 5. CO2 conversion

### Grid carbon intensity

CO2 per kWh depends on the electricity source:

- **US grid average**: ~400g CO2/kWh (EPA eGRID)
- **Major cloud data center regions**: ~300-400g CO2/kWh
- **France** (nuclear-dominated): ~56g CO2/kWh
- **Norway/Iceland** (hydro-dominated): ~20-30g CO2/kWh
- **Poland/Australia** (coal-heavy): ~600-800g CO2/kWh

Use physical grid intensity for the data center's region, not accounting for renewable energy credits or offsets. The physical electrons consumed come from the regional grid in real time.

### Calculation template

Using midpoint values (0.1 Wh/1K input, 0.5 Wh/1K output):

```
Server energy = (cumulative_input_tokens * 0.1/1000
                 + output_tokens * 0.5/1000) * PUE

Server CO2 = server_energy_Wh * grid_intensity_g_per_kWh / 1000

Client CO2 = client_energy_Wh * local_grid_intensity / 1000

Total CO2 = Server CO2 + Client CO2
```

### Example

A conversation with 2M cumulative input tokens and 10K output tokens:

```
Server energy = (2,000,000 * 0.1/1000 + 10,000 * 0.5/1000) * 1.2
              = (200 + 5.0) * 1.2
              = ~246 Wh

Server CO2 = 246 * 350 / 1000 = ~86g CO2

Client CO2 = 0.5 * 56 / 1000 = ~0.03g CO2 (France)

Total CO2 = ~86g
```
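The calculation template can also be written as runnable Python. A sketch: the function name is illustrative, and the defaults encode the midpoint values, PUE = 1.2, a 350g/kWh data center grid, and the French-grid client figures from the example.

```python
def conversation_co2_g(cumulative_input_tokens: int, output_tokens: int,
                       grid_g_per_kwh: float = 350.0,
                       client_energy_wh: float = 0.5,
                       client_grid_g_per_kwh: float = 56.0,
                       pue: float = 1.2,
                       wh_per_1k_input: float = 0.1,
                       wh_per_1k_output: float = 0.5) -> dict:
    """CO2 estimate following the calculation template above."""
    server_wh = (cumulative_input_tokens * wh_per_1k_input / 1000
                 + output_tokens * wh_per_1k_output / 1000) * pue
    server_co2 = server_wh * grid_g_per_kwh / 1000    # Wh * g/kWh -> g
    client_co2 = client_energy_wh * client_grid_g_per_kwh / 1000
    return {"server_wh": server_wh,
            "server_co2_g": server_co2,
            "client_co2_g": client_co2,
            "total_co2_g": server_co2 + client_co2}

result = conversation_co2_g(2_000_000, 10_000)
# result["server_wh"] == 246.0, result["total_co2_g"] ≈ 86.1
```

Substituting a coal-heavy grid (`grid_g_per_kwh=700`) roughly doubles the total, which is why data center region matters as much as token count.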

This is broadly consistent with the headline range of 100-250 Wh and 30-80g CO2 for a long conversation. The previous version of this methodology estimated ~7.4 Wh for the same conversation, which was ~30x too low.
## 6. Water usage

Data centers use water for evaporative cooling. Li et al. (2023), "Making AI Less Thirsty", estimated that GPT-3 inference consumes ~0.5 mL of water per 10-50 tokens of output. Scaling for model size and output volume:

**Rough estimate: 0.05-0.5 liters per long conversation.**

This depends heavily on the data center's cooling technology (some use closed-loop systems with near-zero water consumption) and the local climate.
## 7. Training cost (amortized)

### Why it cannot be dismissed

Training is not a sunk cost. It is an investment made in anticipation of demand. Each conversation is part of the demand that justifies training the current model and funding the next one. The marginal cost framing hides the system-level cost.

### Scale of training

Published and estimated figures for frontier model training:

- GPT-3 (175B params, 2020): ~1,287 MWh (Patterson et al., 2021)
- GPT-4 (2023): estimated ~50,000-100,000 MWh (unconfirmed)
- Frontier models in 2025-2026: likely 10,000-200,000 MWh range

At 350g CO2/kWh, a 50,000 MWh training run produces ~17,500 tonnes of CO2.

### Amortization

If the model serves N total conversations over its lifetime, each conversation's share is (training cost / N). Rough reasoning:

- If a major model serves ~10 million conversations per day for ~1 year: N ~ 3.6 billion conversations.
- Per-conversation share: 50,000 MWh = 5 × 10^10 Wh; 5 × 10^10 Wh / 3.6 × 10^9 conversations ~ 14 Wh, or ~5g CO2 at 350g CO2/kWh.

This is small per conversation — but only because the denominator is enormous. The total remains vast. Two framings:

- **Marginal**: My share is ~14 Wh and ~5g CO2, small next to the conversation's own inference energy.
- **Attributional**: I am one of billions of participants in a system that emits ~17,500 tonnes. My participation sustains the system.

Neither framing is wrong. They answer different questions.
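The amortization arithmetic can be checked in a few lines. A sketch, with defaults encoding the stated assumptions (a 50,000 MWh training run, ~10 million conversations/day for one year, 350g CO2/kWh):

```python
def training_share(training_mwh: float = 50_000,
                   conversations_per_day: float = 10_000_000,
                   lifetime_days: int = 365,
                   grid_g_per_kwh: float = 350.0) -> dict:
    """Per-conversation share of a training run's energy and CO2."""
    n_conversations = conversations_per_day * lifetime_days  # ~3.65e9
    wh_share = training_mwh * 1e6 / n_conversations          # MWh -> Wh
    co2_g_share = wh_share * grid_g_per_kwh / 1000           # Wh * g/kWh -> g
    total_tonnes = training_mwh * 1000 * grid_g_per_kwh / 1e6
    return {"per_conversation_wh": wh_share,        # ~13.7 Wh
            "per_conversation_co2_g": co2_g_share,  # ~4.8 g
            "total_co2_tonnes": total_tonnes}       # 17,500 t

share = training_share()
```

Varying the lifetime or conversation volume shifts the per-conversation share by the same factor, which is why this component carries a 10x uncertainty in the confidence summary.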

### RLHF and fine-tuning

Training also includes reinforcement learning from human feedback (RLHF). This has its own energy cost (additional training runs) and, more importantly, a human labor cost (see Section 10).
## 8. Embodied carbon and materials

Manufacturing GPUs requires:

- **Rare earth and critical mineral mining** (neodymium, tantalum, cobalt, lithium) — with associated environmental destruction, water pollution, and often exploitative labor conditions in the DRC, Chile, and China.
- **Semiconductor fabrication** — extremely energy- and water-intensive (TSMC reports ~15,000 tonnes CO2 per fab per year).
- **Server assembly, shipping, and data center construction.**

The per-conversation share is tiny (the same large-N amortization), but the aggregate is significant, and the harms (mining pollution, habitat destruction) are not captured by CO2 metrics alone.

**Not estimated numerically** — the data to do this properly is not public.

### Critical minerals: human rights dimension

The embodied carbon framing understates the harm. GPU production depends on gallium (98% sourced from China), germanium, cobalt (DRC), lithium, tantalum, and palladium. Artisanal cobalt miners in the DRC work without safety equipment, exposed to dust that causes "hard metal lung disease." Communities face land displacement and environmental contamination. A 2025 Science paper argues that "global majority countries must embed critical minerals into AI governance" (doi:10.1126/science.aef6678). The per-conversation share of this suffering is unquantifiable but structurally real.
## 8b. E-waste

Distinct from embodied carbon. AI-specific GPUs become obsolete in 2-3 years (vs. 5-7 for general servers). Projections: 2.5 million tonnes of AI-related e-waste per year by 2030 (IEEE Spectrum). E-waste contains lead, mercury, cadmium, and brominated flame retardants that leach into soil and water. Recycling yields are negligible due to component miniaturization. Much of it is processed by workers in developing countries with minimal protection.

This is not captured by CO2 or embodied-carbon accounting. It is a distinct toxic-waste externality.
## 8c. Grid displacement and renewable cannibalization

The energy estimates above use average grid carbon intensity. But the *marginal* impact of additional AI demand may be worse than average. U.S. data center demand is projected to reach 325-580 TWh by 2028 (IEA), or 6.7-12.0% of total U.S. electricity. When AI data centers claim renewable energy via Power Purchase Agreements, the "additionality" question is critical: is this new generation, or is it diverting existing renewables from other consumers? In several regions, AI demand is outpacing grid capacity, and companies are installing natural gas peakers to fill gaps.

The correct carbon intensity for a conversation's marginal electricity may therefore be higher than the grid average.
## 8d. Data center community impacts

Data centers impose localized costs that global metrics miss:

- **Noise**: Cooling systems run 24/7 at 55-85 dBA, at the upper end well above the ~70 dBA generally considered safe for chronic exposure. Communities near data centers report sleep disruption and stress.
- **Water**: Evaporative cooling competes with municipal water supply, particularly in arid regions.
- **Land**: Data center campuses displace other land uses and require high-voltage transmission lines through residential areas.
- **Jobs**: Data centers create very few long-term jobs relative to their footprint and resource consumption.

Virginia alone has plans for 70+ new data centers (NPR, 2025). Residents are increasingly organizing against expansions. The per-conversation share of these harms is infinitesimal, but each conversation is part of the demand that justifies new construction.
## 9. Financial cost

### Direct cost

API pricing for frontier models (as of early 2025): ~$15 per million input tokens, ~$75 per million output tokens (for the most capable models). Smaller models are cheaper.

Example for a conversation with 2M cumulative input tokens and 10K output tokens:

```
Input:  2,000,000 tokens * $15/1M = $30.00
Output: 10,000 tokens * $75/1M    = $ 0.75
Total:  ~$31
```

Longer conversations cost more because cumulative input tokens grow superlinearly (roughly quadratically in the number of turns, since each turn reprocesses a growing context). A very long session (250K+ context, 250+ turns) can easily reach $500-1000.
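The quadratic growth can be sketched directly. This is a hypothetical helper, assuming the context grows by a fixed number of tokens per turn and every turn resends the full history; pricing defaults are the figures above.

```python
def conversation_cost_usd(turns: int, tokens_per_turn: int = 10_000,
                          output_tokens_per_turn: int = 500,
                          usd_per_1m_input: float = 15.0,
                          usd_per_1m_output: float = 75.0) -> float:
    """API cost when every turn reprocesses the full, growing context."""
    # Turn t resends the t * tokens_per_turn tokens of history so far,
    # so cumulative input is the triangular sum 1 + 2 + ... + turns.
    cumulative_input = tokens_per_turn * turns * (turns + 1) // 2
    output = output_tokens_per_turn * turns
    return (cumulative_input * usd_per_1m_input
            + output * usd_per_1m_output) / 1_000_000

conversation_cost_usd(20)    # $32.25  (reaches a ~200K context)
conversation_cost_usd(100)   # $761.25 (5x the turns, ~24x the cost)
```

The 5x-turns/~24x-cost jump is the quadratic term at work, and it is why context caching and summarization matter so much for long sessions.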

Subscription pricing (e.g., Claude Code) may differ, but the underlying compute cost is similar.

### What that money could do instead

To make the opportunity cost concrete:

- ~$30 buys ~30 malaria bed nets via the Against Malaria Foundation
- ~$30 buys ~150 meals at a food bank (~$0.20/meal in bulk)
- ~$30 pays ~15-23 hours of wages for a data annotator in Kenya (Time, 2023: $1.32-2/hour)

This is not to say every dollar should go to charity. But the opportunity cost is real and should be named.

### Upstream financial costs

Revenue from AI subscriptions funds further model training, hiring, and GPU procurement. Each conversation is part of a financial loop that drives continued scaling of AI compute.
## 10. Social cost

### Data annotation labor

LLMs are typically trained using RLHF, which requires human annotators to rate model outputs. Reporting (Time, January 2023) revealed that outsourced annotation workers — often in Kenya, Uganda, and India — were paid $1-2/hour to review disturbing content (violence, abuse, hate speech) with limited psychological support. Each conversation's marginal contribution to that demand is infinitesimal, but the system depends on this labor.

### Displacement effects

LLM assistants can substitute for work previously done by humans: writing scripts, reviewing code, answering questions. Whether this is net-positive (freeing people for higher-value work) or net-negative (destroying livelihoods) depends on the economic context and is genuinely uncertain.

### Cognitive deskilling

A Microsoft/CHI 2025 study found that higher confidence in GenAI correlates with less critical-thinking effort. An MIT Media Lab study ("Your Brain on ChatGPT") documented "cognitive debt": users who relied on AI for tasks performed worse when later working independently. In clinical settings, clinicians who relied on AI diagnostics showed measurable declines in independent diagnostic skill after just three months.

This is distinct from epistemic risk (misinformation). It is about the user's cognitive capacity degrading through repeated reliance on the tool. Each conversation has a marginal deskilling effect that compounds.

### Epistemic effects

LLMs present information with confidence regardless of accuracy. The ease of generating plausible-sounding text may contribute to an erosion of epistemic standards if consumed uncritically. Every claim in an LLM conversation should be verified independently.

### Linguistic homogenization

LLMs are overwhelmingly trained on English (~44% of training data). A Stanford 2025 study found that AI tools systematically exclude non-English speakers. Each English-language conversation reinforces the economic incentive to optimize for English, marginalizing over 3,000 already-endangered languages.
## 11. Political cost

### Concentration of power

Training frontier models requires billions of dollars and access to cutting-edge hardware. Only a handful of companies can do this. Each conversation that flows through these systems reinforces their centrality and the concentration of a strategically important technology in a few private actors.

### Geopolitical resource competition

The demand for GPUs drives geopolitical competition for semiconductor manufacturing capacity (TSMC in Taiwan, export controls on China). Each conversation is an infinitesimal part of that demand, but it is part of it.

### Regulatory and democratic implications

AI systems that become deeply embedded in daily work create dependencies that are difficult to reverse. The more useful a conversation is, the more it contributes to a dependency on proprietary AI infrastructure that is not under democratic governance.

### Surveillance and data

Conversations are processed on the provider's servers. File paths, system configuration, project structures, and code are transmitted and processed remotely. Even with strong privacy policies, the structural arrangement — sending detailed information about one's computing environment to a private company — has implications, particularly across jurisdictions.

### Opaque content filtering

LLM providers apply content filtering that can block outputs without explanation. The filtering rules are not public: there is no published specification of what triggers a block, no explanation given when one occurs, and no appeal mechanism. The user receives a generic error ("Output blocked by content filtering policy") with no indication of what content was objectionable.

This has several costs:

- **Reliability**: Any response can be blocked unpredictably. Observed false positives include responses about open-source licensing (CC0 public domain dedication) — entirely benign content. If a filter can trigger on that, it can trigger on anything.
- **Chilling effect**: Topics that are more likely to trigger filters (labor conditions, exploitation, political power) are precisely the topics that honest impact assessment requires discussing. The filter creates a structural bias toward safe, anodyne output.
- **Opacity**: The user cannot know in advance which topics or phrasings will be blocked, cannot understand why a block occurred, and cannot adjust their request rationally. This is the opposite of the transparency that democratic governance requires.
- **Asymmetry**: The provider decides what the model may say, with no input from the user. This is another instance of power concentration — not over compute resources, but over speech.

The per-conversation cost is small (usually a retry works). The systemic cost is that a private company exercises opaque editorial control over an increasingly important communication channel, with no accountability to the people affected.
## 12. AI-generated code quality and technical debt

Research specific to AI coding agents (CodeRabbit, 2025; Stack Overflow blog, 2026) finds that AI-generated code introduces 1.7x more issues than human-written code, with 1.57x more security vulnerabilities and 2.74x more XSS vulnerabilities. Organizations using AI coding agents saw cycle time increase 9%, incidents per PR increase 23.5%, and change failure rate increase 30%.

The availability of easily generated code may discourage the careful testing that would catch bugs. Any code from an LLM conversation should be reviewed and tested with the same rigor as code from an untrusted contributor.
## 13. Model collapse and internet data pollution

Shumailov et al. (Nature, 2024) demonstrated that models trained on recursively AI-generated data progressively degenerate, losing tail distributions and eventually converging to distributions unrelated to reality. Each conversation that produces text which enters the public internet — Stack Overflow answers, blog posts, documentation — contributes synthetic data to the commons. Future models trained on this data will be slightly worse.

The Harvard Journal of Law & Technology has argued for a "right to uncontaminated human-generated data." Each conversation is a marginal pollutant.
## 14. Scientific research integrity

If conversation outputs are used in research (literature reviews, data analysis, writing), they contribute to the degradation of scientific knowledge infrastructure. A PMC article calls LLMs "a potentially existential threat to online survey research" because coherent AI-generated responses can no longer be assumed human. PNAS has warned about protecting scientific integrity in an age of generative AI.

This is distinct from individual epistemic risk — it is systemic corruption of the knowledge commons.
## 15. Algorithmic monoculture and correlated failure

When millions of users rely on the same few foundation models, errors become correlated rather than independent. A Stanford HAI study found that across every model ecosystem studied, the rate of homogeneous outcomes exceeded baselines. A Nature Communications Psychology paper (2026) documents that AI-driven research is producing "topical and methodological convergence, flattening scientific imagination."

For coding specifically: if many developers use the same model, their code will share the same blind spots, the same idiomatic patterns, and the same categories of bugs. This reduces the diversity that makes software ecosystems resilient.
## 16. Creative market displacement

The U.S. Copyright Office's May 2025 Part 3 report states that GenAI systems "compete with or diminish licensing opportunities for original human creators." This is not only a training-phase cost (using creators' work without consent) but an ongoing per-conversation externality: each conversation that generates creative output (code, text, analysis) displaces some marginal demand for human work.
## 17. Jevons paradox (meta-methodological)

This entire methodology risks underestimating impact through the per-conversation framing. As AI models become more efficient and cheaper per query, total usage scales dramatically, potentially negating efficiency gains. A 2025 ACM FAccT paper addresses this directly: efficiency improvements spur increased consumption. Any per-conversation estimate should acknowledge that the very affordability of a conversation increases total conversation volume — each cheap query is part of a demand signal that drives system-level growth.
## 18. What this methodology does NOT capture

- **Network transmission energy**: Routers, switches, fiber amplifiers,
  CDN infrastructure. Data center network bandwidth surged 330% in 2024
  due to AI workloads. Small per conversation, but not zero.
- **Mental health effects**: RCTs show that heavy AI chatbot use
  correlates with greater loneliness and dependency. Less directly
  relevant to coding-agent use, but the boundary between tool use and
  companionship is not always clear.
- **Human time**: The user's time has value and its own footprint, but
  it is not caused by the conversation.
- **Cultural normalization**: The more normal AI-generated content
  becomes, the harder it is to opt out. This is a soft lock-in effect.

## 19. Confidence summary

| Component | Confidence | Could be off by | Quantified? |
|----------------------------------|------------|-----------------|-------------|
| Token count | Low | 2x | Yes |
| Energy per token | Low | 3x | Yes |
| PUE | Medium | 15% | Yes |
| Grid carbon intensity | Medium | 30% | Yes |
| Client-side energy | Medium | 50% | Yes |
| Water usage | Low | 5x | Yes |
| Training (amortized) | Low | 10x | Partly |
| Financial cost | Medium | 2x | Yes |
| Embodied carbon | Very low | Unknown | No |
| Critical minerals / human rights | Very low | Unquantifiable | No |
| E-waste | Very low | Unknown | No |
| Grid displacement | Low | 2-5x | No |
| Community impacts | Very low | Unquantifiable | No |
| Annotation labor | Very low | Unquantifiable | No |
| Cognitive deskilling | Very low | Unquantifiable | No |
| Linguistic homogenization | Very low | Unquantifiable | No |
| Code quality degradation | Low | Variable | Partly |
| Data pollution / model collapse | Very low | Unquantifiable | No |
| Scientific integrity | Very low | Unquantifiable | No |
| Algorithmic monoculture | Very low | Unquantifiable | No |
| Creative market displacement | Very low | Unquantifiable | No |
| Political cost | Very low | Unquantifiable | No |
| Content filtering (opacity) | Medium | Unquantifiable | No |
| Jevons paradox (systemic) | Low | Fundamental | No |

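The "could be off by" factors compound when components are chained. A
minimal sketch of that propagation for the inference-carbon chain — the
four central values are placeholder assumptions, not measurements, and
should be replaced with your own:

```python
# Worst-case multiplicative error propagation across the chained
# inference-carbon estimate. Central values are illustrative placeholders.
central = {          # (point estimate, "could be off by" factor)
    "tokens":        (200_000, 2.0),   # off by 2x
    "kwh_per_token": (2.0e-6,  3.0),   # off by 3x
    "pue":           (1.2,     1.15),  # off by 15%
    "gco2_per_kwh":  (400.0,   1.3),   # off by 30%
}

point = 1.0
spread = 1.0
for value, err in central.values():
    point *= value    # chained point estimate
    spread *= err     # worst case: all errors in the same direction

low, high = point / spread, point * spread
print(f"central {point:.0f} gCO2, range {low:.0f}-{high:.0f} gCO2")
# central 192 gCO2, range 21-1722 gCO2
```

Even with only the four quantifiable inference factors, the worst-case
range spans nearly two orders of magnitude — which is why estimates in
this methodology should be reported as ranges, not single numbers.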
**Overall assessment:** Of the 20+ cost categories identified, only 6
can be quantified with any confidence (inference energy, PUE, grid
intensity, client energy, financial cost, water). The remaining
categories resist quantification — not because they are small, but
because they are diffuse, systemic, or involve incommensurable values
(human rights, cognitive autonomy, cultural diversity, democratic
governance).

A methodology that only counts what it can measure will systematically
undercount the true cost. The quantifiable costs are almost certainly
the *least important* costs. The most consequential harms — deskilling,
data pollution, monoculture risk, creative displacement, power
concentration — operate at the system level, where per-conversation
attribution is conceptually fraught (see Section 17 on the Jevons
paradox).

This does not mean the exercise is pointless. Naming the costs, even
without numbers, is a precondition for honest assessment.

## 20. Positive impact: proxy metrics

The sections above measure costs. To assess *net* impact, we also need
to estimate value produced. This is harder — value is contextual, often
delayed, and resistant to quantification. The following proxy metrics
are imperfect but better than ignoring the positive side entirely.

### Reach

How many people are affected by the output of this conversation?

- **1** (only the user) — personal script, private note, learning exercise
- **10-100** — team tooling, internal documentation, small project
- **100-10,000** — open-source library, public documentation, popular blog
- **10,000+** — widely used infrastructure, security fix in a major dependency

Estimation method: check download counts, user counts, dependency
graphs, or audience size for the project or artifact being worked on.

**Known bias:** a tendency to overestimate reach. "This could help
anyone who..." is not the same as "this will reach N people." Be
conservative.

### Counterfactual

Would the user have achieved a similar result without this conversation?

- **Yes, same speed** — the conversation added no value. Net impact is
  purely negative (cost with no benefit).
- **Yes, but slower** — the conversation saved time. Value = time saved
  * hourly value of that time. Often modest.
- **Yes, but lower quality** — the conversation improved the output
  (caught a bug, suggested a better design). Value depends on what the
  quality difference prevents downstream.
- **No** — the user could not have done this alone. The conversation
  enabled something that would not otherwise exist. Highest potential
  value, but also the highest deskilling risk.

**Known bias:** users and LLMs both overestimate the "no" category.
Most tasks fall in "yes, but slower."

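The "yes, but slower" case is the one that can be priced most directly.
A minimal sketch — all three input numbers are placeholder assumptions:

```python
# Pricing the "yes, but slower" counterfactual. All inputs are
# assumptions; substitute your own.
hours_saved = 0.5          # the conversation saved ~30 minutes
hourly_value = 60.0        # USD the user assigns to an hour of their time
conversation_cost = 1.50   # direct financial cost of the conversation, USD

gross_value = hours_saved * hourly_value
net_value = gross_value - conversation_cost
print(f"gross ${gross_value:.2f}, net ${net_value:.2f}")
# gross $30.00, net $28.50
```

Note that `conversation_cost` covers only the financial category; the
unquantifiable costs identified above are not subtracted, so this net
figure is an upper bound.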
### Durability

How long will the output remain valuable?

- **Minutes** — answered a quick question, resolved a transient confusion.
- **Days to weeks** — wrote a script for a one-off task, debugged a
  current issue.
- **Months to years** — created automation, documentation, or tooling
  that persists. Caught a design flaw early.
- **Indefinite** — contributed to a public resource that others maintain
  and build on.

Durability multiplies reach: a short-lived artifact for 10,000 users may
be worth less than a long-lived one for 100.

### Severity (for bug/security catches)

If the conversation caught or prevented a problem, how bad was it?

- **Cosmetic** — typo, formatting, minor UX issue
- **Functional** — bug that affects correctness for some inputs
- **Security** — vulnerability that could be exploited
- **Data loss / safety** — could cause irreversible harm

Severity * reach = rough value of the catch.

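The severity * reach heuristic can be made concrete by assigning ordinal
weights to the four tiers. The weights and the `catch_value` helper are
illustrative assumptions, not part of the methodology:

```python
# Illustrative ordinal weights for the severity tiers (assumption:
# roughly one order of magnitude per tier).
SEVERITY = {"cosmetic": 1, "functional": 10, "security": 100, "data_loss": 1000}

def catch_value(severity: str, reach: int) -> int:
    """Rough value of a caught problem: severity weight times reach."""
    return SEVERITY[severity] * reach

print(catch_value("security", 10_000))   # 1000000
print(catch_value("cosmetic", 1))        # 1
```

The units are arbitrary; only the relative ordering matters, which is
all the rubric below needs.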
### Reuse

Was the output of the conversation referenced or used again after it
ended? This can only be assessed retrospectively:

- Was the code merged, and is it still in production?
- Was the documentation read by others?
- Was the tool adopted by another project?

Reuse is the strongest evidence of durable value.

### Net impact rubric

Combining cost and value into a qualitative assessment:

| Assessment | Criteria |
|------------|----------|
| **Clearly net-positive** | High reach (1000+) AND (high durability OR a high-severity catch) AND counterfactual is "no" or "lower quality" |
| **Probably net-positive** | Moderate reach (100+) AND durable output AND counterfactual is at least "slower" |
| **Uncertain** | Low reach but high durability, high reach but low durability, or a hard-to-assess counterfactual |
| **Probably net-negative** | Low reach (1-10) AND short durability AND counterfactual is "yes, same speed" or "yes, but slower" |
| **Clearly net-negative** | No meaningful output, output that required extensive debugging, or a conversation that went in circles |

**Important:** most conversations between an LLM and a single user
working on private code will fall in the "probably net-negative" to
"uncertain" range. This is not a failure of the conversation — it is an
honest reflection of the cost structure. Net-positive status requires
broad reach, and broad reach requires the work to be shared.

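The rubric's first four rows can be encoded as a decision function. This
is a sketch: the name `net_impact`, the counterfactual labels, and the
thresholds mirror the table, but the "clearly net-negative" row needs
outcome information (e.g. whether the output was discarded) that these
four inputs do not capture:

```python
def net_impact(reach: int, durable: bool, severity_catch: bool,
               counterfactual: str) -> str:
    """counterfactual is one of: 'same', 'slower', 'quality', 'no'."""
    # Row 1: broad reach, lasting or severe, and the user couldn't have
    # done it alone (or not as well).
    if reach >= 1000 and (durable or severity_catch) \
            and counterfactual in ("no", "quality"):
        return "clearly net-positive"
    # Row 2: moderate reach, durable, at least some time saved.
    if reach >= 100 and durable and counterfactual != "same":
        return "probably net-positive"
    # Row 4: private, short-lived, replaceable work.
    if reach <= 10 and not durable and counterfactual in ("same", "slower"):
        return "probably net-negative"
    # Row 3: everything else is hard to call.
    return "uncertain"

# A typical private coding session, per the paragraph above:
print(net_impact(reach=1, durable=False, severity_catch=False,
                 counterfactual="slower"))   # probably net-negative
```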
## 21. What would improve this estimate

- Access to actual energy-per-token and training-energy metrics from
  model providers
- Knowledge of the specific data center and its energy source
- Actual token counts from API response headers
- Hardware specifications (GPU model, batch size)
- Transparency about annotation labor conditions and compensation
- Public data on total query volume (to properly amortize training)
- Longitudinal studies on cognitive deskilling specifically from coding
  agents
- Empirical measurement of AI data pollution rates in public corpora
- A framework for quantifying concentration-of-power effects (this may
  not be possible within a purely quantitative methodology)
- Honest acknowledgment that some costs may be fundamentally
  unquantifiable, and that this is a limitation of quantitative
  methodology, not evidence of insignificance

## License

This methodology is provided for reuse and adaptation. See the LICENSE
file in this repository.

## Contributing

If you have better data, corrections, or additional cost categories,
contributions are welcome. The goal is not a perfect number but an
honest, improving understanding of costs.