# Plan: Quantify social, epistemic, and political costs

**Target sub-goals**: 2 (measure impact), 6 (improve methodology)

## Problem

Our differentiator is the taxonomy of non-environmental costs (social,
epistemic, political). But the toolkit only tracks environmental and
financial metrics. Until we can produce numbers, even rough proxies,
for the other dimensions, the taxonomy remains a document, not a tool.

The confidence summary (Section 19) marks 13 of 22 categories as
"Unquantifiable." Some genuinely resist quantification. But for others,
we can define measurable proxies that capture *something* meaningful per
conversation, even if imperfect.

## Design principle

Not everything needs a number. The goal is to move categories from
"unquantifiable" to "rough proxy available" where honest proxies exist,
and to explicitly mark categories where quantification would be
dishonest. A bad number is worse than no number.

## Category-by-category analysis

### Feasible: per-conversation proxies exist

#### 1. Cognitive deskilling (Section 10)

**Proxy: Automation ratio**
- Measure: model output tokens as a fraction of all conversation tokens.
  A conversation where the AI writes 95% of the code carries higher
  deskilling risk than one where the user writes code and asks for review.
- Formula: `deskilling_risk = output_tokens / (output_tokens + user_tokens)`
- Range: 0 (pure teaching) to 1 (pure delegation)
- Can be computed from the transcript by the existing hook.
- Calibration: weight by task type if detectable (e.g., "explain" vs
  "write" vs "fix").
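As a sketch, the ratio could be computed like this. The transcript format is assumed (hypothetically) to be JSONL with `role` and `tokens` fields per line; the toolkit's actual schema may differ.

```python
import json

def automation_ratio(transcript_path):
    """Compute deskilling_risk = output_tokens / (output_tokens + user_tokens).

    Assumes each transcript line is a JSON object with hypothetical
    "role" ("user" or "assistant") and "tokens" fields.
    """
    output_tokens = user_tokens = 0
    with open(transcript_path) as f:
        for line in f:
            entry = json.loads(line)
            if entry["role"] == "assistant":
                output_tokens += entry["tokens"]
            elif entry["role"] == "user":
                user_tokens += entry["tokens"]
    total = output_tokens + user_tokens
    # An empty transcript yields 0.0 rather than a division error.
    return output_tokens / total if total else 0.0
```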

**Proxy: Review signal**
- Did the user modify the AI's output before committing? If the hook can
  detect git diffs between AI-generated code and committed code, the
  delta indicates human review effort. High delta = more engagement =
  less deskilling risk.
- Requires: post-commit hook comparing AI output to committed diff.

**Confidence: Low but measurable.** The ratio is crude; a user who
delegates wisely is not deskilling. But it's directionally useful.

#### 2. Code quality degradation (Section 12)

**Proxy: Defect signal**
- Track whether tests pass after AI-generated changes. The hook could
  record: (a) did the conversation include test runs? (b) did tests fail
  after AI changes? (c) how many retry cycles occurred?
- Formula: `quality_risk = failed_test_runs / total_test_runs`
- Can be extracted from tool-call results in the transcript.

**Proxy: Churn rate**
- How many times was the same file edited in the conversation? High
  churn = the AI got it wrong repeatedly.
- Formula: `churn = total_file_edits / unique_files_edited`
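A minimal sketch of the churn formula, assuming tool calls are available as dicts with hypothetical `tool` and `file_path` keys:

```python
from collections import Counter

def churn_rate(tool_calls):
    """Compute churn = total_file_edits / unique_files_edited.

    `tool_calls` is a list of dicts with hypothetical "tool" and
    "file_path" keys; only Edit/Write calls count as file edits.
    """
    edits = Counter(
        call["file_path"]
        for call in tool_calls
        if call["tool"] in ("Edit", "Write")
    )
    if not edits:
        return 0.0
    return sum(edits.values()) / len(edits)
```

A churn of 1.0 means every touched file was edited exactly once; higher values indicate repeated rework.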

**Confidence: Medium.** Test failures are a real signal. Churn is
noisier (some tasks legitimately require iterative editing).

#### 3. Data pollution risk (Section 13)

**Proxy: Publication exposure**
- Is the output likely to enter public corpora? Detect if the
  conversation involves: git push to a public repo, writing
  documentation, creating blog posts, Stack Overflow answers.
- Formula: binary flag `public_output = true/false`, or estimate
  `pollution_tokens = output_tokens_in_public_artifacts`
- Can be detected from tool calls (git push, file writes to known
  public paths).
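The binary flag could be sketched as a scan over tool calls. The `tool`/`command` field names are assumptions, not the toolkit's actual schema:

```python
import re

# Hypothetical shape: each tool call is a dict with "tool" and "command" keys.
PUSH_RE = re.compile(r"\bgit\s+push\b")

def public_output_flag(tool_calls):
    """True if any Bash tool call ran `git push` during the session.

    Only catches pushes made inside the conversation; pushes made later,
    outside the session, are invisible to the hook.
    """
    return any(
        call.get("tool") == "Bash" and PUSH_RE.search(call.get("command", ""))
        for call in tool_calls
    )
```

Whether the target repo is actually public is not knowable from the command alone; this flags potential exposure, not confirmed publication.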

**Confidence: Low.** Many paths to publication are undetectable. But
flagging known public pushes is better than nothing.

#### 4. Monoculture risk (Section 15)

**Proxy: Provider concentration**
- Log which model and provider was used. Over time, the impact log
  builds a picture of single-provider dependency.
- Formula: `monoculture_index = sessions_with_dominant_provider / total_sessions`
- Per-session: just log the model ID. The aggregate metric is computed
  across sessions.
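Per-session logging only needs a model-ID-to-provider mapping. The prefix table below is purely illustrative; the model ID itself comes from API metadata, as noted above:

```python
# Illustrative prefix -> provider table; extend as providers are encountered.
PROVIDER_PREFIXES = {
    "claude": "Anthropic",
    "gpt": "OpenAI",
    "gemini": "Google",
}

def provider_from_model_id(model_id):
    """Map a model ID string to a provider name for the session log."""
    for prefix, provider in PROVIDER_PREFIXES.items():
        if model_id.lower().startswith(prefix):
            return provider
    return "unknown"
```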

**Confidence: Medium.** Simple to measure, meaningful at portfolio level.

#### 5. Annotation labor (Section 10)

**Proxy: Token-proportional RLHF demand**
- Each conversation generates training signal (thumbs up/down, edits,
  preference data). More tokens = more potential training data = more
  annotation demand.
- Formula: `rlhf_demand_proxy = output_tokens * annotation_rate`,
  where `annotation_rate` is estimated from published RLHF dataset
  sizes vs. total conversation volume.
- Very rough, but it makes the connection between "my conversation" and
  "someone rates this output" concrete.

**Confidence: Very low.** The `annotation_rate` is unknown. But even an
order-of-magnitude estimate names the cost.

#### 6. Creative displacement (Section 16)

**Proxy: Substitution type**
- Classify the conversation by what human role it substitutes: code
  writing, code review, documentation, research, design.
- Formula: categorical label, not a number. But the label enables
  aggregation: "60% of my AI usage substitutes for junior developer
  work."
- Can be inferred from tool calls (Write/Edit = code writing, Grep/Read
  = research, etc.).
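A naive classifier along those lines could look like the sketch below. The tool-to-type mapping is illustrative; real classification would need more signals than tool names alone:

```python
from collections import Counter

# Illustrative tool -> substitution-type mapping, per the bullets above.
TOOL_TO_TYPE = {
    "Write": "code writing",
    "Edit": "code writing",
    "Grep": "research",
    "Read": "research",
}

def substitution_type(tool_calls):
    """Label the session by the most frequent substituted role."""
    counts = Counter(
        TOOL_TO_TYPE[c["tool"]]
        for c in tool_calls
        if c["tool"] in TOOL_TO_TYPE
    )
    if not counts:
        return "unclassified"
    return counts.most_common(1)[0][0]
```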

**Confidence: Low.** Classification is fuzzy. But naming what was
displaced is better than ignoring it.

#### 7. Power concentration (Section 11)

**Proxy: Spend concentration**
- Financial cost is already tracked. Aggregating by provider shows how
  much money flows to each company.
- Formula: `provider_share = spend_with_provider / total_ai_spend`
- Trivial to compute from existing data. The interpretation is what
  matters: "I sent $X to Anthropic this month."

**Confidence: High for the number, low for what it means.**

#### 8. Content filtering opacity (Section 11)

**Proxy: Block count**
- Count how many responses were blocked by content filtering during
  the conversation.
- Formula: `filter_blocks = count(blocked_responses)`
- Can be detected from error messages in the transcript.

**Confidence: High.** Easy to measure. Interpretation is subjective.

### Infeasible: honest quantification not possible per-conversation

#### 9. Linguistic homogenization (Section 10)
- Could log conversation language, but the per-conversation contribution
  to language endangerment is genuinely unattributable. A counter
  ("this conversation was in English") is factual but not a meaningful
  cost metric. **Keep qualitative.**

#### 10. Geopolitical resource competition (Section 11)
- No per-conversation proxy exists. The connection between one API call
  and semiconductor export controls is real but too diffuse to measure.
  **Keep qualitative.**

#### 11. Mental health effects (Section 18)
- Would require user self-report. No passive measurement is honest.
  **Keep qualitative unless user opts into self-assessment.**

#### 12. Scientific integrity contamination (Section 14)
- Overlaps with data pollution (proxy #3 above). The additional risk
  (AI in research methodology) is context-dependent and cannot be
  detected from the conversation alone. **Keep qualitative.**

## Implementation plan

### Phase 1: Low-hanging fruit (extend existing hook)

Modify `pre-compact-snapshot.sh` to extract from the transcript:

1. **Automation ratio**: output_tokens / (output_tokens + user_input_tokens)
2. **Model ID**: already available from API metadata
3. **Test pass/fail counts**: parse tool call results for test outcomes
4. **File churn**: count Edit/Write tool calls per unique file
5. **Public push flag**: detect `git push` in tool calls

Add these fields to the JSONL log alongside existing metrics.
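Sketched as a Python dict, an extended log record might look like the following. The field names for the new metrics are placeholders, not a fixed schema, and the existing environmental/financial fields are elided:

```python
import json

# Illustrative extended session record; field names are placeholders.
record = {
    "model_id": "claude-3-5-sonnet",  # metric 2: model ID
    "automation_ratio": 0.82,         # metric 1
    "tests_passed": 7,                # metric 3
    "tests_failed": 2,                # metric 3
    "file_churn": 1.5,                # metric 4
    "public_push": False,             # metric 5
}

# Append one JSON line per session to the existing JSONL log.
line = json.dumps(record)
```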

Estimated effort: extend the existing Python/bash parsing, ~100 lines.

### Phase 2: Post-conversation signals

Add an optional post-commit hook:

6. **Review delta**: compare AI-generated code (from transcript) with
   actual committed code. Measures human review effort.

Estimated effort: new hook, ~50 lines. Requires git integration.

### Phase 3: Aggregate metrics

Build a dashboard script (extend `show-impact.sh`) that computes
portfolio-level metrics across sessions:

7. **Monoculture index**: provider concentration over time
8. **Spend concentration**: cumulative $ per provider
9. **Displacement profile**: % of sessions by substitution type
10. **RLHF demand estimate**: cumulative annotation labor proxy
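The aggregation could be sketched as follows, assuming each logged session record carries hypothetical `provider` and `cost_usd` fields:

```python
from collections import Counter

def portfolio_metrics(sessions):
    """Aggregate per-session records into portfolio-level metrics.

    `sessions` is a list of dicts with hypothetical "provider" and
    "cost_usd" fields, one dict per logged session.
    """
    providers = Counter(s["provider"] for s in sessions)
    spend = Counter()
    for s in sessions:
        spend[s["provider"]] += s["cost_usd"]
    dominant, dominant_count = providers.most_common(1)[0]
    total_spend = sum(spend.values())
    return {
        "monoculture_index": dominant_count / len(sessions),
        "dominant_provider": dominant,
        "spend_share": {p: v / total_spend for p, v in spend.items()},
    }
```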

### Phase 4: Methodology update

Update `impact-methodology.md` Section 19 confidence summary:
- Move categories with proxies from "Unquantifiable" to "Proxy available"
- Document each proxy's limitations honestly

Update `impact-toolkit/README.md` to accurately describe what the
toolkit measures.

## What this does NOT do

- It does not make the unquantifiable quantifiable. Some costs remain
  qualitative by design.
- It does not produce a single "social cost score." Collapsing
  incommensurable harms into one number would be dishonest.
- It does not claim precision. Every proxy is explicitly labeled with
  its confidence and failure modes.

## Success criteria

- The toolkit reports at least 5 non-environmental metrics per session.
- Each metric has documented limitations in the methodology.
- The confidence summary has fewer "Unquantifiable" entries.
- No metric is misleading: a proxy that doesn't work is removed, not
  kept for show.

## Risks

- **Goodhart's law**: Once measured, users may optimize for the metric
  rather than the underlying cost (e.g., adding fake user tokens to
  lower the automation ratio). Mitigate by documenting that proxies are
  indicators, not targets.
- **False precision**: Numbers create an illusion of understanding.
  Mitigate by always showing confidence levels alongside values.
- **Scope creep**: Trying to measure everything dilutes the toolkit's
  usability. Start with Phase 1 only, and evaluate before proceeding.