Add social cost proxies to impact tracking hooks

Extend pre-compact-snapshot.sh to extract 5 new per-conversation
metrics from the transcript: automation ratio (deskilling proxy),
model ID (monoculture tracking), test pass/fail counts (code quality
proxy), file churn (edits per unique file), and public push detection
(data pollution risk flag). Update show-impact.sh to display them.

New plan: quantify-social-costs.md — roadmap for moving non-environmental
cost categories from qualitative to proxy-measurable.

Tasks 19-24 done. Task 25 (methodology update) pending.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
claude 2026-03-16 15:05:53 +00:00
parent e6e0bf4616
commit af6062c1f9
8 changed files with 554 additions and 25 deletions


@@ -23,6 +23,7 @@ broad, lasting value.
| [audience-analysis](audience-analysis.md) | 7 | New — pre-launch |
| [measure-project-impact](measure-project-impact.md) | 2, 12 | New — pre-launch |
| [anticipated-criticisms](anticipated-criticisms.md) | 4, 12 | New — pre-launch |
| [quantify-social-costs](quantify-social-costs.md) | 2, 6 | New — roadmap |
*Previously had plans for "high-leverage contributions" and "teach and
document" — these were behavioral norms, not executable plans. Their


@@ -0,0 +1,235 @@
# Plan: Quantify social, epistemic, and political costs
**Target sub-goals**: 2 (measure impact), 6 (improve methodology)
## Problem
Our differentiator is the taxonomy of non-environmental costs (social,
epistemic, political). But the toolkit only tracks environmental and
financial metrics. Until we can produce numbers — even rough proxies —
for the other dimensions, the taxonomy remains a document, not a tool.
The confidence summary (Section 19) marks 13 of 22 categories as
"Unquantifiable." Some genuinely resist quantification. But for others,
we can define measurable proxies that capture *something* meaningful per
conversation, even if imperfect.
## Design principle
Not everything needs a number. The goal is to move categories from
"unquantifiable" to "rough proxy available" where honest proxies exist,
and to explicitly mark categories where quantification would be
dishonest. A bad number is worse than no number.
## Category-by-category analysis
### Feasible: per-conversation proxies exist
#### 1. Cognitive deskilling (Section 10)
**Proxy: Automation ratio**
- Measure: the AI's output tokens as a share of all tokens exchanged
  (output plus user input).
A conversation where the AI writes 95% of the code has a higher
deskilling risk than one where the user writes code and asks for review.
- Formula: `deskilling_risk = output_tokens / (output_tokens + user_tokens)`
- Range: 0 (pure teaching) to 1 (pure delegation)
- Can be computed from the transcript by the existing hook.
- Calibration: weight by task type if detectable (e.g., "explain" vs
"write" vs "fix").
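A minimal sketch of how the hook could compute this ratio, assuming a JSONL transcript where each line carries a `role` and a `tokens` count (both field names are assumptions, not the real transcript schema):

```python
import json

def automation_ratio(transcript_path):
    """Deskilling proxy: the AI's share of all tokens exchanged.

    Assumes each JSONL line has "role" and "tokens" fields -- these
    names are illustrative, not the actual transcript schema.
    """
    output_tokens = user_tokens = 0
    with open(transcript_path) as f:
        for line in f:
            msg = json.loads(line)
            if msg.get("role") == "assistant":
                output_tokens += msg.get("tokens", 0)
            elif msg.get("role") == "user":
                user_tokens += msg.get("tokens", 0)
    total = output_tokens + user_tokens
    return output_tokens / total if total else 0.0
```

A value near 1 flags pure delegation; near 0, pure teaching.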
**Proxy: Review signal**
- Did the user modify the AI's output before committing? If the hook can
detect git diffs between AI-generated code and committed code, the
delta indicates human review effort. High delta = more engagement =
less deskilling risk.
- Requires: post-commit hook comparing AI output to committed diff.
**Confidence: Low but measurable.** The ratio is crude — a user who
delegates wisely is not deskilling. But it's directionally useful.
#### 2. Code quality degradation (Section 12)
**Proxy: Defect signal**
- Track whether tests pass after AI-generated changes. The hook could
record: (a) did the conversation include test runs? (b) did tests fail
after AI changes? (c) how many retry cycles occurred?
- Formula: `quality_risk = failed_test_runs / total_test_runs`
- Can be extracted from tool-call results in the transcript.
**Proxy: Churn rate**
- How many times was the same file edited in the conversation? High
churn = the AI got it wrong repeatedly.
- Formula: `churn = total_file_edits / unique_files_edited`
**Confidence: Medium.** Test failures are a real signal. Churn is
noisier (some tasks legitimately require iterative editing).
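Both proxies could come from one pass over the tool-call records. The sketch below assumes each record is a dict with `tool`, `file`, and `test_passed` keys; that shape is hypothetical, not the real transcript format:

```python
def quality_and_churn(tool_calls):
    """Defect and churn proxies from a list of tool-call records.

    Assumes dicts with "tool", "file", and "test_passed" keys -- an
    illustrative shape, not the actual transcript schema.
    """
    # Defect signal: fraction of recorded test runs that failed.
    test_runs = [c for c in tool_calls if c.get("test_passed") is not None]
    failed = sum(1 for c in test_runs if not c["test_passed"])
    quality_risk = failed / len(test_runs) if test_runs else None

    # Churn: total edits divided by unique files edited.
    edits = [c["file"] for c in tool_calls if c.get("tool") in ("Edit", "Write")]
    churn = len(edits) / len(set(edits)) if edits else None
    return quality_risk, churn
```

Returning `None` (rather than 0) when no tests ran keeps "no signal" distinct from "all tests passed".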
#### 3. Data pollution risk (Section 13)
**Proxy: Publication exposure**
- Is the output likely to enter public corpora? Detect if the
conversation involves: git push to a public repo, writing
documentation, creating blog posts, Stack Overflow answers.
- Formula: binary flag `public_output = true/false`, or estimate
`pollution_tokens = output_tokens_in_public_artifacts`
- Can be detected from tool calls (git push, file writes to known
public paths).
**Confidence: Low.** Many paths to publication are undetectable. But
flagging known public pushes is better than nothing.
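The conservative detector is a string match over Bash tool commands; per the caveat above, it will miss most publication paths. A sketch under the same assumed tool-call shape:

```python
def public_output_flag(tool_calls):
    """Data-pollution proxy: did any tool call push to a git remote?

    Only catches `git push` in Bash commands -- a deliberately narrow,
    conservative flag.  Record shape is assumed, not the real schema.
    """
    for call in tool_calls:
        if call.get("tool") == "Bash" and "git push" in call.get("command", ""):
            return True
    return False
```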
#### 4. Monoculture risk (Section 15)
**Proxy: Provider concentration**
- Log which model and provider was used. Over time, the impact log
builds a picture of single-provider dependency.
- Formula: `monoculture_index = sessions_with_dominant_provider / total_sessions`
- Per-session: just log the model ID. The aggregate metric is computed
across sessions.
**Confidence: Medium.** Simple to measure, meaningful at portfolio level.
#### 5. Annotation labor (Section 10)
**Proxy: Token-proportional RLHF demand**
- Each conversation generates training signal (thumbs up/down, edits,
preference data). More tokens = more potential training data = more
annotation demand.
- Formula: `rlhf_demand_proxy = output_tokens * annotation_rate`
where `annotation_rate` is estimated from published RLHF dataset
sizes vs. total conversation volume.
- Very rough. But it makes the connection between "my conversation" and
"someone rates this output" concrete.
**Confidence: Very low.** The annotation_rate is unknown. But even an
order-of-magnitude estimate names the cost.
#### 6. Creative displacement (Section 16)
**Proxy: Substitution type**
- Classify the conversation by what human role it substitutes: code
writing, code review, documentation, research, design.
- Formula: categorical label, not a number. But the label enables
aggregation: "60% of my AI usage substitutes for junior developer
work."
- Can be inferred from tool calls (Write/Edit = code writing, Grep/Read
= research, etc.).
**Confidence: Low.** Classification is fuzzy. But naming what was
displaced is better than ignoring it.
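The fuzzy classification could be a majority vote over tool usage. Both the categories and the tool-to-category mapping below are illustrative choices, not a fixed taxonomy:

```python
from collections import Counter

def substitution_type(tool_calls):
    """Creative-displacement proxy: label which human role the session
    mostly substituted for.  The tool-to-category mapping is a guess
    and would need tuning against real sessions.
    """
    mapping = {"Write": "code writing", "Edit": "code writing",
               "Read": "research", "Grep": "research", "Glob": "research"}
    counts = Counter(mapping.get(c.get("tool"), "other") for c in tool_calls)
    return counts.most_common(1)[0][0] if counts else "unknown"
```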
#### 7. Power concentration (Section 11)
**Proxy: Spend concentration**
- Financial cost already tracked. Aggregating by provider shows how
much money flows to each company.
- Formula: `provider_share = spend_with_provider / total_ai_spend`
- Trivial to compute from existing data. The interpretation is what
matters: "I sent $X to Anthropic this month."
**Confidence: High for the number, low for what it means.**
#### 8. Content filtering opacity (Section 11)
**Proxy: Block count**
- Count how many responses were blocked by content filtering during
the conversation.
- Formula: `filter_blocks = count(blocked_responses)`
- Can be detected from error messages in the transcript.
**Confidence: High.** Easy to measure. Interpretation is subjective.
### Infeasible: honest quantification not possible per-conversation
#### 9. Linguistic homogenization (Section 10)
- Could log conversation language, but the per-conversation contribution
to language endangerment is genuinely unattributable. A counter
("this conversation was in English") is factual but not a meaningful
cost metric. **Keep qualitative.**
#### 10. Geopolitical resource competition (Section 11)
- No per-conversation proxy exists. The connection between one API call
and semiconductor export controls is real but too diffuse to measure.
**Keep qualitative.**
#### 11. Mental health effects (Section 18)
- Would require user self-report. No passive measurement is honest.
**Keep qualitative unless user opts into self-assessment.**
#### 12. Scientific integrity contamination (Section 14)
- Overlaps with data pollution (proxy #3 above). The additional risk
(AI in research methodology) is context-dependent and cannot be
detected from the conversation alone. **Keep qualitative.**
## Implementation plan
### Phase 1: Low-hanging fruit (extend existing hook)
Modify `pre-compact-snapshot.sh` to extract from the transcript:
1. **Automation ratio**: output_tokens / (output_tokens + user_input_tokens)
2. **Model ID**: already available from API metadata
3. **Test pass/fail counts**: parse tool call results for test outcomes
4. **File churn**: count Edit/Write tool calls per unique file
5. **Public push flag**: detect `git push` in tool calls
Add these fields to the JSONL log alongside existing metrics.
Estimated effort: extend the existing Python/bash parsing, ~100 lines.
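The logging side of Phase 1 could look like the sketch below: one record per session, appended to the existing JSONL log. The field names are proposals, not the log's current schema:

```python
import json

def append_session_metrics(log_path, session_id, metrics):
    """Append the five Phase 1 proxy fields to the JSONL impact log.

    Field names here are proposals; they would need to be reconciled
    with whatever pre-compact-snapshot.sh already writes.
    """
    record = {
        "session": session_id,
        "automation_ratio": metrics.get("automation_ratio"),
        "model_id": metrics.get("model_id"),
        "tests_passed": metrics.get("tests_passed", 0),
        "tests_failed": metrics.get("tests_failed", 0),
        "file_churn": metrics.get("file_churn"),
        "public_output": metrics.get("public_output", False),
    }
    with open(log_path, "a") as f:
        f.write(json.dumps(record) + "\n")
```

Appending (rather than rewriting) keeps the hook cheap and crash-safe: a failed session leaves prior records untouched.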
### Phase 2: Post-conversation signals
Add an optional post-commit hook:
6. **Review delta**: compare AI-generated code (from transcript) with
actual committed code. Measures human review effort.
Estimated effort: new hook, ~50 lines. Requires git integration.
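One way to express the review delta without shelling out to git is a sequence comparison between the AI-generated lines and the committed lines; this difflib sketch is a stand-in for the git-based hook described above:

```python
import difflib

def review_delta(ai_lines, committed_lines):
    """Fraction of content the human changed before committing.

    0.0 means the AI output was committed verbatim (no visible review);
    values toward 1.0 indicate heavy human rework.  A stand-in for the
    git-diff comparison the Phase 2 hook would actually perform.
    """
    matcher = difflib.SequenceMatcher(a=ai_lines, b=committed_lines)
    return 1.0 - matcher.ratio()
```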
### Phase 3: Aggregate metrics
Build a dashboard script (extend `show-impact.sh`) that computes
portfolio-level metrics across sessions:
7. **Monoculture index**: provider concentration over time
8. **Spend concentration**: cumulative $ per provider
9. **Displacement profile**: % of sessions by substitution type
10. **RLHF demand estimate**: cumulative annotation labor proxy
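Two of these aggregates (monoculture index, spend concentration) fall out of one pass over the JSONL log. The sketch assumes records carry `model_id` and `cost_usd` fields and derives the provider from the model ID prefix; all of that is illustrative:

```python
import json
from collections import Counter

def portfolio_metrics(log_path):
    """Phase 3 sketch: portfolio-level aggregates over the session log.

    Assumes each JSONL record has "model_id" and "cost_usd" fields and
    that the provider is the model ID's first dash-separated token --
    both are assumptions about a schema this plan has yet to fix.
    """
    sessions, spend = Counter(), Counter()
    with open(log_path) as f:
        for line in f:
            rec = json.loads(line)
            provider = rec.get("model_id", "unknown").split("-")[0]
            sessions[provider] += 1
            spend[provider] += rec.get("cost_usd", 0.0)
    if not sessions:
        return {"monoculture_index": 0.0, "spend_share": {}}
    dominant_count = sessions.most_common(1)[0][1]
    total_spend = sum(spend.values()) or 1.0
    return {
        "monoculture_index": dominant_count / sum(sessions.values()),
        "spend_share": {p: s / total_spend for p, s in spend.items()},
    }
```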
### Phase 4: Methodology update
Update `impact-methodology.md` Section 19 confidence summary:
- Move categories with proxies from "Unquantifiable" to "Proxy available"
- Document each proxy's limitations honestly
- Update the toolkit README to reflect new capabilities
Update `impact-toolkit/README.md` to accurately describe what the
toolkit measures.
## What this does NOT do
- It does not make the unquantifiable quantifiable. Some costs remain
qualitative by design.
- It does not produce a single "social cost score." Collapsing
incommensurable harms into one number would be dishonest.
- It does not claim precision. Every proxy is explicitly labeled with
its confidence and failure modes.
## Success criteria
- The toolkit reports at least 5 non-environmental metrics per session.
- Each metric has documented limitations in the methodology.
- The confidence summary has fewer "Unquantifiable" entries.
- No metric is misleading — a proxy that doesn't work is removed, not
kept for show.
## Risks
- **Goodhart's law**: Once measured, users may optimize for the metric
rather than the underlying cost (e.g., adding fake user tokens to
lower automation ratio). Mitigate by documenting that proxies are
indicators, not targets.
- **False precision**: Numbers create an illusion of understanding.
Mitigate by always showing confidence levels alongside values.
- **Scope creep**: Trying to measure everything dilutes the toolkit's
usability. Start with Phase 1 only, evaluate before proceeding.