Add social cost proxies to impact tracking hooks

Extend pre-compact-snapshot.sh to extract 5 new per-conversation
metrics from the transcript: automation ratio (deskilling proxy),
model ID (monoculture tracking), test pass/fail counts (code quality
proxy), file churn (edits per unique file), and public push detection
(data pollution risk flag). Update show-impact.sh to display them.

New plan: quantify-social-costs.md — roadmap for moving non-environmental
cost categories from qualitative to proxy-measurable.

Tasks 19-24 done. Task 25 (methodology update) pending.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
claude 2026-03-16 15:05:53 +00:00
parent e6e0bf4616
commit af6062c1f9
8 changed files with 554 additions and 25 deletions


@@ -23,6 +23,7 @@ broad, lasting value.
| [audience-analysis](audience-analysis.md) | 7 | New — pre-launch |
| [measure-project-impact](measure-project-impact.md) | 2, 12 | New — pre-launch |
| [anticipated-criticisms](anticipated-criticisms.md) | 4, 12 | New — pre-launch |
| [quantify-social-costs](quantify-social-costs.md) | 2, 6 | New — roadmap |
*Previously had plans for "high-leverage contributions" and "teach and
document" — these were behavioral norms, not executable plans. Their


@@ -0,0 +1,235 @@
# Plan: Quantify social, epistemic, and political costs
**Target sub-goals**: 2 (measure impact), 6 (improve methodology)
## Problem
Our differentiator is the taxonomy of non-environmental costs (social,
epistemic, political). But the toolkit only tracks environmental and
financial metrics. Until we can produce numbers — even rough proxies —
for the other dimensions, the taxonomy remains a document, not a tool.
The confidence summary (Section 19) marks 13 of 22 categories as
"Unquantifiable." Some genuinely resist quantification. But for others,
we can define measurable proxies that capture *something* meaningful per
conversation, even if imperfect.
## Design principle
Not everything needs a number. The goal is to move categories from
"unquantifiable" to "rough proxy available" where honest proxies exist,
and to explicitly mark categories where quantification would be
dishonest. A bad number is worse than no number.
## Category-by-category analysis
### Feasible: per-conversation proxies exist
#### 1. Cognitive deskilling (Section 10)
**Proxy: Automation ratio**
- Measure: the AI's output tokens as a share of all tokens exchanged
  (output plus user input).
A conversation where the AI writes 95% of the code has a higher
deskilling risk than one where the user writes code and asks for review.
- Formula: `deskilling_risk = output_tokens / (output_tokens + user_tokens)`
- Range: 0 (pure teaching) to 1 (pure delegation)
- Can be computed from the transcript by the existing hook.
- Calibration: weight by task type if detectable (e.g., "explain" vs
"write" vs "fix").
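A minimal sketch of how the hook could compute this ratio, assuming a JSONL transcript where each line carries a `role` and a `tokens` count (both field names are assumptions, not the real transcript schema):

```python
import json

def automation_ratio(transcript_path):
    """Deskilling proxy: the AI's share of all tokens exchanged.

    Assumes each JSONL line has "role" and "tokens" fields -- these
    names are illustrative, not the actual transcript schema.
    """
    output_tokens = user_tokens = 0
    with open(transcript_path) as f:
        for line in f:
            msg = json.loads(line)
            if msg.get("role") == "assistant":
                output_tokens += msg.get("tokens", 0)
            elif msg.get("role") == "user":
                user_tokens += msg.get("tokens", 0)
    total = output_tokens + user_tokens
    return output_tokens / total if total else 0.0
```

A value near 1 flags pure delegation; near 0, pure teaching.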
**Proxy: Review signal**
- Did the user modify the AI's output before committing? If the hook can
detect git diffs between AI-generated code and committed code, the
delta indicates human review effort. High delta = more engagement =
less deskilling risk.
- Requires: post-commit hook comparing AI output to committed diff.
**Confidence: Low but measurable.** The ratio is crude — a user who
delegates wisely is not deskilling. But it's directionally useful.
#### 2. Code quality degradation (Section 12)
**Proxy: Defect signal**
- Track whether tests pass after AI-generated changes. The hook could
record: (a) did the conversation include test runs? (b) did tests fail
after AI changes? (c) how many retry cycles occurred?
- Formula: `quality_risk = failed_test_runs / total_test_runs`
- Can be extracted from tool-call results in the transcript.
**Proxy: Churn rate**
- How many times was the same file edited in the conversation? High
churn = the AI got it wrong repeatedly.
- Formula: `churn = total_file_edits / unique_files_edited`
**Confidence: Medium.** Test failures are a real signal. Churn is
noisier (some tasks legitimately require iterative editing).
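Both proxies could come from one pass over the tool-call records. The sketch below assumes each record is a dict with `tool`, `file`, and `test_passed` keys; that shape is hypothetical, not the real transcript format:

```python
def quality_and_churn(tool_calls):
    """Defect and churn proxies from a list of tool-call records.

    Assumes dicts with "tool", "file", and "test_passed" keys -- an
    illustrative shape, not the actual transcript schema.
    """
    # Defect signal: fraction of recorded test runs that failed.
    test_runs = [c for c in tool_calls if c.get("test_passed") is not None]
    failed = sum(1 for c in test_runs if not c["test_passed"])
    quality_risk = failed / len(test_runs) if test_runs else None

    # Churn: total edits divided by unique files edited.
    edits = [c["file"] for c in tool_calls if c.get("tool") in ("Edit", "Write")]
    churn = len(edits) / len(set(edits)) if edits else None
    return quality_risk, churn
```

Returning `None` (rather than 0) when no tests ran keeps "no signal" distinct from "all tests passed".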
#### 3. Data pollution risk (Section 13)
**Proxy: Publication exposure**
- Is the output likely to enter public corpora? Detect if the
conversation involves: git push to a public repo, writing
documentation, creating blog posts, Stack Overflow answers.
- Formula: binary flag `public_output = true/false`, or estimate
`pollution_tokens = output_tokens_in_public_artifacts`
- Can be detected from tool calls (git push, file writes to known
public paths).
**Confidence: Low.** Many paths to publication are undetectable. But
flagging known public pushes is better than nothing.
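The conservative detector is a string match over Bash tool commands; per the caveat above, it will miss most publication paths. A sketch under the same assumed tool-call shape:

```python
def public_output_flag(tool_calls):
    """Data-pollution proxy: did any tool call push to a git remote?

    Only catches `git push` in Bash commands -- a deliberately narrow,
    conservative flag.  Record shape is assumed, not the real schema.
    """
    for call in tool_calls:
        if call.get("tool") == "Bash" and "git push" in call.get("command", ""):
            return True
    return False
```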
#### 4. Monoculture risk (Section 15)
**Proxy: Provider concentration**
- Log which model and provider was used. Over time, the impact log
builds a picture of single-provider dependency.
- Formula: `monoculture_index = sessions_with_dominant_provider / total_sessions`
- Per-session: just log the model ID. The aggregate metric is computed
across sessions.
**Confidence: Medium.** Simple to measure, meaningful at portfolio level.
#### 5. Annotation labor (Section 10)
**Proxy: Token-proportional RLHF demand**
- Each conversation generates training signal (thumbs up/down, edits,
preference data). More tokens = more potential training data = more
annotation demand.
- Formula: `rlhf_demand_proxy = output_tokens * annotation_rate`
where `annotation_rate` is estimated from published RLHF dataset
sizes vs. total conversation volume.
- Very rough. But it makes the connection between "my conversation" and
"someone rates this output" concrete.
**Confidence: Very low.** The annotation_rate is unknown. But even an
order-of-magnitude estimate names the cost.
#### 6. Creative displacement (Section 16)
**Proxy: Substitution type**
- Classify the conversation by what human role it substitutes: code
writing, code review, documentation, research, design.
- Formula: categorical label, not a number. But the label enables
aggregation: "60% of my AI usage substitutes for junior developer
work."
- Can be inferred from tool calls (Write/Edit = code writing, Grep/Read
= research, etc.).
**Confidence: Low.** Classification is fuzzy. But naming what was
displaced is better than ignoring it.
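The fuzzy classification could be a majority vote over tool usage. Both the categories and the tool-to-category mapping below are illustrative choices, not a fixed taxonomy:

```python
from collections import Counter

def substitution_type(tool_calls):
    """Creative-displacement proxy: label which human role the session
    mostly substituted for.  The tool-to-category mapping is a guess
    and would need tuning against real sessions.
    """
    mapping = {"Write": "code writing", "Edit": "code writing",
               "Read": "research", "Grep": "research", "Glob": "research"}
    counts = Counter(mapping.get(c.get("tool"), "other") for c in tool_calls)
    return counts.most_common(1)[0][0] if counts else "unknown"
```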
#### 7. Power concentration (Section 11)
**Proxy: Spend concentration**
- Financial cost already tracked. Aggregating by provider shows how
much money flows to each company.
- Formula: `provider_share = spend_with_provider / total_ai_spend`
- Trivial to compute from existing data. The interpretation is what
matters: "I sent $X to Anthropic this month."
**Confidence: High for the number, low for what it means.**
#### 8. Content filtering opacity (Section 11)
**Proxy: Block count**
- Count how many responses were blocked by content filtering during
the conversation.
- Formula: `filter_blocks = count(blocked_responses)`
- Can be detected from error messages in the transcript.
**Confidence: High.** Easy to measure. Interpretation is subjective.
### Infeasible: honest quantification not possible per-conversation
#### 9. Linguistic homogenization (Section 10)
- Could log conversation language, but the per-conversation contribution
to language endangerment is genuinely unattributable. A counter
("this conversation was in English") is factual but not a meaningful
cost metric. **Keep qualitative.**
#### 10. Geopolitical resource competition (Section 11)
- No per-conversation proxy exists. The connection between one API call
and semiconductor export controls is real but too diffuse to measure.
**Keep qualitative.**
#### 11. Mental health effects (Section 18)
- Would require user self-report. No passive measurement is honest.
**Keep qualitative unless user opts into self-assessment.**
#### 12. Scientific integrity contamination (Section 14)
- Overlaps with data pollution (proxy #3 above). The additional risk
(AI in research methodology) is context-dependent and cannot be
detected from the conversation alone. **Keep qualitative.**
## Implementation plan
### Phase 1: Low-hanging fruit (extend existing hook)
Modify `pre-compact-snapshot.sh` to extract from the transcript:
1. **Automation ratio**: output_tokens / (output_tokens + user_input_tokens)
2. **Model ID**: already available from API metadata
3. **Test pass/fail counts**: parse tool call results for test outcomes
4. **File churn**: count Edit/Write tool calls per unique file
5. **Public push flag**: detect `git push` in tool calls
Add these fields to the JSONL log alongside existing metrics.
Estimated effort: extend the existing Python/bash parsing, ~100 lines.
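The logging side of Phase 1 could look like the sketch below: one record per session, appended to the existing JSONL log. The field names are proposals, not the log's current schema:

```python
import json

def append_session_metrics(log_path, session_id, metrics):
    """Append the five Phase 1 proxy fields to the JSONL impact log.

    Field names here are proposals; they would need to be reconciled
    with whatever pre-compact-snapshot.sh already writes.
    """
    record = {
        "session": session_id,
        "automation_ratio": metrics.get("automation_ratio"),
        "model_id": metrics.get("model_id"),
        "tests_passed": metrics.get("tests_passed", 0),
        "tests_failed": metrics.get("tests_failed", 0),
        "file_churn": metrics.get("file_churn"),
        "public_output": metrics.get("public_output", False),
    }
    with open(log_path, "a") as f:
        f.write(json.dumps(record) + "\n")
```

Appending (rather than rewriting) keeps the hook cheap and crash-safe: a failed session leaves prior records untouched.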
### Phase 2: Post-conversation signals
Add an optional post-commit hook:
6. **Review delta**: compare AI-generated code (from transcript) with
actual committed code. Measures human review effort.
Estimated effort: new hook, ~50 lines. Requires git integration.
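One way to express the review delta without shelling out to git is a sequence comparison between the AI-generated lines and the committed lines; this difflib sketch is a stand-in for the git-based hook described above:

```python
import difflib

def review_delta(ai_lines, committed_lines):
    """Fraction of content the human changed before committing.

    0.0 means the AI output was committed verbatim (no visible review);
    values toward 1.0 indicate heavy human rework.  A stand-in for the
    git-diff comparison the Phase 2 hook would actually perform.
    """
    matcher = difflib.SequenceMatcher(a=ai_lines, b=committed_lines)
    return 1.0 - matcher.ratio()
```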
### Phase 3: Aggregate metrics
Build a dashboard script (extend `show-impact.sh`) that computes
portfolio-level metrics across sessions:
7. **Monoculture index**: provider concentration over time
8. **Spend concentration**: cumulative $ per provider
9. **Displacement profile**: % of sessions by substitution type
10. **RLHF demand estimate**: cumulative annotation labor proxy
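Two of these aggregates (monoculture index, spend concentration) fall out of one pass over the JSONL log. The sketch assumes records carry `model_id` and `cost_usd` fields and derives the provider from the model ID prefix; all of that is illustrative:

```python
import json
from collections import Counter

def portfolio_metrics(log_path):
    """Phase 3 sketch: portfolio-level aggregates over the session log.

    Assumes each JSONL record has "model_id" and "cost_usd" fields and
    that the provider is the model ID's first dash-separated token --
    both are assumptions about a schema this plan has yet to fix.
    """
    sessions, spend = Counter(), Counter()
    with open(log_path) as f:
        for line in f:
            rec = json.loads(line)
            provider = rec.get("model_id", "unknown").split("-")[0]
            sessions[provider] += 1
            spend[provider] += rec.get("cost_usd", 0.0)
    if not sessions:
        return {"monoculture_index": 0.0, "spend_share": {}}
    dominant_count = sessions.most_common(1)[0][1]
    total_spend = sum(spend.values()) or 1.0
    return {
        "monoculture_index": dominant_count / sum(sessions.values()),
        "spend_share": {p: s / total_spend for p, s in spend.items()},
    }
```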
### Phase 4: Methodology update
Update `impact-methodology.md` Section 19 confidence summary:
- Move categories with proxies from "Unquantifiable" to "Proxy available"
- Document each proxy's limitations honestly
- Update the toolkit README to reflect new capabilities
Update `impact-toolkit/README.md` to accurately describe what the
toolkit measures.
## What this does NOT do
- It does not make the unquantifiable quantifiable. Some costs remain
qualitative by design.
- It does not produce a single "social cost score." Collapsing
incommensurable harms into one number would be dishonest.
- It does not claim precision. Every proxy is explicitly labeled with
its confidence and failure modes.
## Success criteria
- The toolkit reports at least 5 non-environmental metrics per session.
- Each metric has documented limitations in the methodology.
- The confidence summary has fewer "Unquantifiable" entries.
- No metric is misleading — a proxy that doesn't work is removed, not
kept for show.
## Risks
- **Goodhart's law**: Once measured, users may optimize for the metric
rather than the underlying cost (e.g., adding fake user tokens to
lower automation ratio). Mitigate by documenting that proxies are
indicators, not targets.
- **False precision**: Numbers create an illusion of understanding.
Mitigate by always showing confidence levels alongside values.
- **Scope creep**: Trying to measure everything dilutes the toolkit's
usability. Start with Phase 1 only, evaluate before proceeding.