ai-conversation-impact/plans/quantify-social-costs.md
claude af6062c1f9 Add social cost proxies to impact tracking hooks
Extend pre-compact-snapshot.sh to extract 5 new per-conversation
metrics from the transcript: automation ratio (deskilling proxy),
model ID (monoculture tracking), test pass/fail counts (code quality
proxy), file churn (edits per unique file), and public push detection
(data pollution risk flag). Update show-impact.sh to display them.

New plan: quantify-social-costs.md — roadmap for moving non-environmental
cost categories from qualitative to proxy-measurable.

Tasks 19-24 done. Task 25 (methodology update) pending.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-16 15:05:53 +00:00


Plan: Quantify social, epistemic, and political costs

Target sub-goals: 2 (measure impact), 6 (improve methodology)

Problem

Our differentiator is the taxonomy of non-environmental costs (social, epistemic, political). But the toolkit only tracks environmental and financial metrics. Until we can produce numbers — even rough proxies — for the other dimensions, the taxonomy remains a document, not a tool.

The confidence summary (Section 19) marks 13 of 22 categories as "Unquantifiable." Some genuinely resist quantification. But for others, we can define measurable proxies that capture something meaningful per conversation, even if imperfect.

Design principle

Not everything needs a number. The goal is to move categories from "unquantifiable" to "rough proxy available" where honest proxies exist, and to explicitly mark categories where quantification would be dishonest. A bad number is worse than no number.

Category-by-category analysis

Feasible: per-conversation proxies exist

1. Cognitive deskilling (Section 10)

Proxy: Automation ratio

  • Measure: the model's output tokens as a fraction of all conversation tokens (model output plus user input). A conversation where the AI writes 95% of the code carries higher deskilling risk than one where the user writes code and asks for review.
  • Formula: deskilling_risk = output_tokens / (output_tokens + user_tokens)
  • Range: 0 (pure teaching) to 1 (pure delegation)
  • Can be computed from the transcript by the existing hook.
  • Calibration: weight by task type if detectable (e.g., "explain" vs "write" vs "fix").
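The formula above can be sketched as a small helper, assuming the hook exposes the two token counts from the transcript:

```python
def automation_ratio(output_tokens: int, user_tokens: int) -> float:
    """Deskilling proxy: model output tokens as a share of all
    conversation tokens. 0.0 = pure teaching, 1.0 = pure delegation."""
    total = output_tokens + user_tokens
    if total == 0:
        return 0.0  # empty conversation: report no delegation
    return output_tokens / total
```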

Proxy: Review signal

  • Did the user modify the AI's output before committing? If the hook can detect git diffs between AI-generated code and committed code, the delta indicates human review effort. High delta = more engagement = less deskilling risk.
  • Requires: post-commit hook comparing AI output to committed diff.
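One way the delta could be computed, assuming the post-commit hook can recover both the AI-generated lines and the lines actually committed (a sketch, not the hook's actual logic):

```python
import difflib

def review_delta(ai_lines: list[str], committed_lines: list[str]) -> float:
    """Fraction of AI-generated lines changed or dropped before commit.
    Higher delta = more human review effort = less deskilling risk."""
    if not ai_lines:
        return 0.0
    matcher = difflib.SequenceMatcher(a=ai_lines, b=committed_lines)
    surviving = sum(block.size for block in matcher.get_matching_blocks())
    return 1.0 - surviving / len(ai_lines)
```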

Confidence: Low but measurable. The ratio is crude — a user who delegates wisely is not deskilling. But it's directionally useful.

2. Code quality degradation (Section 12)

Proxy: Defect signal

  • Track whether tests pass after AI-generated changes. The hook could record: (a) did the conversation include test runs? (b) did tests fail after AI changes? (c) how many retry cycles occurred?
  • Formula: quality_risk = failed_test_runs / total_test_runs
  • Can be extracted from tool-call results in the transcript.
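As a minimal sketch, the ratio could be extracted from tool-call outputs like so; the pass/fail patterns are illustrative assumptions about pytest-style output, not the hook's actual parsing rules:

```python
import re

def quality_risk(tool_outputs: list[str]) -> float:
    """Defect proxy: failed test runs / total test runs, detected by
    scanning tool-call outputs for pytest-style summary lines."""
    runs = [o for o in tool_outputs if re.search(r"\b\d+ (passed|failed)\b", o)]
    failures = [o for o in runs if re.search(r"\b\d+ failed\b", o)]
    if not runs:
        return 0.0
    return len(failures) / len(runs)
```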

Proxy: Churn rate

  • How many times was the same file edited in the conversation? High churn = the AI got it wrong repeatedly.
  • Formula: churn = total_file_edits / unique_files_edited
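The churn formula is a one-liner over the list of edited paths the hook already sees:

```python
def churn(edited_paths: list[str]) -> float:
    """Edits per unique file. 1.0 means every file was touched exactly
    once; higher values indicate repeated rework of the same files."""
    if not edited_paths:
        return 0.0
    return len(edited_paths) / len(set(edited_paths))
```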

Confidence: Medium. Test failures are a real signal. Churn is noisier (some tasks legitimately require iterative editing).

3. Data pollution risk (Section 13)

Proxy: Publication exposure

  • Is the output likely to enter public corpora? Detect if the conversation involves: git push to a public repo, writing documentation, creating blog posts, Stack Overflow answers.
  • Formula: a binary flag, public_output = true/false; or an estimate, pollution_tokens = output_tokens_in_public_artifacts
  • Can be detected from tool calls (git push, file writes to known public paths).
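The binary flag reduces to a scan over tool commands; checking only for `git push` is a deliberate under-approximation, since many publication paths are invisible to the hook:

```python
def public_output_flag(tool_commands: list[str]) -> bool:
    """Data pollution proxy: did any tool call push to a remote?
    Conservative by design; absence of a push proves nothing."""
    return any("git push" in cmd for cmd in tool_commands)
```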

Confidence: Low. Many paths to publication are undetectable. But flagging known public pushes is better than nothing.

4. Monoculture risk (Section 15)

Proxy: Provider concentration

  • Log which model and provider was used. Over time, the impact log builds a picture of single-provider dependency.
  • Formula: monoculture_index = sessions_with_dominant_provider / total_sessions
  • Per-session: just log the model ID. The aggregate metric is computed across sessions.
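The aggregate metric is a straightforward reduction over the per-session model IDs in the log:

```python
from collections import Counter

def monoculture_index(session_providers: list[str]) -> float:
    """Share of sessions served by the single most-used provider.
    1.0 = total single-provider dependency."""
    if not session_providers:
        return 0.0
    counts = Counter(session_providers)
    return max(counts.values()) / len(session_providers)
```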

Confidence: Medium. Simple to measure, meaningful at portfolio level.

5. Annotation labor (Section 10)

Proxy: Token-proportional RLHF demand

  • Each conversation generates training signal (thumbs up/down, edits, preference data). More tokens = more potential training data = more annotation demand.
  • Formula: rlhf_demand_proxy = output_tokens * annotation_rate where annotation_rate is estimated from published RLHF dataset sizes vs. total conversation volume.
  • Very rough. But it makes the connection between "my conversation" and "someone rates this output" concrete.
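The proxy itself is trivial to compute; the hard part is the rate. Any annotation_rate passed in is an order-of-magnitude guess, not a published figure:

```python
def rlhf_demand_proxy(output_tokens: int, annotation_rate: float) -> float:
    """Estimated tokens of this conversation that end up in front of a
    human annotator. annotation_rate is unknown in reality; callers
    must supply and document their own estimate."""
    return output_tokens * annotation_rate
```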

Confidence: Very low. The annotation_rate is unknown. But even an order-of-magnitude estimate names the cost.

6. Creative displacement (Section 16)

Proxy: Substitution type

  • Classify the conversation by what human role it substitutes: code writing, code review, documentation, research, design.
  • Formula: categorical label, not a number. But the label enables aggregation: "60% of my AI usage substitutes for junior developer work."
  • Can be inferred from tool calls (Write/Edit = code writing, Grep/Read = research, etc.).
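A first-pass classifier could look like the following; the tool-to-role mapping is an illustrative assumption, and a real version would need more categories and tie-breaking:

```python
def substitution_type(tool_names: list[str]) -> str:
    """Coarse label for the human role a session substitutes,
    inferred from which tools the conversation used."""
    if any(t in ("Write", "Edit") for t in tool_names):
        return "code writing"
    if any(t in ("Grep", "Read") for t in tool_names):
        return "research"
    return "other"
```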

Confidence: Low. Classification is fuzzy. But naming what was displaced is better than ignoring it.

7. Power concentration (Section 11)

Proxy: Spend concentration

  • Financial cost already tracked. Aggregating by provider shows how much money flows to each company.
  • Formula: provider_share = spend_with_provider / total_ai_spend
  • Trivial to compute from existing data. The interpretation is what matters: "I sent $X to Anthropic this month."
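Computed over the financial metrics the toolkit already logs, the share is:

```python
def provider_share(spend_by_provider: dict[str, float]) -> dict[str, float]:
    """Fraction of total AI spend sent to each provider."""
    total = sum(spend_by_provider.values())
    if total == 0:
        return {p: 0.0 for p in spend_by_provider}
    return {p: s / total for p, s in spend_by_provider.items()}
```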

Confidence: High for the number, low for what it means.

8. Content filtering opacity (Section 11)

Proxy: Block count

  • Count how many responses were blocked by content filtering during the conversation.
  • Formula: filter_blocks = count(blocked_responses)
  • Can be detected from error messages in the transcript.

Confidence: High. Easy to measure. Interpretation is subjective.

Infeasible: honest quantification not possible per-conversation

9. Linguistic homogenization (Section 10)

  • Could log conversation language, but the per-conversation contribution to language endangerment is genuinely unattributable. A counter ("this conversation was in English") is factual but not a meaningful cost metric. Keep qualitative.

10. Geopolitical resource competition (Section 11)

  • No per-conversation proxy exists. The connection between one API call and semiconductor export controls is real but too diffuse to measure. Keep qualitative.

11. Mental health effects (Section 18)

  • Would require user self-report. No passive measurement is honest. Keep qualitative unless user opts into self-assessment.

12. Scientific integrity contamination (Section 14)

  • Overlaps with data pollution (proxy #3 above). The additional risk (AI in research methodology) is context-dependent and cannot be detected from the conversation alone. Keep qualitative.

Implementation plan

Phase 1: Low-hanging fruit (extend existing hook)

Modify pre-compact-snapshot.sh to extract from the transcript:

  1. Automation ratio: output_tokens / (output_tokens + user_input_tokens)
  2. Model ID: already available from API metadata
  3. Test pass/fail counts: parse tool call results for test outcomes
  4. File churn: count Edit/Write tool calls per unique file
  5. Public push flag: detect git push in tool calls

Add these fields to the JSONL log alongside existing metrics.
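Assembled together, the five Phase 1 fields form one JSONL record. A minimal sketch, with illustrative field names rather than the hook's actual schema:

```python
import json

def phase1_record(output_tokens, user_tokens, model_id, failed_tests,
                  total_tests, file_edits, unique_files, public_push):
    """Serialize the five Phase 1 social-cost fields as one JSONL line,
    ready to append alongside the existing per-session metrics."""
    total = output_tokens + user_tokens
    return json.dumps({
        "automation_ratio": output_tokens / total if total else 0.0,
        "model_id": model_id,
        "quality_risk": failed_tests / total_tests if total_tests else None,
        "file_churn": file_edits / unique_files if unique_files else 0.0,
        "public_push": public_push,
    })
```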

Estimated effort: extend the existing Python/bash parsing, ~100 lines.

Phase 2: Post-conversation signals

Add an optional post-commit hook:

  1. Review delta: compare AI-generated code (from transcript) with actual committed code. Measures human review effort.

Estimated effort: new hook, ~50 lines. Requires git integration.

Phase 3: Aggregate metrics

Build a dashboard script (extend show-impact.sh) that computes portfolio-level metrics across sessions:

  1. Monoculture index: provider concentration over time
  2. Spend concentration: cumulative $ per provider
  3. Displacement profile: % of sessions by substitution type
  4. RLHF demand estimate: cumulative annotation labor proxy

Phase 4: Methodology update

Update impact-methodology.md Section 19 confidence summary:

  • Move categories with proxies from "Unquantifiable" to "Proxy available"
  • Document each proxy's limitations honestly
  • Update the toolkit README to reflect new capabilities

Update impact-toolkit/README.md to accurately describe what the toolkit measures.

What this does NOT do

  • It does not make the unquantifiable quantifiable. Some costs remain qualitative by design.
  • It does not produce a single "social cost score." Collapsing incommensurable harms into one number would be dishonest.
  • It does not claim precision. Every proxy is explicitly labeled with its confidence and failure modes.

Success criteria

  • The toolkit reports at least 5 non-environmental metrics per session.
  • Each metric has documented limitations in the methodology.
  • The confidence summary has fewer "Unquantifiable" entries.
  • No metric is misleading — a proxy that doesn't work is removed, not kept for show.

Risks

  • Goodhart's law: Once measured, users may optimize for the metric rather than the underlying cost (e.g., adding fake user tokens to lower automation ratio). Mitigate by documenting that proxies are indicators, not targets.
  • False precision: Numbers create an illusion of understanding. Mitigate by always showing confidence levels alongside values.
  • Scope creep: Trying to measure everything dilutes the toolkit's usability. Start with Phase 1 only, evaluate before proceeding.