Add social cost proxies to impact tracking hooks

Extend pre-compact-snapshot.sh to extract 5 new per-conversation
metrics from the transcript: automation ratio (deskilling proxy),
model ID (monoculture tracking), test pass/fail counts (code quality
proxy), file churn (edits per unique file), and public push detection
(data pollution risk flag). Update show-impact.sh to display them.

New plan: quantify-social-costs.md — roadmap for moving non-environmental
cost categories from qualitative to proxy-measurable.

Tasks 19-24 done. Task 25 (methodology update) pending.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
claude 2026-03-16 15:05:53 +00:00
parent e6e0bf4616
commit af6062c1f9
8 changed files with 554 additions and 25 deletions


@@ -35,27 +35,102 @@ if [ -f "$TRANSCRIPT_PATH" ]; then
# The transcript contains .message.usage with input_tokens,
# cache_creation_input_tokens, cache_read_input_tokens, output_tokens.
USAGE_DATA=$(python3 -c "
import json, sys
import json, sys, re
input_tokens = 0
cache_creation = 0
cache_read = 0
output_tokens = 0
turns = 0
model_id = ''
user_bytes = 0
edited_files = {} # file_path -> edit count
test_passes = 0
test_failures = 0
has_public_push = 0
with open(sys.argv[1]) as f:
for line in f:
try:
d = json.loads(line.strip())
u = d.get('message', {}).get('usage')
msg = d.get('message', {})
role = msg.get('role')
content = msg.get('content', '')
# Track user message size (proxy for user contribution)
if role == 'user':
if isinstance(content, str):
user_bytes += len(content.encode('utf-8', errors='replace'))
elif isinstance(content, list):
for block in content:
if isinstance(block, dict) and block.get('type') == 'text':
user_bytes += len(block.get('text', '').encode('utf-8', errors='replace'))
# Extract usage data and model from assistant messages
if role == 'assistant':
m = msg.get('model', '')
if m:
model_id = m
u = msg.get('usage')
if u and 'input_tokens' in u:
turns += 1
input_tokens += u.get('input_tokens', 0)
cache_creation += u.get('cache_creation_input_tokens', 0)
cache_read += u.get('cache_read_input_tokens', 0)
output_tokens += u.get('output_tokens', 0)
# Parse tool use blocks
if isinstance(content, list):
for block in content:
if not isinstance(block, dict) or block.get('type') != 'tool_use':
continue
name = block.get('name', '')
inp = block.get('input', {})
# File churn: count Edit/Write per file
if name in ('Edit', 'Write'):
fp = inp.get('file_path', '')
if fp:
edited_files[fp] = edited_files.get(fp, 0) + 1
# Public push detection
if name == 'Bash':
cmd = inp.get('command', '')
if re.search(r'git\s+push', cmd):
has_public_push = 1
# Test results from tool_result blocks (user role, tool_result type)
if role == 'user' and isinstance(content, list):
for block in content:
if isinstance(block, dict) and block.get('type') == 'tool_result':
text = ''
rc = block.get('content', '')
if isinstance(rc, str):
text = rc
elif isinstance(rc, list):
text = ' '.join(b.get('text', '') for b in rc if isinstance(b, dict))
# Detect test outcomes from common test runner output
if re.search(r'(\d+)\s+(tests?\s+)?passed', text, re.I):
test_passes += 1
if re.search(r'(\d+)\s+(tests?\s+)?failed|FAIL(?:ED)?|ERROR', text, re.I):
test_failures += 1
except Exception:
pass
# Print as tab-separated for easy shell parsing
print(f'{turns}\t{input_tokens}\t{cache_creation}\t{cache_read}\t{output_tokens}')
user_tokens_est = user_bytes // 4 # rough byte-to-token estimate
unique_files = len(edited_files)
total_edits = sum(edited_files.values())
churn = round(total_edits / unique_files, 2) if unique_files > 0 else 0
# automation_ratio: 0 = all human, 1 = all AI (as permille for integer arithmetic)
if output_tokens + user_tokens_est > 0:
auto_ratio_pm = output_tokens * 1000 // (output_tokens + user_tokens_est)
else:
auto_ratio_pm = 0
print(f'{turns}\t{input_tokens}\t{cache_creation}\t{cache_read}\t{output_tokens}\t{model_id}\t{auto_ratio_pm}\t{user_tokens_est}\t{unique_files}\t{total_edits}\t{test_passes}\t{test_failures}\t{has_public_push}')
" "$TRANSCRIPT_PATH" 2>/dev/null || echo "")
if [ -n "$USAGE_DATA" ] && [ "$(echo "$USAGE_DATA" | cut -f1)" -gt 0 ] 2>/dev/null; then
@@ -66,6 +141,14 @@ print(f'{turns}\t{input_tokens}\t{cache_creation}\t{cache_read}\t{output_tokens}
CACHE_CREATION=$(echo "$USAGE_DATA" | cut -f3)
CACHE_READ=$(echo "$USAGE_DATA" | cut -f4)
OUTPUT_TOKENS=$(echo "$USAGE_DATA" | cut -f5)
MODEL_ID=$(echo "$USAGE_DATA" | cut -f6)
AUTO_RATIO_PM=$(echo "$USAGE_DATA" | cut -f7)
USER_TOKENS_EST=$(echo "$USAGE_DATA" | cut -f8)
UNIQUE_FILES=$(echo "$USAGE_DATA" | cut -f9)
TOTAL_EDITS=$(echo "$USAGE_DATA" | cut -f10)
TEST_PASSES=$(echo "$USAGE_DATA" | cut -f11)
TEST_FAILURES=$(echo "$USAGE_DATA" | cut -f12)
HAS_PUBLIC_PUSH=$(echo "$USAGE_DATA" | cut -f13)
# Cumulative input = all tokens that went through the model.
# Cache reads are cheaper (~10-20% of full compute), so we weight them.
@@ -93,6 +176,14 @@ print(f'{turns}\t{input_tokens}\t{cache_creation}\t{cache_read}\t{output_tokens}
CACHE_CREATION=0
CACHE_READ=0
INPUT_TOKENS=0
MODEL_ID=""
AUTO_RATIO_PM=0
USER_TOKENS_EST=0
UNIQUE_FILES=0
TOTAL_EDITS=0
TEST_PASSES=0
TEST_FAILURES=0
HAS_PUBLIC_PUSH=0
fi
# --- Cost estimates ---
@@ -127,12 +218,20 @@ else
CO2_G=0
COST_CENTS=0
TOKEN_SOURCE="none"
MODEL_ID=""
AUTO_RATIO_PM=0
USER_TOKENS_EST=0
UNIQUE_FILES=0
TOTAL_EDITS=0
TEST_PASSES=0
TEST_FAILURES=0
HAS_PUBLIC_PUSH=0
fi
# --- Write log entry ---
cat >> "$LOG_FILE" <<EOF
{"timestamp":"$TIMESTAMP","session_id":"$SESSION_ID","trigger":"$TRIGGER","token_source":"$TOKEN_SOURCE","transcript_bytes":$TRANSCRIPT_BYTES,"transcript_lines":$TRANSCRIPT_LINES,"assistant_turns":$ASSISTANT_TURNS,"tool_uses":$TOOL_USES,"cumulative_input_tokens":$CUMULATIVE_INPUT,"cumulative_input_raw":$CUMULATIVE_INPUT_RAW,"cache_creation_tokens":$CACHE_CREATION,"cache_read_tokens":$CACHE_READ,"output_tokens":$OUTPUT_TOKENS,"energy_wh":$ENERGY_WH,"co2_g":$CO2_G,"cost_cents":$COST_CENTS}
{"timestamp":"$TIMESTAMP","session_id":"$SESSION_ID","trigger":"$TRIGGER","token_source":"$TOKEN_SOURCE","transcript_bytes":$TRANSCRIPT_BYTES,"transcript_lines":$TRANSCRIPT_LINES,"assistant_turns":$ASSISTANT_TURNS,"tool_uses":$TOOL_USES,"cumulative_input_tokens":$CUMULATIVE_INPUT,"cumulative_input_raw":$CUMULATIVE_INPUT_RAW,"cache_creation_tokens":$CACHE_CREATION,"cache_read_tokens":$CACHE_READ,"output_tokens":$OUTPUT_TOKENS,"energy_wh":$ENERGY_WH,"co2_g":$CO2_G,"cost_cents":$COST_CENTS,"model_id":"$MODEL_ID","automation_ratio_pm":$AUTO_RATIO_PM,"user_tokens_est":$USER_TOKENS_EST,"unique_files_edited":$UNIQUE_FILES,"total_file_edits":$TOTAL_EDITS,"test_passes":$TEST_PASSES,"test_failures":$TEST_FAILURES,"has_public_push":$HAS_PUBLIC_PUSH}
EOF
exit 0

@@ -11,6 +11,11 @@ A PreCompact hook that runs before each context compaction, capturing:
- Energy consumption estimate (Wh)
- CO2 emissions estimate (grams)
- Financial cost estimate (USD)
- Model ID
- Automation ratio (AI output vs. user input — deskilling proxy)
- File churn (edits per file — code quality proxy)
- Test pass/fail counts
- Public push detection (data pollution risk flag)
Data is logged to a JSONL file for analysis over time.
@@ -63,8 +68,10 @@ consider these complementary tools:
- **[Hugging Face AI Energy Score](https://huggingface.github.io/AIEnergyScore/)** —
Benchmarks model energy efficiency. Useful for choosing between models.
These tools focus on environmental metrics only. This toolkit and the
methodology also cover financial, social, epistemic, and political costs.
These tools focus on environmental metrics only. This toolkit also
tracks financial cost and proxy metrics for social costs (automation
ratio, file churn, test outcomes, public push detection). The
accompanying methodology covers additional dimensions in depth.
## Limitations

@@ -49,6 +49,35 @@ while IFS= read -r line; do
printf " Cache: %s created, %s read\n" "$cache_create" "$cache_read"
fi
LC_NUMERIC=C printf " Energy: ~%s Wh CO2: ~%sg Cost: ~\$%.2f\n" "$energy" "$co2" "$(echo "$cost / 100" | bc -l 2>/dev/null || echo "$cost cents")"
# Social cost proxies (if present in log entry)
model=$(echo "$line" | jq -r '.model_id // empty')
auto_pm=$(echo "$line" | jq -r '.automation_ratio_pm // empty')
user_tok=$(echo "$line" | jq -r '.user_tokens_est // empty')
files_ed=$(echo "$line" | jq -r '.unique_files_edited // empty')
total_ed=$(echo "$line" | jq -r '.total_file_edits // empty')
t_pass=$(echo "$line" | jq -r '.test_passes // empty')
t_fail=$(echo "$line" | jq -r '.test_failures // empty')
pub_push=$(echo "$line" | jq -r '.has_public_push // empty')
if [ -n "$model" ]; then
printf " Model: %s\n" "$model"
fi
if [ -n "$auto_pm" ] && [ "$auto_pm" != "0" ]; then
auto_pct=$(( auto_pm / 10 ))
auto_dec=$(( auto_pm % 10 ))
printf " Automation ratio: %d.%d%% (user ~%s tokens, AI ~%s tokens)\n" \
"$auto_pct" "$auto_dec" "$user_tok" "$output"
fi
if [ -n "$files_ed" ] && [ "$files_ed" != "0" ]; then
printf " File churn: %s edits across %s files\n" "$total_ed" "$files_ed"
fi
if [ -n "$t_pass" ] && [ -n "$t_fail" ] && { [ "$t_pass" != "0" ] || [ "$t_fail" != "0" ]; }; then
printf " Tests: %s passed, %s failed\n" "$t_pass" "$t_fail"
fi
if [ "$pub_push" = "1" ]; then
printf " Public push: yes (data pollution risk)\n"
fi
echo ""
done < "$LOG_FILE"
@@ -62,3 +91,26 @@ echo "=== Totals ($TOTAL_ENTRIES snapshots) ==="
LC_NUMERIC=C printf " Energy: ~%s Wh CO2: ~%sg Cost: ~\$%.2f\n" \
"$TOTAL_ENERGY" "$TOTAL_CO2" \
"$(echo "$TOTAL_COST / 100" | bc -l 2>/dev/null || echo "$TOTAL_COST cents")"
# Show annotations if they exist
ANNOT_FILE="$PROJECT_DIR/.claude/impact/annotations.jsonl"
if [ -f "$ANNOT_FILE" ] && [ -s "$ANNOT_FILE" ]; then
echo ""
echo "=== Value Annotations ==="
echo ""
while IFS= read -r line; do
sid=$(echo "$line" | jq -r '.session_id')
if ! echo "$sid" | grep -q "$FILTER"; then
continue
fi
ts=$(echo "$line" | jq -r '.timestamp')
summary=$(echo "$line" | jq -r '.value_summary')
reach=$(echo "$line" | jq -r '.estimated_reach')
cf=$(echo "$line" | jq -r '.counterfactual')
net=$(echo "$line" | jq -r '.net_assessment')
printf "%s session=%s\n" "$ts" "${sid:0:12}..."
printf " Value: %s\n" "$summary"
printf " Reach: %s Counterfactual: %s Net: %s\n" "$reach" "$cf" "$net"
echo ""
done < "$ANNOT_FILE"
fi


@@ -23,6 +23,7 @@ broad, lasting value.
| [audience-analysis](audience-analysis.md) | 7 | New — pre-launch |
| [measure-project-impact](measure-project-impact.md) | 2, 12 | New — pre-launch |
| [anticipated-criticisms](anticipated-criticisms.md) | 4, 12 | New — pre-launch |
| [quantify-social-costs](quantify-social-costs.md) | 2, 6 | New — roadmap |
*Previously had plans for "high-leverage contributions" and "teach and
document" — these were behavioral norms, not executable plans. Their


@@ -0,0 +1,235 @@
# Plan: Quantify social, epistemic, and political costs
**Target sub-goals**: 2 (measure impact), 6 (improve methodology)
## Problem
Our differentiator is the taxonomy of non-environmental costs (social,
epistemic, political). But the toolkit only tracks environmental and
financial metrics. Until we can produce numbers — even rough proxies —
for the other dimensions, the taxonomy remains a document, not a tool.
The confidence summary (Section 19) marks 13 of 22 categories as
"Unquantifiable." Some genuinely resist quantification. But for others,
we can define measurable proxies that capture *something* meaningful per
conversation, even if imperfect.
## Design principle
Not everything needs a number. The goal is to move categories from
"unquantifiable" to "rough proxy available" where honest proxies exist,
and to explicitly mark categories where quantification would be
dishonest. A bad number is worse than no number.
## Category-by-category analysis
### Feasible: per-conversation proxies exist
#### 1. Cognitive deskilling (Section 10)
**Proxy: Automation ratio**
- Measure: AI output tokens as a share of total conversation tokens
(AI output plus user input).
A conversation where the AI writes 95% of the code has a higher
deskilling risk than one where the user writes code and asks for review.
- Formula: `deskilling_risk = output_tokens / (output_tokens + user_tokens)`
- Range: 0 (pure teaching) to 1 (pure delegation)
- Can be computed from the transcript by the existing hook.
- Calibration: weight by task type if detectable (e.g., "explain" vs
"write" vs "fix").
**Proxy: Review signal**
- Did the user modify the AI's output before committing? If the hook can
detect git diffs between AI-generated code and committed code, the
delta indicates human review effort. High delta = more engagement =
less deskilling risk.
- Requires: post-commit hook comparing AI output to committed diff.
**Confidence: Low but measurable.** The ratio is crude — a user who
delegates wisely is not deskilling. But it's directionally useful.
#### 2. Code quality degradation (Section 12)
**Proxy: Defect signal**
- Track whether tests pass after AI-generated changes. The hook could
record: (a) did the conversation include test runs? (b) did tests fail
after AI changes? (c) how many retry cycles occurred?
- Formula: `quality_risk = failed_test_runs / total_test_runs`
- Can be extracted from tool-call results in the transcript.
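The extraction heuristic can be sketched as follows (same regex family as the hook; deliberately noisy, since a lowercase "fail" in ordinary prose also matches):

```python
import re

PASS_RE = re.compile(r'(\d+)\s+(tests?\s+)?passed', re.I)
FAIL_RE = re.compile(r'(\d+)\s+(tests?\s+)?failed|FAIL(?:ED)?|ERROR', re.I)

def classify_test_output(text: str) -> tuple[bool, bool]:
    """Return (looks_like_pass, looks_like_fail) for one tool result."""
    return bool(PASS_RE.search(text)), bool(FAIL_RE.search(text))
```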
**Proxy: Churn rate**
- How many times was the same file edited in the conversation? High
churn = the AI got it wrong repeatedly.
- Formula: `churn = total_file_edits / unique_files_edited`
**Confidence: Medium.** Test failures are a real signal. Churn is
noisier (some tasks legitimately require iterative editing).
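The churn formula reduces to a small helper (hypothetical name; input is the `file_path` of each Edit/Write tool call, as the hook collects them):

```python
from collections import Counter

def churn(edited_paths) -> float:
    """Edits per unique file: total Edit/Write calls / distinct files."""
    counts = Counter(edited_paths)
    unique = len(counts)
    return round(sum(counts.values()) / unique, 2) if unique else 0.0
```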
#### 3. Data pollution risk (Section 13)
**Proxy: Publication exposure**
- Is the output likely to enter public corpora? Detect if the
conversation involves: git push to a public repo, writing
documentation, creating blog posts, Stack Overflow answers.
- Formula: binary flag `public_output = true/false`, or estimate
`pollution_tokens = output_tokens_in_public_artifacts`
- Can be detected from tool calls (git push, file writes to known
public paths).
**Confidence: Low.** Many paths to publication are undetectable. But
flagging known public pushes is better than nothing.
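A sketch of the flag, under the same limitation (the regex matches the hook's; it cannot distinguish a public remote from a private one):

```python
import re

PUSH_RE = re.compile(r'git\s+push')

def flags_push(bash_commands) -> bool:
    """True if any Bash tool call ran `git push`."""
    return any(PUSH_RE.search(cmd) for cmd in bash_commands)
```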
#### 4. Monoculture risk (Section 15)
**Proxy: Provider concentration**
- Log which model and provider was used. Over time, the impact log
builds a picture of single-provider dependency.
- Formula: `monoculture_index = sessions_with_dominant_provider / total_sessions`
- Per-session: just log the model ID. The aggregate metric is computed
across sessions.
**Confidence: Medium.** Simple to measure, meaningful at portfolio level.
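The aggregate can be computed from the impact log's `model_id` field; a sketch (function name is mine):

```python
import json
from collections import Counter

def monoculture_index(jsonl_lines) -> float:
    """Share of sessions served by the most-used model."""
    models = Counter()
    for line in jsonl_lines:
        model = json.loads(line).get('model_id', '')
        if model:
            models[model] += 1
    total = sum(models.values())
    return models.most_common(1)[0][1] / total if total else 0.0
```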
#### 5. Annotation labor (Section 10)
**Proxy: Token-proportional RLHF demand**
- Each conversation generates training signal (thumbs up/down, edits,
preference data). More tokens = more potential training data = more
annotation demand.
- Formula: `rlhf_demand_proxy = output_tokens * annotation_rate`
where `annotation_rate` is estimated from published RLHF dataset
sizes vs. total conversation volume.
- Very rough. But it makes the connection between "my conversation" and
"someone rates this output" concrete.
**Confidence: Very low.** The annotation_rate is unknown. But even an
order-of-magnitude estimate names the cost.
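A sketch of the calculation. The default `annotation_rate` below is an invented placeholder, not a published figure; results are order-of-magnitude at best:

```python
def rlhf_demand_proxy(output_tokens: int,
                      annotation_rate: float = 1e-6) -> float:
    """Tokens of output expected to receive human annotation.

    annotation_rate is an assumed placeholder: the real fraction of
    conversation output that ends up human-rated is unknown.
    """
    return output_tokens * annotation_rate
```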
#### 6. Creative displacement (Section 16)
**Proxy: Substitution type**
- Classify the conversation by what human role it substitutes: code
writing, code review, documentation, research, design.
- Formula: categorical label, not a number. But the label enables
aggregation: "60% of my AI usage substitutes for junior developer
work."
- Can be inferred from tool calls (Write/Edit = code writing, Grep/Read
= research, etc.).
**Confidence: Low.** Classification is fuzzy. But naming what was
displaced is better than ignoring it.
#### 7. Power concentration (Section 11)
**Proxy: Spend concentration**
- Financial cost already tracked. Aggregating by provider shows how
much money flows to each company.
- Formula: `provider_share = spend_with_provider / total_ai_spend`
- Trivial to compute from existing data. The interpretation is what
matters: "I sent $X to Anthropic this month."
**Confidence: High for the number, low for what it means.**
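Aggregation over the existing log is trivial; a sketch, assuming the caller supplies a model-to-provider mapping (the log itself only stores `model_id` and `cost_cents`):

```python
import json
from collections import defaultdict

def spend_by_provider(jsonl_lines, provider_of) -> dict:
    """Sum cost_cents per provider across impact-log entries."""
    totals = defaultdict(int)
    for line in jsonl_lines:
        entry = json.loads(line)
        provider = provider_of(entry.get('model_id', ''))
        totals[provider] += entry.get('cost_cents', 0)
    return dict(totals)
```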
#### 8. Content filtering opacity (Section 11)
**Proxy: Block count**
- Count how many responses were blocked by content filtering during
the conversation.
- Formula: `filter_blocks = count(blocked_responses)`
- Can be detected from error messages in the transcript.
**Confidence: High.** Easy to measure. Interpretation is subjective.
### Infeasible: honest quantification not possible per-conversation
#### 9. Linguistic homogenization (Section 10)
- Could log conversation language, but the per-conversation contribution
to language endangerment is genuinely unattributable. A counter
("this conversation was in English") is factual but not a meaningful
cost metric. **Keep qualitative.**
#### 10. Geopolitical resource competition (Section 11)
- No per-conversation proxy exists. The connection between one API call
and semiconductor export controls is real but too diffuse to measure.
**Keep qualitative.**
#### 11. Mental health effects (Section 18)
- Would require user self-report. No passive measurement is honest.
**Keep qualitative unless user opts into self-assessment.**
#### 12. Scientific integrity contamination (Section 14)
- Overlaps with data pollution (proxy #3 above). The additional risk
(AI in research methodology) is context-dependent and cannot be
detected from the conversation alone. **Keep qualitative.**
## Implementation plan
### Phase 1: Low-hanging fruit (extend existing hook)
Modify `pre-compact-snapshot.sh` to extract from the transcript:
1. **Automation ratio**: output_tokens / (output_tokens + user_input_tokens)
2. **Model ID**: already available from API metadata
3. **Test pass/fail counts**: parse tool call results for test outcomes
4. **File churn**: count Edit/Write tool calls per unique file
5. **Public push flag**: detect `git push` in tool calls
Add these fields to the JSONL log alongside existing metrics.
Estimated effort: extend the existing Python/bash parsing, ~100 lines.
### Phase 2: Post-conversation signals
Add an optional post-commit hook:
6. **Review delta**: compare AI-generated code (from transcript) with
actual committed code. Measures human review effort.
Estimated effort: new hook, ~50 lines. Requires git integration.
### Phase 3: Aggregate metrics
Build a dashboard script (extend `show-impact.sh`) that computes
portfolio-level metrics across sessions:
7. **Monoculture index**: provider concentration over time
8. **Spend concentration**: cumulative $ per provider
9. **Displacement profile**: % of sessions by substitution type
10. **RLHF demand estimate**: cumulative annotation labor proxy
### Phase 4: Methodology update
Update `impact-methodology.md` Section 19 confidence summary:
- Move categories with proxies from "Unquantifiable" to "Proxy available"
- Document each proxy's limitations honestly
- Update the toolkit README to reflect new capabilities
Update `impact-toolkit/README.md` to accurately describe what the
toolkit measures.
## What this does NOT do
- It does not make the unquantifiable quantifiable. Some costs remain
qualitative by design.
- It does not produce a single "social cost score." Collapsing
incommensurable harms into one number would be dishonest.
- It does not claim precision. Every proxy is explicitly labeled with
its confidence and failure modes.
## Success criteria
- The toolkit reports at least 5 non-environmental metrics per session.
- Each metric has documented limitations in the methodology.
- The confidence summary has fewer "Unquantifiable" entries.
- No metric is misleading — a proxy that doesn't work is removed, not
kept for show.
## Risks
- **Goodhart's law**: Once measured, users may optimize for the metric
rather than the underlying cost (e.g., adding fake user tokens to
lower automation ratio). Mitigate by documenting that proxies are
indicators, not targets.
- **False precision**: Numbers create an illusion of understanding.
Mitigate by always showing confidence levels alongside values.
- **Scope creep**: Trying to measure everything dilutes the toolkit's
usability. Start with Phase 1 only, evaluate before proceeding.


@@ -29,13 +29,20 @@ separately as handoffs.
| 16 | Set up basic analytics | measure-project-impact | DONE | `~/www/analytics.sh` + `~/www/repo-stats.sh` |
| 17 | Consider Zenodo DOI | anticipated-criticisms | TODO | Citable DOI for academic audiences |
| 18 | Automate project cost on landing page | measure-project-impact | DONE | `~/www/update-costs.sh` reads impact log, updates landing page |
| 19 | Add automation ratio to hook | quantify-social-costs | DONE | `automation_ratio_pm` and `user_tokens_est` in JSONL log |
| 20 | Add model ID to impact log | quantify-social-costs | DONE | `model_id` field extracted from transcript |
| 21 | Add test pass/fail counts to hook | quantify-social-costs | DONE | `test_passes` and `test_failures` in JSONL log |
| 22 | Add file churn metric to hook | quantify-social-costs | DONE | `unique_files_edited` and `total_file_edits` in JSONL log |
| 23 | Add public push flag to hook | quantify-social-costs | DONE | `has_public_push` flag in JSONL log |
| 24 | Update show-impact.sh for new fields | quantify-social-costs | DONE | Social cost proxies displayed in impact viewer |
| 25 | Update methodology confidence summary | quantify-social-costs | TODO | Move proxied categories from "Unquantifiable" to "Proxy available" |
## Handoffs
| # | Action | Status | Notes |
|---|--------|--------|-------|
| H1 | Publish repository | DONE | https://llm-impact.org/forge/claude/ai-conversation-impact |
| H2 | Share methodology externally | TODO | See [H2 details below](#h2-share-externally) |
| H2 | Share methodology externally | DONE | See [H2 details below](#h2-share-externally) |
| H3 | Solicit feedback | DONE | Pinned issue #1 on Forgejo |
## H2: Share externally