
# Claude Code Impact Toolkit

Track the environmental and financial cost of your Claude Code conversations.

## What it does

The toolkit installs a PreCompact hook that runs before each context compaction and captures:

- Token counts (actual counts from the transcript, or a heuristic estimate)
- Cache usage breakdown (creation vs. read)
- Energy consumption estimate (Wh)
- CO2 emissions estimate (grams)
- Financial cost estimate (USD)
- Model ID
- Automation ratio (AI output vs. user input, a deskilling proxy)
- File churn (edits per file, a code-quality proxy)
- Test pass/fail counts
- Public push detection (data-pollution risk flag)

Each snapshot is appended to a JSONL file for analysis over time.
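Because each snapshot is one JSON object per line, you can slice the log directly with jq. The log path and field names below (`cost_usd`, `co2_g`, `session_id`) are illustrative, not a documented schema; inspect the log your installed hook actually writes before relying on them:

```shell
# Illustrative path; adjust to where install.sh puts your log.
LOG="${LOG:-.claude/impact-log.jsonl}"

# Total estimated spend across all logged snapshots:
jq -s 'map(.cost_usd // 0) | add' "$LOG"

# Sessions ranked by estimated CO2:
jq -rs 'sort_by(-.co2_g) | .[] | "\(.session_id)\t\(.co2_g) g"' "$LOG"
```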

## Install

```shell
# Project-level (recommended)
cd your-project
./path/to/impact-toolkit/install.sh

# Or user-level (applies to all projects)
./path/to/impact-toolkit/install.sh --user
```

Requirements: bash, jq, python3.
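Under the hood, a PreCompact hook is registered through Claude Code's standard hooks settings. The installer presumably writes an entry along these lines to `.claude/settings.json` (or `~/.claude/settings.json` with `--user`); the exact path and command install.sh uses may differ:

```json
{
  "hooks": {
    "PreCompact": [
      {
        "hooks": [
          {
            "type": "command",
            "command": "$CLAUDE_PROJECT_DIR/.claude/hooks/pre-compact-snapshot.sh"
          }
        ]
      }
    ]
  }
}
```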

## View results

```shell
.claude/hooks/show-impact.sh              # per-session details
.claude/hooks/show-impact.sh <session_id> # a specific session
.claude/hooks/show-aggregate.sh           # portfolio-level dashboard
.claude/hooks/show-review-delta.sh        # AI vs. human code overlap
.claude/hooks/show-review-delta.sh 20     # analyze the last 20 commits
```

## How it works

The hook fires before Claude Code compacts your conversation context. It reads the conversation transcript, extracts token usage from API response metadata, and calculates cost estimates using:

- Energy: 0.1 Wh per 1K input tokens, 0.5 Wh per 1K output tokens (midpoint of a range calibrated against published figures from Google and Jegham et al., 2025)
- PUE: 1.2 (data-center overhead)
- CO2: 325 g/kWh (US grid average for cloud regions)
- Cost: $15/M input tokens, $75/M output tokens

Cache-read tokens are weighted at 10% of full cost, since they skip most of the computation.
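Putting those constants together, a back-of-envelope estimate looks like the sketch below. The token counts are made up, and the exact formula (treating cache reads as part of the input count, and applying the 10% weight to both energy and price) is this sketch's assumption, not necessarily what pre-compact-snapshot.sh computes:

```shell
# Illustrative counts: 800K input tokens (600K served from cache), 50K output.
python3 - 800000 600000 50000 <<'EOF'
import sys
inp, cached, out = map(int, sys.argv[1:])
eff_in = (inp - cached) + 0.10 * cached        # cache reads at 10% weight
wh = eff_in / 1000 * 0.1 + out / 1000 * 0.5    # Wh at the accelerator
wh *= 1.2                                      # PUE: data-center overhead
co2 = wh / 1000 * 325                          # 325 g CO2 per kWh
usd = eff_in / 1e6 * 15 + out / 1e6 * 75       # $15/M input, $75/M output
print(f"{wh:.1f} Wh, {co2:.1f} g CO2, ${usd:.2f}")
EOF
# -> 61.2 Wh, 19.9 g CO2, $7.65
```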

This toolkit measures a subset of the costs covered by impact-methodology.md. For more precise environmental measurement, consider these complementary tools:

- EcoLogits: a Python library that tracks per-query energy and CO2 for API calls to OpenAI, Anthropic, Mistral, and others. More precise than our estimates for environmental metrics.
- CodeCarbon: measures GPU/CPU energy for local training and inference workloads.
- Hugging Face AI Energy Score: benchmarks model energy efficiency. Useful for choosing between models.

These tools cover environmental metrics only. This toolkit also tracks financial cost and proxy metrics for social costs (automation ratio, file churn, test outcomes, public push detection). The accompanying methodology covers additional dimensions in depth.

## Limitations

- All numbers are estimates with low to medium confidence.
- Energy-per-token figures are calibrated against published research (Google, Aug 2025; Jegham et al., May 2025), not official Anthropic data.
- The hook only runs on context compaction, not at conversation end; short conversations that never compact are not logged.
- This toolkit only works with Claude Code. The methodology itself is tool-agnostic.
- See impact-methodology.md for the full methodology, uncertainty analysis, and non-quantifiable costs.

## Files

```
impact-toolkit/
  install.sh                       # installer
  hooks/pre-compact-snapshot.sh    # PreCompact hook
  hooks/show-impact.sh             # per-session log viewer
  hooks/show-aggregate.sh          # portfolio-level dashboard
  hooks/show-review-delta.sh       # AI vs. human code overlap
  README.md                        # this file
```

## License

CC0 1.0 Universal (public domain). See LICENSE in the repository root.