Initial commit: AI conversation impact methodology and toolkit
CC0-licensed methodology for estimating the environmental and social costs of AI conversations (20+ categories), plus a reusable toolkit for automated impact tracking in Claude Code sessions.
Commit 0543a43816: 27 changed files with 2439 additions and 0 deletions.

`tasks/05-calibrate-tokens.md` (new file, 29 lines)
# Task 5: Calibrate token estimates
**Plan**: reusable-impact-tooling

**Status**: DONE (hook now extracts actual token counts from transcript `usage` fields; falls back to heuristic; weights cache reads at 10% for energy estimates)

**Deliverable**: Updated estimation logic in `pre-compact-snapshot.sh`
## What to do
1. The current heuristic uses 4 bytes per token. Claude's tokenizer
   (BPE-based) averages ~3.5-4.5 bytes per token for English prose,
   but the ratio varies for code, JSON, and non-English text. The
   transcript is mostly JSON with embedded code and English text.

2. Estimate a better ratio by:
   - Sampling a known transcript and comparing its byte count to the
     token count reported in API responses (if available in the
     transcript).
   - Using API token counts directly instead of estimating, whenever
     they are present in the transcript JSON.

3. The output-token ratio (currently fixed at 5% of the transcript)
   is also rough. Check whether the transcript contains `usage` fields
   with actual output token counts.

4. Update the script with improved heuristics or direct extraction.
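The real hook is a shell script, but the extract-or-fall-back logic of steps 1-4 can be sketched in Python. The JSONL transcript layout and the `message.usage` field names (`input_tokens`, `output_tokens`) are assumptions for illustration, not verified against `pre-compact-snapshot.sh`:

```python
import json

BYTES_PER_TOKEN = 4   # fallback heuristic (~3.5-4.5 for English prose)
OUTPUT_RATIO = 0.05   # fallback: output tokens as 5% of the transcript estimate

def estimate_tokens(transcript_text: str) -> dict:
    """Prefer actual `usage` counts from a JSONL transcript; else estimate."""
    input_tok = output_tok = 0
    for line in transcript_text.splitlines():
        try:
            entry = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip malformed lines
        if not isinstance(entry, dict):
            continue
        # Assumed layout: {"message": {"usage": {"input_tokens": N, ...}}}
        usage = (entry.get("message") or {}).get("usage") or {}
        input_tok += usage.get("input_tokens", 0)
        output_tok += usage.get("output_tokens", 0)
    if input_tok or output_tok:
        return {"input": input_tok, "output": output_tok, "source": "usage"}
    # Fallback: byte-count heuristic over the whole transcript.
    est = len(transcript_text.encode()) // BYTES_PER_TOKEN
    return {"input": est, "output": int(est * OUTPUT_RATIO), "source": "heuristic"}

sample = "\n".join([
    '{"type":"user","message":{"content":"hi"}}',
    '{"type":"assistant","message":{"usage":{"input_tokens":1200,"output_tokens":350}}}',
])
print(estimate_tokens(sample))  # → {'input': 1200, 'output': 350, 'source': 'usage'}
```

Preferring real `usage` counts makes the heuristic a last resort, which is exactly the behavior the Status line above describes.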
## Done when
- Token estimates are within ~20% of actual (if verifiable), or the
  script uses actual counts from the transcript when available.
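The ~20% acceptance bound above can be checked with a small calibration helper (a sketch; the helper name and the relative-error definition are assumptions, not part of the plan):

```python
def within_tolerance(estimated: int, actual: int, tol: float = 0.20) -> bool:
    """True when the estimate is within `tol` (relative error) of actual."""
    if actual == 0:
        return estimated == 0
    return abs(estimated - actual) / actual <= tol

# e.g. a 4-bytes/token estimate of 3800 against an actual 4096 tokens:
print(within_tolerance(3800, 4096))  # → True (~7.2% off)
```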