ai-conversation-impact/tasks/05-calibrate-tokens.md


# Task 5: Calibrate token estimates
**Plan**: reusable-impact-tooling
**Status**: DONE (hook now extracts actual token counts from transcript usage fields; falls back to heuristic; weights cache reads at 10% for energy estimates)
**Deliverable**: Updated estimation logic in `pre-compact-snapshot.sh`
## What to do
1. The current heuristic uses 4 bytes per token. Claude's tokenizer
(based on BPE) averages ~3.5-4.5 bytes per token for English prose
but varies for code, JSON, and non-English text. The transcript is
mostly JSON with embedded code and English text.
2. Estimate a better ratio by:
- Sampling a known transcript and comparing byte count to the token
count reported in API responses (if available in the transcript).
- If API token counts are present in the transcript JSON, use them
directly instead of estimating.
3. The output token ratio (currently fixed at 5% of transcript) is also
rough. Check if the transcript contains `usage` fields with actual
output token counts.
4. Update the script with improved heuristics or direct extraction.
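The steps above can be sketched as a small shell function: prefer actual `usage` counts when the transcript contains them, and fall back to the bytes-per-token heuristic otherwise. This is a minimal sketch, not the actual `pre-compact-snapshot.sh` logic; the `output_tokens` field name and the JSONL transcript shape are assumptions based on typical API usage blocks, and only POSIX tools (`grep`, `awk`, `wc`) are used.

```shell
#!/bin/sh
# Sketch: estimate output tokens for a transcript file.
# Assumes transcript lines may carry usage objects like
# {"usage":{"input_tokens":900,"output_tokens":120}} (hypothetical shape).

estimate_output_tokens() {
  transcript="$1"
  # Prefer actual counts: sum every output_tokens value found in usage fields.
  actual=$(grep -o '"output_tokens":[0-9]*' "$transcript" 2>/dev/null \
    | awk -F: '{sum += $2} END {print sum + 0}')
  if [ "${actual:-0}" -gt 0 ]; then
    echo "$actual"
  else
    # Fallback heuristic: ~4 bytes per token over the whole transcript,
    # with output assumed to be ~5% of it (the current fixed ratio).
    bytes=$(wc -c < "$transcript")
    echo $(( bytes / 4 / 20 ))
  fi
}

# Demo on a synthetic transcript (made-up numbers).
tmp=$(mktemp)
printf '%s\n' '{"usage":{"input_tokens":900,"output_tokens":120}}' \
              '{"usage":{"input_tokens":400,"output_tokens":80}}' > "$tmp"
estimate_output_tokens "$tmp"   # prints 200 (120 + 80)
rm -f "$tmp"
```

If the transcript carries no `usage` fields at all, the `grep` finds nothing, the sum is 0, and the byte-count heuristic takes over.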
## Done when
- Token estimates are within ~20% of actual counts (where verifiable), or
actual counts from the transcript are used directly when available.
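The ~20% acceptance bound can be checked mechanically with integer arithmetic (no `bc` needed), since `|estimate - actual| / actual <= 0.20` is equivalent to `|estimate - actual| * 100 <= actual * 20`. The helper name below is hypothetical, not part of the existing script:

```shell
#!/bin/sh
# Hypothetical check: is an estimate within +/-20% of a known actual count?
within_20_pct() {
  estimate=$1
  actual=$2
  diff=$(( estimate - actual ))
  [ "$diff" -lt 0 ] && diff=$(( -diff ))
  # Integer form of: diff / actual <= 0.20
  [ $(( diff * 100 )) -le $(( actual * 20 )) ]
}

within_20_pct 1100 1000 && echo "within tolerance"    # 10% off: passes
within_20_pct 1500 1000 || echo "outside tolerance"   # 50% off: fails
```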