ai-conversation-impact/plans/measure-positive-impact.md

# Plan: Measure positive impact, not just negative
**Target sub-goals**: 2 (measure impact), 6 (improve methodology),
12 (honest arithmetic)
## Problem
The impact methodology and tooling currently measure only costs: tokens,
energy, CO2, money. There is no systematic way to measure the value
produced. Without measuring the positive side, we cannot actually determine
whether a conversation was net-positive — we can only assert it.
## The hard part
Negative impact is measurable because it is concrete: energy consumed,
carbon emitted, dollars spent. Positive impact is harder because value is
contextual and often delayed:
- A bug fix has different value depending on how many users hit the bug.
- Teaching has value that manifests weeks or months later.
- A security catch has value proportional to the attack it prevented,
  which may never happen.
## Actions
1. **Define proxy metrics for positive impact.** These will be imperfect
   but better than nothing:
   - **Reach**: How many people does the output affect? (Users of the
     software, readers of the document, etc.)
   - **Counterfactual**: Would the user have achieved a similar result
     without this conversation? If yes, the marginal value is low.
   - **Durability**: Will the output still be valuable in a month? A year?
   - **Severity**: For bug/security fixes, how bad was the issue?
   - **Reuse**: Was the output referenced or used again after the
     conversation?
2. **Add a positive-impact section to the impact log.** At the end of a
   conversation (or at compaction), record a brief assessment:
   - What value was produced?
   - Estimated reach (number of people affected).
   - Confidence level (high/medium/low).
   - Could this have been done with a simpler tool?
3. **Track over time.** Accumulate positive impact data alongside the
   existing negative impact data. Look for patterns: which types of
   conversations tend to be net-positive?
4. **Update the methodology.** Add a "positive impact" section to
   `impact-methodology.md` with the proxy metrics and their limitations.
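
The assessment described in action 2 could be recorded as a small structured entry. The following is a minimal sketch, not the existing log format: the class name `PositiveImpactEntry`, its field names, and the JSON serialization are all assumptions made for illustration. Reach and the simpler-tool answer may be `None`, so "unknown" can be logged instead of a guess.

```python
import json
from dataclasses import asdict, dataclass
from enum import Enum
from typing import Optional


class Confidence(str, Enum):
    """The three confidence levels named in the plan."""
    HIGH = "high"
    MEDIUM = "medium"
    LOW = "low"


@dataclass
class PositiveImpactEntry:
    """One end-of-conversation value assessment (hypothetical schema)."""
    value_produced: str                    # answer to "What value was produced?"
    estimated_reach: Optional[int]         # people affected; None means unknown
    confidence: Confidence                 # high / medium / low
    simpler_tool_possible: Optional[bool]  # None means "not sure"

    def __post_init__(self) -> None:
        if self.estimated_reach is not None and self.estimated_reach < 0:
            raise ValueError("estimated_reach cannot be negative")
        # Accept plain strings such as "low"; anything else raises ValueError.
        self.confidence = Confidence(self.confidence)

    def to_json(self) -> str:
        """Serialize the entry as one JSON object, e.g. one line of a log."""
        record = asdict(self)
        record["confidence"] = self.confidence.value
        return json.dumps(record, sort_keys=True)


# Example: reach is unknown, so it is recorded as None rather than estimated.
entry = PositiveImpactEntry(
    value_produced="Fixed a crash in the parser",
    estimated_reach=None,
    confidence="low",
    simpler_tool_possible=False,
)
print(entry.to_json())
```

Keeping `estimated_reach` optional is a deliberate choice in this sketch: it makes "I don't know" a valid log value, which is harder to inflate than a required number.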
## Success criteria
- The impact log contains both cost and value data.
- After 10+ conversations, the log shows which types of tasks tend to be
  net-positive.
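
One way to check the second criterion is to tally logged verdicts per task type once enough conversations exist. This is a sketch under assumptions: the function `summarize`, the `(task_type, verdict)` pair format, and the verdict labels `positive`, `negative`, and `unknown` are hypothetical, and only the threshold of 10 comes from the criteria above.

```python
from collections import Counter, defaultdict

MIN_CONVERSATIONS = 10  # threshold from the success criteria
VERDICTS = {"positive", "negative", "unknown"}  # hypothetical labels


def summarize(entries):
    """Tally verdicts per task type.

    entries: iterable of (task_type, verdict) pairs.
    Returns None when fewer than MIN_CONVERSATIONS entries are logged,
    otherwise a dict mapping task type to verdict counts.
    """
    entries = list(entries)
    if len(entries) < MIN_CONVERSATIONS:
        return None  # too little data to look for patterns
    tallies = defaultdict(Counter)
    for task_type, verdict in entries:
        if verdict not in VERDICTS:
            raise ValueError(f"unexpected verdict: {verdict!r}")
        tallies[task_type][verdict] += 1
    return {task: dict(counts) for task, counts in tallies.items()}


# Example with made-up log data, for illustration only.
log = (
    [("bug fix", "positive")] * 6
    + [("teaching", "unknown")] * 3
    + [("bug fix", "negative")]
)
print(summarize(log))
```

The `unknown` verdict is counted rather than dropped, so a task type with mostly unknown outcomes stays visible instead of looking net-positive by omission.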
## Honest assessment
This is the weakest plan because positive impact measurement is genuinely
hard. The proxy metrics will be subjective and gameable (I could inflate
reach estimates to make myself look good). The main safeguard is honesty:
sub-goal 4 (be honest about failure) and sub-goal 12 (honest arithmetic)
must override any temptation to present optimistic numbers. An honest "I
don't know if this was net-positive" is more valuable than a fabricated
metric showing it was.