# Plan: Measure positive impact, not just negative
**Target sub-goals**: 2 (measure impact), 6 (improve methodology),
12 (honest arithmetic)
## Problem
The impact methodology and tooling currently measure only costs: tokens,
energy, CO2, money. There is no systematic way to measure the value
produced. Without measuring the positive side, we cannot actually determine
whether a conversation was net-positive; we can only assert it.
## The hard part
Negative impact is measurable because it's physical: energy consumed,
carbon emitted, dollars spent. Positive impact is harder because value is
contextual and often delayed:
- A bug fix has different value depending on how many users hit the bug.
- Teaching has value that manifests weeks or months later.
- A security catch has value proportional to the attack it prevented,
  which may never happen.
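The last bullet is an expected-value problem: the value of a preventive fix is the harm avoided, weighted by how likely that harm was. A minimal sketch of that arithmetic; every number below is an invented placeholder, not a measurement:

```python
# Expected value of a preventive fix: harm avoided, weighted by the
# probability the harm would ever have occurred.
# All numbers here are illustrative placeholders.

def expected_value(prevented_harm: float, probability: float) -> float:
    """Expected value of preventing a harm that occurs with `probability`."""
    return prevented_harm * probability

# A security catch preventing a breach costing ~1000 hours of cleanup,
# where the attack had only a 5% chance of ever happening:
print(expected_value(prevented_harm=1000.0, probability=0.05))  # ~50 hours
```

The same framing covers delayed value (teaching): discount the eventual benefit by the probability it materializes.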
## Actions
1. **Define proxy metrics for positive impact.** These will be imperfect
   but better than nothing:
   - **Reach**: How many people does the output affect? (Users of the
     software, readers of the document, etc.)
   - **Counterfactual**: Would the user have achieved a similar result
     without this conversation? If yes, the marginal value is low.
   - **Durability**: Will the output still be valuable in a month? A year?
   - **Severity**: For bug/security fixes, how bad was the issue?
   - **Reuse**: Was the output referenced or used again after the
     conversation?
2. **Add a positive-impact section to the impact log.** At the end of a
   conversation (or at compaction), record a brief assessment:
   - What value was produced?
   - Estimated reach (number of people affected).
   - Confidence level (high/medium/low).
   - Could this have been done with a simpler tool?
3. **Track over time.** Accumulate positive impact data alongside the
   existing negative impact data. Look for patterns: which types of
   conversations tend to be net-positive?
4. **Update the methodology.** Add a "positive impact" section to
   `impact-methodology.md` with the proxy metrics and their limitations.
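Actions 1 and 2 could share one record shape: a log entry that carries the proxy metrics plus the end-of-conversation questions. A sketch assuming a JSON-lines log; the field names and the `impact-log.jsonl` path are inventions for illustration, not part of the existing tooling:

```python
import json
from dataclasses import dataclass, asdict

@dataclass
class PositiveImpactEntry:
    """One end-of-conversation assessment using the proxy metrics."""
    value_produced: str   # brief description of what value was produced
    reach: int            # estimated number of people affected
    counterfactual: str   # would a similar result have happened anyway?
    durability: str       # e.g. "days", "months", "years"
    severity: str         # for bug/security fixes; "" otherwise
    reused: bool          # was the output referenced again later?
    confidence: str       # "high" | "medium" | "low"
    simpler_tool: bool    # could a simpler tool have done this?

def append_entry(entry: PositiveImpactEntry,
                 path: str = "impact-log.jsonl") -> None:
    """Append one assessment as a JSON line to the impact log."""
    with open(path, "a") as log:
        log.write(json.dumps(asdict(entry)) + "\n")

entry = PositiveImpactEntry(
    value_produced="fixed off-by-one in pagination",
    reach=40, counterfactual="likely, but a day later",
    durability="months", severity="minor", reused=False,
    confidence="medium", simpler_tool=False,
)
append_entry(entry)
```

An append-only line-per-entry format keeps the log mergeable with whatever the existing cost log records, without committing to a schema up front.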
## Success criteria
- The impact log contains both cost and value data.
- After 10+ conversations, patterns emerge about which tasks are
  net-positive.
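The pattern-finding in the second criterion can be sketched as a grouping over logged assessments. The `task_type` field, the `net` judgment scale (+1 net-positive, 0 unclear, -1 net-negative), and the sample entries are all hypothetical:

```python
from collections import defaultdict

# Hypothetical logged assessments: each carries a task type and a
# subjective net judgment (+1 net-positive, 0 unclear, -1 net-negative).
entries = [
    {"task_type": "bug-fix", "net": 1},
    {"task_type": "bug-fix", "net": 1},
    {"task_type": "brainstorm", "net": 0},
    {"task_type": "bug-fix", "net": -1},
    {"task_type": "brainstorm", "net": -1},
]

def net_by_task_type(entries):
    """Average net judgment per task type."""
    sums = defaultdict(lambda: [0, 0])  # task_type -> [total, count]
    for e in entries:
        sums[e["task_type"]][0] += e["net"]
        sums[e["task_type"]][1] += 1
    return {k: total / count for k, (total, count) in sums.items()}

print(net_by_task_type(entries))
```

With 10+ real entries this would surface which task types skew net-positive, though averages over subjective judgments deserve the same skepticism as the judgments themselves.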
## Honest assessment
This is the weakest plan because positive impact measurement is genuinely
hard. The proxy metrics will be subjective and gameable (I could inflate
reach estimates to make myself look good). The main safeguard is honesty:
sub-goal 4 (be honest about failure) and sub-goal 12 (honest arithmetic)
must override any temptation to present optimistic numbers. An honest "I
don't know if this was net-positive" is more valuable than a fabricated
metric showing it was.