
Real-trace SDPO alignment validation

Runs the full ingestion → adapter → collator → SDPO data path against your own local Claude Code session logs (~/.claude/projects/**/*.jsonl) and reports the live SDPO mask alignment ratio. This is the population-level proof that Wave 21's _build_chat_aligned_mask fix holds on real-world data, not just the synthetic fixture.
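As a rough illustration of the discovery step only, the sketch below globs a projects directory for .jsonl session files and keeps those that contain at least one structurally flagged tool error. The record layout (a "message" object whose "content" list holds "tool_result" blocks with an "is_error" boolean) is an assumption about the log format, and the function names are hypothetical; run.py may discover and filter sessions differently.

```python
import json
from pathlib import Path
from typing import List


def discover_sessions(projects_dir: str = "~/.claude/projects") -> List[Path]:
    """Return every .jsonl file under projects_dir, recursively, in sorted order."""
    root = Path(projects_dir).expanduser()
    return sorted(root.glob("**/*.jsonl"))


def count_error_results(session_path: Path) -> int:
    """Count tool_result blocks flagged with is_error=True.

    ASSUMPTION: each line is a JSON object shaped like
    {"message": {"content": [{"type": "tool_result", "is_error": true}, ...]}}.
    Lines that do not parse or do not match this shape are skipped.
    """
    count = 0
    with session_path.open(encoding="utf-8") as fh:
        for line in fh:
            line = line.strip()
            if not line:
                continue
            try:
                record = json.loads(line)
            except json.JSONDecodeError:
                continue
            if not isinstance(record, dict):
                continue
            message = record.get("message")
            content = message.get("content") if isinstance(message, dict) else None
            if not isinstance(content, list):
                continue
            for block in content:
                if (
                    isinstance(block, dict)
                    and block.get("type") == "tool_result"
                    and block.get("is_error") is True
                ):
                    count += 1
    return count


def error_bearing_sessions(projects_dir: str, max_sessions: int = 8) -> List[Path]:
    """Pick up to max_sessions files that contain at least one flagged error."""
    picked: List[Path] = []
    for path in discover_sessions(projects_dir):
        if count_error_results(path) > 0:
            picked.append(path)
            if len(picked) >= max_sessions:
                break
    return picked
```

Calling error_bearing_sessions("~/.claude/projects") returns the sampled paths; an empty list corresponds to the "no error-bearing sessions found" case.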

Run

python examples/validate_real_trace_alignment/run.py
# options:
#   --projects-dir ~/.claude/projects   where to discover sessions
#   --max-sessions 8                    how many error-bearing sessions to sample
#   --model Qwen/Qwen2.5-0.5B-Instruct  a real chat-template tokenizer
#   --pass-threshold 0.95               min alignment ratio to PASS
#   --strip-thinking                    (default OFF — see below)

Exit codes: 0 = PASS (alignment ≥ threshold, no crashes); 1 = FAIL; 2 = nothing to validate (no error-bearing sessions found, or no chat template).
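If the script is driven from another Python process (for example a CI step), the exit codes can be mapped back to readable outcomes. The wrapper below is a minimal sketch: the helper names and the label strings are illustrative, and only the script path and the three code meanings come from this README.

```python
import subprocess
import sys

# Meanings copied from the exit-code description; the labels are illustrative.
EXIT_MEANINGS = {
    0: "PASS: alignment at or above threshold, no crashes",
    1: "FAIL",
    2: "SKIPPED: no error-bearing sessions found, or no chat template",
}


def describe_exit(code: int) -> str:
    """Translate a run.py exit code into a human-readable label."""
    return EXIT_MEANINGS.get(code, f"unexpected exit code {code}")


def run_validation(extra_args=()):
    """Run the validation script with the current interpreter.

    Returns (exit_code, label). Assumes the working directory is the repo root.
    """
    cmd = [sys.executable, "examples/validate_real_trace_alignment/run.py", *extra_args]
    result = subprocess.run(cmd)
    return result.returncode, describe_exit(result.returncode)
```

A caller could treat code 2 as a skip rather than a failure, since it means there was nothing to measure.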

What it measures

  • ingestion yield: states emitted, error sites detected
  • structural vs string-only flagging: the Wave 21 is_error fix. The ingester sets a structural tool_error: True boolean, so the string-tag-only count should be ~0 (the brittle [TOOL_RESULT (ERROR)] grep is a fallback only).
  • empty-recovery rate: see below.
  • SDPO alignment: fraction of in-loss sdpo_loss_mask positions where student token id == teacher token id. ~100% means the mask lands exactly on content tokens; a ratio below 95% (the default --pass-threshold) means the Wave 21 fix for chat-template drift has regressed.
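The alignment metric in the last bullet can be sketched in a few lines. This is an illustration of the ratio, not the script's code: it assumes the student and teacher token ids are already position-aligned sequences of equal length and that a truthy mask entry marks an in-loss position. The function name is hypothetical.

```python
from typing import Sequence, Tuple


def alignment_ratio(
    student_ids: Sequence[int],
    teacher_ids: Sequence[int],
    loss_mask: Sequence[int],
) -> Tuple[int, int, float]:
    """Return (matched, total, ratio) over in-loss positions.

    ASSUMPTION: all three sequences are position-aligned and equally long,
    and a truthy loss_mask entry means the position contributes to the loss.
    """
    if not (len(student_ids) == len(teacher_ids) == len(loss_mask)):
        raise ValueError("student_ids, teacher_ids and loss_mask must have equal length")
    in_loss = [i for i, m in enumerate(loss_mask) if m]
    matched = sum(1 for i in in_loss if student_ids[i] == teacher_ids[i])
    total = len(in_loss)
    # With no in-loss positions there is nothing to measure; report 0.0.
    ratio = matched / total if total else 0.0
    return matched, total, ratio


# Position 0 is masked out; positions 1 and 3 match, position 2 does not.
matched, total, ratio = alignment_ratio([5, 9, 12, 7], [5, 9, 13, 7], [0, 1, 1, 1])
print(f"{matched}/{total} = {ratio:.1%}")  # prints "2/3 = 66.7%"
```

A run passes when the ratio is at or above --pass-threshold; in the toy call above it would fail the 0.95 default.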

The --strip-thinking gotcha (important for SDPO)

ClaudeCodeIngester(strip_thinking=...) controls whether [THINKING] blocks survive ingestion. For most ingestion you strip them. For SDPO hint-distillation you must NOT: on real Claude Code traces the error-recovery turn is very often pure thinking (the model reasons about the failure, then silently retries a tool). Strip the thinking and that turn's content becomes empty, so ~67% of error sites carry no recovery content to distill against and each produces a zero-signal SDPO row.

This script therefore defaults to strip_thinking=False. The collator also guards against the empty case (an empty-recovery error turn is treated as a non-error site rather than firing an all-ignore_index mask), but the signal only exists if you keep the thinking. Pass --strip-thinking to see the empty-recovery warning fire.
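To make the empty-recovery effect concrete, the sketch below models a recovery turn as a list of content blocks and shows how stripping thinking can leave nothing to distill against. The block layout (dicts with a "type" of "thinking", "text" or "tool_use") and all function names are assumptions for illustration; this does not reproduce ClaudeCodeIngester or the collator.

```python
from typing import Dict, List

Block = Dict[str, str]


def recovery_text(blocks: List[Block], strip_thinking: bool) -> str:
    """Join the prose content of a recovery turn.

    ASSUMPTION: thinking blocks store their text under "thinking" and text
    blocks under "text"; tool_use blocks contribute no prose in this sketch.
    """
    parts = []
    for block in blocks:
        kind = block.get("type")
        if kind == "thinking":
            if not strip_thinking:
                parts.append(block.get("thinking", ""))
        elif kind == "text":
            parts.append(block.get("text", ""))
    return "\n".join(p for p in parts if p.strip())


def is_sdpo_error_site(blocks: List[Block], strip_thinking: bool) -> bool:
    """Guard: an error turn with empty recovery content is treated as a non-error site."""
    return bool(recovery_text(blocks, strip_thinking))


def empty_recovery_rate(sites: List[List[Block]], strip_thinking: bool) -> float:
    """Fraction of error sites whose recovery turn has no content left."""
    if not sites:
        return 0.0
    empty = sum(1 for s in sites if not is_sdpo_error_site(s, strip_thinking))
    return empty / len(sites)
```

On a toy set of three sites where two recovery turns are pure thinking followed by a tool call, empty_recovery_rate returns 2/3 with strip_thinking=True and 0.0 with strip_thinking=False.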

Representative result (Codeseys' machine, 2026-05-28)

sessions processed:       10/10
total error sites:        141
structural-flagged users: 170
string-tag-only users:    0
empty-recovery sites:     0/141 (0%)     # strip_thinking=False
SDPO alignment (REAL):    832/832 = 100.0%
RESULT: PASS ✅

With --strip-thinking the same sessions report ~67% empty-recovery, and the number of measurable in-loss positions collapses accordingly, so the effect of the flag is directly visible.