# Real-trace SDPO alignment validation

Runs the full **ingestion → adapter → collator → SDPO** data path against your own local Claude Code session logs (`~/.claude/projects/**/*.jsonl`) and reports the live SDPO mask alignment ratio. This is the population-level evidence that Wave 21's `_build_chat_aligned_mask` fix holds on real-world data, not just on the synthetic fixture.

## Run

```bash
python examples/validate_real_trace_alignment/run.py
# options:
#   --projects-dir ~/.claude/projects      where to discover sessions
#   --max-sessions 8                       how many error-bearing sessions to sample
#   --model Qwen/Qwen2.5-0.5B-Instruct     a real chat-template tokenizer
#   --pass-threshold 0.95                  min alignment ratio to PASS
#   --strip-thinking                       (default OFF; see below)
```

Exit codes:

- `0`: PASS (alignment ≥ threshold, no crashes)
- `1`: FAIL
- `2`: no error-bearing sessions found, or no chat template available

## What it measures

- **ingestion yield**: states emitted and error sites detected.
- **structural vs. string-only flagging**: the Wave 21 `is_error` fix. The ingester sets a structural `tool_error: True` boolean, so `string-tag-only` should be ~0; the brittle `[TOOL_RESULT (ERROR)]` grep is only a fallback.
- **empty-recovery rate**: the fraction of error sites whose recovery turn has no content left to distill against (see the `--strip-thinking` gotcha below).
- **SDPO alignment**: the fraction of in-loss `sdpo_loss_mask` positions where the student token id equals the teacher token id. ~100% means the mask lands exactly on content tokens; below 95% means chat-template drift has crept back in (a regression).

## The `--strip-thinking` gotcha (important for SDPO)

`ClaudeCodeIngester(strip_thinking=...)` controls whether `[THINKING]` blocks survive ingestion. For most ingestion you strip them. **For SDPO hint-distillation you must NOT.** On real Claude Code traces, the error-*recovery* turn is very often **pure thinking**: the model reasons about the failure, then silently retries a tool. Strip the thinking and that turn's content goes empty, so (in the representative run below) ~67% of error sites carry no recovery content to distill against and produce a zero-signal SDPO row.
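A minimal sketch of why stripping empties the recovery turn. It assumes a hypothetical representation of a turn as a list of typed blocks; `Block`, `recovery_content`, and `empty_recovery_rate` are illustrative names, not the ingester's API. It also assumes a bare tool call contributes no distillable text, which is the case the paragraph above describes:

```python
from dataclasses import dataclass


@dataclass
class Block:
    # Hypothetical block model: kind is "thinking", "text", or "tool_use".
    kind: str
    text: str


def recovery_content(blocks: list[Block], strip_thinking: bool) -> str:
    """Return the distillable text of a recovery turn (hypothetical helper)."""
    kept = []
    for block in blocks:
        if block.kind == "tool_use":
            # Assumption: a silent tool retry carries no text to distill against.
            continue
        if block.kind == "thinking":
            if strip_thinking:
                continue  # this is the content --strip-thinking removes
            kept.append(f"[THINKING] {block.text}")
        else:
            kept.append(block.text)
    return "\n".join(kept).strip()


def empty_recovery_rate(recovery_turns: list[list[Block]], strip_thinking: bool) -> float:
    """Fraction of error sites whose recovery turn renders to empty content."""
    if not recovery_turns:
        return 0.0
    empty = sum(1 for blocks in recovery_turns if not recovery_content(blocks, strip_thinking))
    return empty / len(recovery_turns)


# Toy data (not real traces): two pure-thinking recoveries, one with visible text.
turns = [
    [Block("thinking", "Relative path failed; retry with an absolute path."),
     Block("tool_use", "Read /abs/path/file.py")],
    [Block("thinking", "Command not found; the venv is not activated."),
     Block("tool_use", "Bash .venv/bin/pytest")],
    [Block("text", "The import is wrong, fixing it."),
     Block("tool_use", "Edit src/module.py")],
]

print(empty_recovery_rate(turns, strip_thinking=True))   # 2 of 3 turns become empty
print(empty_recovery_rate(turns, strip_thinking=False))  # 0.0
```

Treating the tool call as non-content is the load-bearing assumption here; a collator that counted tool calls as recovery content would report a different rate.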
This script therefore defaults to `strip_thinking=False`. The collator also guards against the empty case: an empty-recovery error turn is treated as a non-error site rather than producing an all-`ignore_index` mask. That guard prevents a degenerate row, but the distillation *signal* exists only if you keep the thinking. Pass `--strip-thinking` to see the empty-recovery warning fire.

## Representative result (Codeseys' machine, 2026-05-28)

```
sessions processed: 10/10
total error sites: 141
structural-flagged users: 170
string-tag-only users: 0
empty-recovery sites: 0/141 (0%) # strip_thinking=False
SDPO alignment (REAL): 832/832 = 100.0%
RESULT: PASS ✅
```

With `--strip-thinking`, the same sessions report ~67% empty recovery, and the number of measurable in-loss positions collapses accordingly, so the effect of the flag shows up directly in the report.
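The alignment ratio described under *What it measures*, together with the exit-code rules from *Run*, can be sketched as follows. This is a hypothetical re-implementation, not the script's code: it assumes each SDPO row supplies position-aligned student ids, teacher ids, and a 0/1 `sdpo_loss_mask`, and it treats zero in-loss positions as a FAIL, which this README does not specify.

```python
from typing import Sequence


def row_alignment(student_ids: Sequence[int], teacher_ids: Sequence[int],
                  loss_mask: Sequence[int]) -> tuple[int, int]:
    """Count (matched, in_loss) positions for one row; assumes a 0/1 mask."""
    if not (len(student_ids) == len(teacher_ids) == len(loss_mask)):
        raise ValueError("student ids, teacher ids and mask must be position-aligned")
    matched = in_loss = 0
    for s, t, m in zip(student_ids, teacher_ids, loss_mask):
        if m:  # only in-loss positions are measured
            in_loss += 1
            matched += int(s == t)
    return matched, in_loss


def population_alignment(rows) -> tuple[int, int]:
    """Sum matched and in-loss counts over every SDPO row."""
    matched = total = 0
    for student_ids, teacher_ids, loss_mask in rows:
        m, t = row_alignment(student_ids, teacher_ids, loss_mask)
        matched += m
        total += t
    return matched, total


def exit_code(matched: int, total: int, *, sessions_found: bool,
              has_chat_template: bool, crashed: bool,
              pass_threshold: float = 0.95) -> int:
    """0 = PASS, 1 = FAIL, 2 = nothing to validate."""
    if not sessions_found or not has_chat_template:
        return 2
    if crashed or total == 0:  # assumption: no measurable positions counts as FAIL
        return 1
    return 0 if matched / total >= pass_threshold else 1


# Toy rows (not real data): the second row has one drifted in-loss token.
rows = [
    ([1, 2, 3, 4], [9, 2, 3, 4], [0, 1, 1, 1]),  # 3/3 in-loss positions match
    ([5, 6, 7], [5, 8, 7], [1, 1, 0]),           # 1/2 in-loss positions match
]
matched, total = population_alignment(rows)
print(f"SDPO alignment: {matched}/{total} = {matched / total:.1%}")  # 4/5 = 80.0%
print("exit code:", exit_code(matched, total, sessions_found=True,
                             has_chat_template=True, crashed=False))  # 1
```

Note that the mismatch at position 0 of the first row is ignored because it falls outside the loss mask; only in-loss drift lowers the ratio.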