# Real-trace SDPO alignment validation

Runs the full **ingestion → adapter → collator → SDPO** data path against your
own local Claude Code session logs (`~/.claude/projects/**/*.jsonl`) and reports
the live SDPO mask alignment ratio. This is the population-level evidence that
Wave 21's `_build_chat_aligned_mask` fix holds on real-world data, not just on
the synthetic fixture.
## Run

```bash
python examples/validate_real_trace_alignment/run.py
# options:
#   --projects-dir ~/.claude/projects     where to discover sessions
#   --max-sessions 8                      how many error-bearing sessions to sample
#   --model Qwen/Qwen2.5-0.5B-Instruct    a real chat-template tokenizer
#   --pass-threshold 0.95                 min alignment ratio to PASS
#   --strip-thinking                      (default OFF; see below)
```

Exit code: `0` PASS (alignment ≥ threshold, no crashes), `1` FAIL, `2` no
error-bearing sessions found or no chat template.
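
The exit-code contract above can be sketched as a small decision function. This is a minimal sketch, not the script's actual code: the name `decide_exit_code` and its parameters are hypothetical, and treating "nothing measurable" as taking priority over a crash is an assumption. Only the three exit codes and the `0.95` default threshold come from the text above.

```python
from typing import Optional

EXIT_PASS = 0     # alignment >= threshold and no crashes
EXIT_FAIL = 1     # alignment below threshold, or a crash occurred
EXIT_NO_DATA = 2  # no error-bearing sessions found, or no chat template


def decide_exit_code(
    alignment_ratio: Optional[float],
    *,
    pass_threshold: float = 0.95,
    crashed: bool = False,
) -> int:
    """Map a run's outcome onto the documented exit codes (hypothetical helper).

    `alignment_ratio` is None when there was nothing to measure, e.g. no
    error-bearing sessions were discovered or the tokenizer has no chat template.
    """
    if alignment_ratio is None:
        return EXIT_NO_DATA
    if crashed or alignment_ratio < pass_threshold:
        return EXIT_FAIL
    return EXIT_PASS
```

A CI wrapper could treat `2` as "skipped" rather than "failed", since it signals missing input rather than a regression.
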
## What it measures

- **ingestion yield**: states emitted, error sites detected
- **structural vs string-only flagging**: the Wave 21 `is_error` fix. The
  ingester sets a structural `tool_error: True` boolean; `string-tag-only`
  should be ~0 (the brittle `[TOOL_RESULT (ERROR)]` grep is fallback-only).
- **empty-recovery rate**: see the `--strip-thinking` section below.
- **SDPO alignment**: fraction of in-loss `sdpo_loss_mask` positions where
  student token id == teacher token id. ~100% means the mask lands exactly on
  content tokens; <95% means chat-template drift has returned.
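
The SDPO alignment metric in the last bullet can be illustrated as follows. This is a minimal sketch, assuming the student ids, teacher ids, and loss mask have already been brought to equal length and are plain Python sequences; the function name `sdpo_alignment_ratio` is hypothetical, and the real script presumably works on batched tensors.

```python
from typing import Optional, Sequence


def sdpo_alignment_ratio(
    student_ids: Sequence[int],
    teacher_ids: Sequence[int],
    loss_mask: Sequence[int],
) -> Optional[float]:
    """Fraction of in-loss positions where student and teacher token ids agree.

    Returns None when the mask selects no positions, so callers can tell
    "nothing to measure" apart from "0% aligned".
    """
    if not (len(student_ids) == len(teacher_ids) == len(loss_mask)):
        raise ValueError("all three sequences must have the same length")

    # Only positions with a truthy mask value contribute to the loss.
    in_loss = [i for i, m in enumerate(loss_mask) if m]
    if not in_loss:
        return None

    matches = sum(1 for i in in_loss if student_ids[i] == teacher_ids[i])
    return matches / len(in_loss)


# Toy example: 4 in-loss positions, 3 of which carry the same token id.
ratio = sdpo_alignment_ratio(
    student_ids=[1, 2, 3, 4, 5],
    teacher_ids=[9, 2, 3, 4, 7],
    loss_mask=[0, 1, 1, 1, 1],
)
print(ratio)  # 0.75
```

Positions outside the mask are ignored entirely, which is why the mismatch at index 0 does not lower the ratio.
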
## The `--strip-thinking` gotcha (important for SDPO)

`ClaudeCodeIngester(strip_thinking=...)` controls whether `[THINKING]` blocks
survive. For most ingestion you strip them. **For SDPO hint-distillation you
must NOT**: on real Claude Code traces the error-*recovery* turn is very often
**pure thinking** (the model reasons about the failure, then silently retries a
tool). Strip it and that turn's content goes empty, so ~67% of error sites carry
no recovery content to distill against and produce a zero-signal SDPO row.

This script therefore defaults to `strip_thinking=False`. The collator also
guards against the empty case (an empty-recovery error turn is treated as a
non-error site rather than firing an all-`ignore_index` mask), but the *signal*
only exists if you keep the thinking. Pass `--strip-thinking` to see the
empty-recovery warning fire.
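
To make the empty-recovery effect concrete, here is a minimal sketch. It assumes a recovery turn is a list of blocks, each a dict with `type` and `text` keys; that representation and the helper names `visible_text` and `empty_recovery_rate` are hypothetical and are not `ClaudeCodeIngester`'s actual data model. The three turns are made-up toy data, not real traces.

```python
from typing import Dict, List

Block = Dict[str, str]  # assumed shape: {"type": "thinking" | "text", "text": "..."}
Turn = List[Block]


def visible_text(turn: Turn, strip_thinking: bool) -> str:
    """Join the text of a turn's blocks, optionally dropping thinking blocks."""
    kept = [
        block["text"]
        for block in turn
        if not (strip_thinking and block["type"] == "thinking")
    ]
    return "\n".join(kept).strip()


def empty_recovery_rate(recovery_turns: List[Turn], strip_thinking: bool) -> float:
    """Fraction of recovery turns left with no content to distill against."""
    if not recovery_turns:
        return 0.0
    empty = sum(1 for turn in recovery_turns if not visible_text(turn, strip_thinking))
    return empty / len(recovery_turns)


# Toy recovery turns: two are pure thinking, one also has visible text.
recovery_turns: List[Turn] = [
    [{"type": "thinking", "text": "The path was wrong; retry with the absolute path."}],
    [{"type": "thinking", "text": "The import is missing; add it and rerun."}],
    [
        {"type": "thinking", "text": "The assertion is off by one."},
        {"type": "text", "text": "Fixing the assertion and rerunning the test."},
    ],
]

kept_rate = empty_recovery_rate(recovery_turns, strip_thinking=False)
stripped_rate = empty_recovery_rate(recovery_turns, strip_thinking=True)
```

With thinking kept, every toy turn has content (`kept_rate` is `0.0`); with thinking stripped, the two pure-thinking turns become empty (`stripped_rate` is `2/3`), which is the kind of collapse the section describes.
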
## Representative result (Codeseys' machine, 2026-05-28)

```
sessions processed: 10/10
total error sites: 141
structural-flagged users: 170
string-tag-only users: 0
empty-recovery sites: 0/141 (0%) # strip_thinking=False
SDPO alignment (REAL): 832/832 = 100.0%
RESULT: PASS ✅
```

With `--strip-thinking` the same sessions report ~67% empty-recovery and the
measurable in-loss positions collapse accordingly, so the effect of the flag is
directly visible.