# Codette Adapter Training Lab
Codette is an experimental AI research system for recursive reasoning, multi-perspective cognition, and ethical AI alignment.
This repository contains the complete training pipeline for developing Codette LoRA adapters on Llama 3.1 8B.
## Architecture

```
codette-training-lab/
├── dataset_engine/              # Dataset generation pipeline
│   ├── template_registry.py     # Rich template pools per adapter
│   ├── answer_generator.py      # Structured educational answer generation
│   ├── dataset_generator.py     # Main generator with dedup + validation
│   └── templates/               # JSON template definitions
│
├── reasoning_forge/             # Multi-agent reasoning dataset refinement
│   ├── agents/                  # Newton, Quantum, Ethics, Philosophy, DaVinci, Empathy
│   ├── critic_agent.py          # Quality evaluation agent
│   ├── synthesis_engine.py      # Multi-perspective synthesis
│   ├── problem_generator.py     # Reasoning problem generation
│   └── forge_engine.py          # Orchestrator
│
├── training/                    # LoRA training scripts
│   ├── train_adapter.py         # Single adapter training (4-bit LoRA)
│   ├── train_all_adapters.py    # Sequential multi-adapter training
│   ├── merge_adapters.py        # Merge LoRA into base model
│   └── configs/                 # Training hyperparameters
│
├── evaluation/                  # Benchmarks and quality assurance
│   ├── reasoning_metrics.py     # Multi-dimensional scoring
│   ├── benchmark_runner.py      # Automated evaluation
│   ├── dataset_validator.py     # Dataset quality checks
│   ├── failure_analyzer.py      # Weakness detection
│   └── prompts/                 # Benchmark test sets
│
├── observatory/                 # Experiment tracking and monitoring
│   ├── metrics_logger.py        # Training run logging
│   ├── performance_tracker.py   # Improvement trends
│   ├── dataset_quality_monitor.py
│   └── dashboard.py             # ASCII status dashboard
│
├── research/                    # Source research documents
│   ├── papers/                  # Published manuscripts
│   ├── frameworks/              # RC+xi, quantum equations, perspectives
│   └── experiments/             # Cocoon simulations, logs
│
├── datasets/                    # Generated training datasets (JSONL)
├── adapters/                    # Trained LoRA adapters
├── scripts/                     # Pipeline orchestration
│   ├── run_full_pipeline.py     # End-to-end pipeline
│   └── hf_job.yaml              # Hugging Face job config
└── configs/                     # System configuration
    ├── adapter_registry.yaml
    └── pipeline_config.yaml
```
## Adapters
| Adapter | Domain | Target Examples | System Prompt |
|---|---|---|---|
| Newton | Analytical physics reasoning | 3000 | Newtonian analytical precision |
| DaVinci | Creative invention thinking | 2500 | Creative inventiveness |
| Empathy | Emotional understanding | 2500 | Deep empathy and EQ |
| Philosophy | Conceptual reasoning | 2000 | Philosophical depth |
| Quantum | Probabilistic thinking | 2000 | Quantum probabilistic thinking |
| RC+xi | Recursive cognition | 3000 | RC+xi framework reasoning |
| Multi-Perspective | Synthesis across lenses | 2500 | Multi-perspective synthesis |
| Systems | AI architecture | 2000 | System architecture design |
## Training Pipeline

```
research documents
        ↓
dataset extraction (template-based generation)
        ↓
synthetic reasoning expansion (counterexamples, variations)
        ↓
dataset validation (dedup, quality filter)
        ↓
reasoning forge (multi-agent critique + refinement)
        ↓
adapter training (4-bit LoRA on Llama 3.1 8B)
        ↓
benchmark evaluation (multi-dimensional reasoning metrics)
        ↓
observatory logging (track improvement over time)
```
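The stages above can be sketched as a simple chain of functions. This is a toy illustration, not the real implementation: the stage functions below are hypothetical stand-ins, and the actual entry point is `scripts/run_full_pipeline.py`.

```python
def extract(docs):
    # Template-based generation from research documents (stub).
    return [{"q": d, "a": f"answer for {d}"} for d in docs]

def expand(examples):
    # Synthetic expansion: add a counterexample variant per example (stub).
    return examples + [{"q": e["q"] + " (counterexample)", "a": e["a"]} for e in examples]

def validate(examples):
    # Dedup on the question text and drop empty answers.
    seen, out = set(), []
    for e in examples:
        if e["q"] not in seen and e["a"]:
            seen.add(e["q"])
            out.append(e)
    return out

def run_pipeline(docs):
    # Chain the stages in the order shown in the diagram above.
    return validate(expand(extract(docs)))

examples = run_pipeline(["momentum", "entropy", "momentum"])
print(len(examples))  # duplicate "momentum" doc is removed before training
```

The real pipeline adds the forge, training, evaluation, and logging stages after validation, but the same hand-off pattern applies: each stage consumes and emits JSONL-style example records.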
## Quick Start

```bash
# Install dependencies
pip install -r requirements.txt

# Generate all datasets
python -m dataset_engine.generate_all

# Run full pipeline
python scripts/run_full_pipeline.py --all

# Generate + validate only
python scripts/run_full_pipeline.py --generate --validate

# Train a single adapter
python -m training.train_adapter \
    --dataset datasets/newton_reasoning.jsonl \
    --adapter-name newton \
    --output-dir adapters/newton

# Run benchmarks
python -m evaluation.benchmark_runner --prompts evaluation/prompts/reasoning_tests.json

# View dashboard
python -m observatory.dashboard
```
## Dataset Format

All datasets use chat-format JSONL:

```json
{
  "messages": [
    {"role": "system", "content": "You are Codette, a recursive multi-perspective reasoning AI."},
    {"role": "user", "content": "Explain the conservation of momentum using a real-world example."},
    {"role": "assistant", "content": "Conservation of momentum states that in a closed system..."}
  ]
}
```
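A minimal validator for records in this format might look like the following sketch. It checks only the structural invariants visible in the example above (a `messages` list opening with system/user/assistant roles, all with non-empty string content); the repo's `evaluation/dataset_validator.py` presumably applies stricter quality checks.

```python
import json

REQUIRED_ROLES = ("system", "user", "assistant")

def validate_chat_record(line: str) -> bool:
    """Check one JSONL line against the chat format shown above."""
    try:
        record = json.loads(line)
    except json.JSONDecodeError:
        return False
    messages = record.get("messages")
    if not isinstance(messages, list) or len(messages) < 3:
        return False
    # The first three messages must follow the system/user/assistant order.
    roles = tuple(m.get("role") for m in messages[:3])
    # Every message needs non-empty string content.
    has_content = all(isinstance(m.get("content"), str) and m["content"] for m in messages)
    return roles == REQUIRED_ROLES and has_content

sample = json.dumps({"messages": [
    {"role": "system", "content": "You are Codette."},
    {"role": "user", "content": "Explain momentum."},
    {"role": "assistant", "content": "Momentum is conserved..."},
]})
print(validate_chat_record(sample))  # True
```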
## Reasoning Forge

The Reasoning Forge refines training data through multi-agent debate:

```
concept → problem generator → agent analysis → critic evaluation → synthesis → training example
```
Agents: Newton (physics), Quantum (probability), Ethics (alignment), Philosophy (meaning), DaVinci (creativity), Empathy (emotion)
Each agent analyzes from its perspective, the critic scores quality, and the synthesis engine produces a unified multi-perspective response.
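A toy sketch of that loop, under the assumption that agents expose a simple analyze-one-problem interface (the agent behaviors and critic heuristic here are illustrative placeholders, not the repo's actual implementation):

```python
# Illustrative stand-ins for a few of the perspective agents.
AGENTS = {
    "Newton": lambda p: f"Analytically, {p} follows from first principles.",
    "Quantum": lambda p: f"Probabilistically, {p} admits superposed outcomes.",
    "Empathy": lambda p: f"Emotionally, {p} affects how people feel.",
}

def critic_score(analysis: str) -> float:
    # Stand-in critic: longer, more specific analyses score higher.
    return min(1.0, len(analysis) / 60)

def forge(problem: str, threshold: float = 0.5) -> str:
    # Each agent analyzes the problem from its own perspective.
    analyses = {name: agent(problem) for name, agent in AGENTS.items()}
    # The critic filters out low-quality analyses.
    kept = {n: a for n, a in analyses.items() if critic_score(a) >= threshold}
    # The synthesis step merges surviving perspectives into one response.
    return " ".join(f"[{n}] {a}" for n, a in sorted(kept.items()))

print(forge("inertia in crowded systems"))
```

The real forge adds the problem generator up front and emits chat-format training examples rather than a joined string, but the analyze/score/synthesize shape is the same.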
## Base Model
- Model: meta-llama/Llama-3.1-8B-Instruct
- Method: QLoRA (4-bit quantization)
- LoRA config: rank=16, alpha=32, target=q/k/v/o projections
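For a rough sense of scale, the trainable-parameter count implied by this config can be worked out by hand. The dimensions below are assumptions based on the published Llama 3.1 8B architecture (32 layers, hidden size 4096, grouped-query attention with 8 KV heads, so k/v projections output dimension 1024):

```python
RANK = 16      # LoRA rank from the config above
HIDDEN = 4096  # assumed Llama 3.1 8B hidden size
KV_DIM = 1024  # assumed k/v projection output dim under GQA
LAYERS = 32    # assumed layer count

def lora_params(in_dim: int, out_dim: int, rank: int = RANK) -> int:
    # Each adapted linear gains two low-rank matrices:
    # A (in_dim x rank) and B (rank x out_dim).
    return rank * (in_dim + out_dim)

per_layer = (
    lora_params(HIDDEN, HIDDEN)    # q_proj
    + lora_params(HIDDEN, KV_DIM)  # k_proj
    + lora_params(HIDDEN, KV_DIM)  # v_proj
    + lora_params(HIDDEN, HIDDEN)  # o_proj
)
total = per_layer * LAYERS
print(f"{total:,} trainable parameters (~{total / 1e6:.1f}M)")
```

Under these assumptions the adapters train roughly 13.6M parameters, about 0.2% of the 8B base model, which is what makes 4-bit single-GPU (or even CPU) training feasible.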
## Research Background
Codette implements the RC+xi (Recursive Convergence + Epistemic Tension) framework for structured multi-perspective reasoning. The system coordinates 11 reasoning perspectives in parallel before synthesizing a final response.
Key research documents in research/:
- RC+xi Framework specification
- Quantum Cosmic Multicore experiment
- Codette Research Equations (8 core quantum-mathematics equations)
- Multi-perspective reasoning architecture
## Requirements
- Python 3.10+
- PyTorch 2.1+
- 16GB+ RAM (CPU training) or GPU with 8GB+ VRAM
- ~1-3 hours per adapter (CPU) or 20-40 min (A10/A100 GPU)
## License

MIT. This is a research project for experimental AI development.