| # Real-Time Agent Thinking β Verbose Evaluation Guide |
|
|
| ## Quick Start |
|
|
| See agents thinking in real-time as they analyze and debate: |
|
|
| ```bash |
| python evaluation/run_evaluation_verbose.py --questions 1 |
| ``` |
|
|
| ## What You'll See |
|
|
| ### 1. **Orchestrator Initialization** (40 seconds) |
| ``` |
| INFO:codette_orchestrator | INFO | Loading base model (one-time)... |
| INFO:codette_orchestrator | INFO | GPU layers: 35 (0=CPU only, 35+=full GPU offload) |
| INFO:codette_orchestrator | INFO | β GPU acceleration ENABLED (35 layers offloaded) |
| INFO:codette_orchestrator | INFO | Base model loaded in 8.2s |
| ``` |
|
|
| ### 2. **Agent Setup** |
| ``` |
| [AGENT SETUP INSPECTION] |
| Orchestrator available: True |
| Available adapters: ['newton', 'davinci', 'empathy', 'philosophy', 'quantum', 'consciousness', 'multi_perspective', 'systems_architecture'] |
| |
| Agent LLM modes: |
| Newton β LLM (orch=True, adapter=newton) |
| Quantum β LLM (orch=True, adapter=quantum) |
| DaVinci β LLM (orch=True, adapter=davinci) |
| Philosophy β LLM (orch=True, adapter=philosophy) |
| Empathy β LLM (orch=True, adapter=empathy) |
| Ethics β LLM (orch=True, adapter=philosophy) |
| ``` |
|
|
| ### 3. **Real-Time Agent Thinking (Round 0)** |
|
|
| As each agent analyzes the concept: |
|
|
| ``` |
| [Newton] Analyzing 'What is the speed of light in vacuum?...' |
| Adapter: newton |
| System prompt: Examining the methodological foundations of this concept through dimen... |
| Generated: 1247 chars, 342 tokens |
| Response preview: "Speed of light represents a fundamental velocity constant arising from Maxwell's equations... |
| |
| [Quantum] Analyzing 'What is the speed of light in vacuum?...' |
| Adapter: quantum |
| System prompt: Probing the natural frequencies of 'What is the speed of light in... |
| Generated: 1089 chars, 298 tokens |
| Response preview: "Light exists in superposition of possibilities until measurement: it is both wave and partic... |
| |
| [DaVinci] Analyzing 'What is the speed of light in vacuum?...' |
| Adapter: davinci |
| System prompt: Examining 'What is the speed of light in vacuum?...' through symmetry analysis... |
| Generated: 1345 chars, 378 tokens |
| Response preview: "Cross-domain insight: light's speed constant connects electromagnetic theory to relativi... |
| |
| [Philosophy] Analyzing 'What is the speed of light in vacuum?...' |
| Adapter: philosophy |
| System prompt: Interrogating the epistemological boundaries of 'What is the speed o... |
| Generated: 1203 chars, 334 tokens |
| Response preview: "Epistemologically, light speed represents a boundary between measurable constants and th... |
| |
| [Empathy] Analyzing 'What is the speed of light in vacuum?...' |
| Adapter: empathy |
| System prompt: Mapping the emotional landscape of 'What is the speed of light in... |
| Generated: 891 chars, 245 tokens |
| Response preview: "Humans experience light as fundamental to consciousness: vision, warmth, time perception... |
| ``` |
|
|
| Each line shows: |
| - **Agent name** (Newton, Quantum, etc.) |
| - **Concept being analyzed** (truncated) |
| - **Adapter being used** (e.g., "newton", "quantum") |
| - **System prompt preview** (first 100 chars) |
| - **Output size**: chars generated + tokens consumed |
| - **Response preview**: first 150 chars of what the agent generated |
|
|
| ### 4. **Conflict Detection (Round 0)** |
| ``` |
| Domain-gated activation: detected 'physics' β 3 agents active |
| |
| [CONFLICTS DETECTED] Round 0: 42 conflicts found |
| Top conflicts: |
| - Newton vs Quantum: 0.68 (Causality vs Probability) |
| - Newton vs DaVinci: 0.45 (Analytical vs Creative) |
| - Quantum vs Philosophy: 0.52 (Measurement vs Meaning) |
| ``` |
|
|
| ### 5. **Debate Rounds (Round 1+)** |
| ``` |
| [R1] Newton vs Quantum |
| Challenge: "Where do you agree with Quantum's superposition view? Where is causality essential?" |
| Newton's response: 1234 chars |
| Quantum's reply: 1089 chars |
| |
| [R1] Quantum vs Philosophy |
| Challenge: "How does the measurement problem relate to epistemology?" |
| Quantum's response: 945 chars |
| Philosophy's reply: 1123 chars |
| ``` |
|
|
| ### 6. **Final Synthesis** |
| ``` |
| ==================================================================================== |
| [FINAL SYNTHESIS] (2847 characters) |
|
|
| The speed of light represents a fundamental constant that emerges from the intersection |
| of multiple ways of understanding reality. From Newton's causal-analytical perspective, |
| it's a boundary condition derived from Maxwell's equations and relativistic principles... |
|
|
| [From Quantum perspective: Light exhibits wave-particle duality...] |
| [From DaVinci's creative lens: Speed-of-light connects to broader patterns...] |
| [From Philosophy: Epistemologically grounded in measurement and uncertainty...] |
| [From Empathy: Light as human experience connects consciousness to physics...] |
| ==================================================================================== |
| ``` |
| |
| ### 7. **Metadata Summary** |
| ``` |
| [METADATA] |
| Conflicts detected: 42 |
| Gamma (coherence): 0.784 |
| Debate rounds: 2 |
| GPU time: 2.3 sec total |
| ``` |
| |
| ## Command Options |
| |
| ```bash |
| # See 1 question with full thinking (default) |
| python evaluation/run_evaluation_verbose.py |
|
|
| # See 3 questions |
| python evaluation/run_evaluation_verbose.py --questions 3 |
|
|
| # Pipe to file for analysis |
| python evaluation/run_evaluation_verbose.py --questions 2 > debug.log 2>&1 |
| ``` |
| |
| ## What Each Log Line Means |
| |
| | Log Pattern | Meaning | |
| |------------|---------| |
| | `[Agent] Analyzing 'X'...` | Agent starting to analyze concept | |
| | `Adapter: newton` | Which trained adapter is being used | |
| | `System prompt: ...` | The reasoning framework being provided | |
| | `Generated: 1247 chars, 342 tokens` | Output size and LLM tokens consumed | |
| | `Response preview: ...` | First 150 chars of actual reasoning | |
| | `Domain-gated: detected 'physics' β 3 agents` | Only these agents are active for this domain | |
| | `[R0] Newton β 1247 chars. Preview: ...` | Round 0 initial analysis excerpt | |
| | `[R1] Newton vs Quantum` | Debate round showing which agents are engaging | |
| |
| ## Debugging Tips |
| |
| ### If you see "TEMPLATE" instead of LLM output: |
| ``` |
| Response preview: "Tracing the causal chain within 'gravity': every observable..." |
| ``` |
| β This is the template. Agent didn't get the orchestrator! |
| |
| ### If you see real reasoning: |
| ``` |
| Response preview: "Gravity is fundamentally a curvature of spacetime according to..." |
| ``` |
| β Agent is using real LLM! β |
| |
| ### If GPU isn't being used: |
| ``` |
| Base model loaded in 42s |
| β CPU mode (GPU disabled) |
| ``` |
| β GPU isn't loaded. Check n_gpu_layers setting. |
| |
| ### If GPU is working: |
| ``` |
| Base model loaded in 8.2s |
| β GPU acceleration ENABLED (35 layers offloaded) |
| ``` |
| β GPU is accelerating inference! β |
| |
| ## Performance Metrics to Watch |
| |
| - **Base model load time**: <15s = GPU working, >30s = CPU only |
| - **Per-agent inference**: <5s = GPU mode, >15s = CPU mode |
| - **Token generation rate**: >50 tok/s = GPU, <20 tok/s = CPU |
| - **GPU memory**: Should show VRAM usage in task manager |
| |
| ## Comparing to Templates |
| |
| To see the difference, create a test script: |
| |
| ```python |
| # View template-based response |
| from reasoning_forge.agents.newton_agent import NewtonAgent |
| agent = NewtonAgent(orchestrator=None) # No LLM! |
| template_response = agent.analyze("gravity") |
| |
| # View LLM-based response |
| from reasoning_forge.forge_engine import ForgeEngine |
| forge = ForgeEngine() |
| llm_response = forge.newton.analyze("gravity") |
| ``` |
| |
| Template output will be generic substitution. |
| LLM output will be domain-specific reasoning. |
| |
| --- |
| |
| Ready to see agents thinking! Run it and let me know what you see. π― |
| |