---
license: apache-2.0
datasets:
- Fortytwo-Network/Strandset-Rust-v1
base_model:
- Qwen/Qwen2.5-Coder-14B-Instruct
pipeline_tag: text-generation
library_name: transformers
---

# Strand-Rust-Coder-14B-v1

## Overview

**Strand-Rust-Coder-14B-v1** is the first domain-specialized Rust language model created through **Fortytwo’s Swarm Inference**, a decentralized AI architecture where multiple models collaboratively generate, validate, and rank outputs through peer consensus.

The model fine-tunes **Qwen2.5-Coder-14B-Instruct** for Rust-specific programming tasks using a **191K-example synthetic dataset** built via multi-model generation and peer-reviewed validation.
It achieves **43–48% accuracy** on Rust-specific benchmarks (48% on the hold-out set, 43% on RustEvo^2), surpassing much larger proprietary models such as GPT-5 Codex on Rust tasks, while maintaining competitive general coding performance.

[Strand-Rust-Coder-v1: Technical Report](https://huggingface.co/blog/Fortytwo-Network/strand-rust-coder-tech-report)

## Key Features

- **Rust-specialized fine-tuning** on 15 diverse programming task categories
- **Peer-validated synthetic dataset** (191,008 verified examples, 94.3% compile rate)
- **LoRA-based fine-tuning** for efficient adaptation
- **Benchmarked across Rust-specific suites:**
  - **RustEvo^2**
  - **Evaluation on Hold-Out Set**
- **Deployed in the Fortytwo decentralized inference network** for collective AI reasoning

---

## Performance Summary

| **Model** | **Hold-Out Set** | **RustEvo^2** |
|------------|------------------|---------------|
| **Strand-Rust-Coder-14B-v1 (Ours)** | **48.00%** | **43.00%** |
| openai/gpt-5-codex | 47.00% | 28.00% |
| anthropic/claude-sonnet-4.5 | 46.00% | 21.00% |
| anthropic/claude-3.7-sonnet | 42.00% | 31.00% |
| qwen/qwen3-max | 42.00% | 40.00% |
| qwen/qwen3-coder-plus | 41.00% | 22.00% |
| x-ai/grok-4 | 39.00% | 37.00% |
| deepseek/deepseek-v3.1-terminus | 37.00% | 33.00% |
| Qwen3-Coder-30B-A3B-Instruct | 36.00% | 20.00% |
| openai/gpt-4o-latest | 34.00% | 39.00% |
| deepseek/deepseek-chat | 34.00% | 41.00% |
| google/gemini-2.5-flash | 33.00% | 7.00% |
| Qwen2.5-Coder-14B-Instruct (Base) | 29.00% | 30.00% |
| Qwen2.5-Coder-32B-Instruct | 29.00% | 31.00% |
| google/gemini-2.5-pro | 28.00% | 22.00% |
| qwen/qwen-2.5-72b | 28.00% | 32.00% |
| Tesslate/Tessa-Rust-T1-7B | 23.00% | 19.00% |

*Benchmarks on code tasks are measured as unit-test pass rate@1 in a Docker-isolated Rust 1.86.0 environment.*

---

## Task Breakdown

| Task | Base | Strand-14B |
|------|------|-------------|
| test_generation | 0.00 | 0.51 |
| api_usage_prediction | 0.27 | 0.71 |
| function_naming | 0.53 | 0.87 |
| code_refactoring | 0.04 | 0.19–0.20 |
| variable_naming | 0.87 | 1.00 |
| code_generation | 0.40 | 0.49 |

The largest absolute gains appear in *test generation* (+0.51) and *API usage prediction* (+0.44), while *refactoring* rises almost fivefold from a low baseline (0.04 to 0.19–0.20). These are areas that demand strong semantic reasoning about Rust’s ownership and lifetime rules.

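
The per-task gains can be recomputed directly from the table. The snippet below is a minimal sketch that hard-codes the reported scores and prints absolute and relative gains. Two things are assumptions made here rather than part of the original evaluation: the lower bound of 0.19 is used for `code_refactoring`, and the helper name `gains` is purely illustrative.

```python
# Scores copied from the Task Breakdown table: task -> (base, Strand-14B).
# Assumption: code_refactoring is reported as 0.19-0.20; the lower bound is used.
scores = {
    "test_generation": (0.00, 0.51),
    "api_usage_prediction": (0.27, 0.71),
    "function_naming": (0.53, 0.87),
    "code_refactoring": (0.04, 0.19),
    "variable_naming": (0.87, 1.00),
    "code_generation": (0.40, 0.49),
}


def gains(base, tuned):
    """Return (absolute gain, relative multiplier or None when base is 0)."""
    absolute = round(tuned - base, 2)
    relative = tuned / base if base > 0 else None
    return absolute, relative


table = {task: gains(*pair) for task, pair in scores.items()}

# Rank tasks by absolute gain, largest first.
by_absolute = sorted(table, key=lambda t: table[t][0], reverse=True)

for task in by_absolute:
    absolute, relative = table[task]
    rel_text = "n/a (base is 0)" if relative is None else f"{relative:.2f}x"
    print(f"{task:22s} +{absolute:.2f}  {rel_text}")
```
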

---

## Dataset

**Fortytwo-Network/Strandset-Rust-v1 (191,008 examples, 15 categories)**
Built through Fortytwo’s *Swarm Inference* pipeline, where multiple small language models (SLMs) generate and cross-validate examples with peer-review consensus and output aggregation.

- 94.3% compile success rate
- 73.2% consensus acceptance
- Coverage of 89% of Rust language features
- Tasks include:
  - `code_generation`, `code_completion`, `bug_detection`, `refactoring`, `optimization`
  - `docstring_generation`, `code_review`, `summarization`, `test_generation`
  - `naming`, `api_usage_prediction`, `search`

Dataset construction involved 2,383 crates from crates.io, automatic compilation tests, and semantic validation of ownership and lifetime correctness.

Dataset: [Fortytwo-Network/Strandset-Rust-v1](https://huggingface.co/datasets/Fortytwo-Network/Strandset-Rust-v1)

---

## Training Configuration

| Setting | Value |
|----------|-------|
| Base model | Qwen2.5-Coder-14B-Instruct |
| Method | LoRA (r=64, α=16) |
| Learning rate | 5e-5 |
| Batch size | 128 |
| Epochs | 3 |
| Optimizer | AdamW |
| Precision | bfloat16 |
| Objective | Completion-only loss |
| Context length | 32,768 |
| Framework | PyTorch + FSDP + Flash Attention 2 |
| Hardware | 8× H200 GPUs |

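
To make the `r=64, α=16` setting concrete, the sketch below computes the LoRA scaling factor and the number of trainable parameters a rank-64 adapter adds to a single linear layer. It assumes the standard LoRA formulation, in which the low-rank update is scaled by α/r. The 5120×5120 layer shape is a hypothetical example, not a value taken from this model card, and the card does not state which layers were adapted.

```python
def lora_scaling(alpha: float, r: int) -> float:
    """Standard LoRA scales the low-rank update B @ A by alpha / r."""
    return alpha / r


def lora_params(d_in: int, d_out: int, r: int) -> int:
    """Trainable parameters added to one d_in x d_out linear layer.

    A has shape (r, d_in) and B has shape (d_out, r).
    """
    return r * (d_in + d_out)


# Values from the Training Configuration table.
R, ALPHA = 64, 16
print(lora_scaling(ALPHA, R))  # 0.25

# Hypothetical layer shape (assumed for illustration only).
D_IN, D_OUT = 5120, 5120
print(lora_params(D_IN, D_OUT, R))  # 655360 trainable parameters
print(D_IN * D_OUT)                 # 26214400 frozen parameters in that layer
```
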

---

## Model Architecture

- **Base:** Qwen2.5-Coder (14B parameters, GQA attention, extended RoPE embeddings)
- **Tokenizer:** 151k vocabulary optimized for Rust syntax
- **Context:** 32k tokens
- **Fine-tuning:** Parameter-efficient LoRA adapters (≈1% of parameters updated)
- **Deployment:** Compatible with local deployment and Fortytwo Capsule runtime for distributed swarm inference

---

## Evaluation Protocol

- All evaluations run in a Docker-isolated Rust 1.86.0 environment
- **Code tasks:** measured via unit-test pass rate
- **Documentation & naming tasks:** scored via LLM-based correctness (Claude Sonnet 4 as judge)
- **Code completion & API tasks:** syntax-weighted Levenshtein similarity
- **Comment generation:** compilation success metric

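
As a rough illustration of two of these metrics, the sketch below computes a pass rate over boolean test outcomes and a length-normalized Levenshtein similarity. It is a simplification under stated assumptions: the syntax weighting used in the actual evaluation is not described in this card, so plain character-level edit distance stands in for it, and all function names are hypothetical.

```python
def pass_rate(results: list) -> float:
    """Fraction of samples whose unit tests passed (one attempt per sample)."""
    if not results:
        raise ValueError("results must not be empty")
    return sum(1 for ok in results if ok) / len(results)


def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance (insert, delete, substitute)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution
        prev = curr
    return prev[-1]


def similarity(a: str, b: str) -> float:
    """Unweighted similarity in [0, 1]; 1.0 means identical strings."""
    if not a and not b:
        return 1.0
    return 1.0 - levenshtein(a, b) / max(len(a), len(b))


print(pass_rate([True, False, True, True]))        # 0.75
print(levenshtein("kitten", "sitting"))            # 3
print(similarity("let x = 1;", "let x = 1;"))      # 1.0
```
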

---

## Why It Matters

Rust is a high-safety, low-level language with complex ownership semantics that make it uniquely challenging for general-purpose LLMs.
At the same time, there is simply **not enough high-quality training data on Rust**, as it remains a relatively modern and rapidly evolving language.
This scarcity of large, reliable Rust datasets – combined with the language’s intricate borrow checker and type system – makes it an ideal benchmark for evaluating true model understanding and reasoning precision.

**Strand-Rust-Coder** demonstrates how **specialized models** can outperform giant centralized models – achieving domain mastery with a fraction of the compute.
Through **Fortytwo’s Swarm Inference**, the network was able to generate an **extremely accurate synthetic dataset**, enabling a **state-of-the-art Rust model** to be built through an efficient **LoRA fine-tune** rather than full retraining.

This work validates Fortytwo’s thesis: **intelligence can scale horizontally through networked specialization rather than centralized scale.**

---

## 🔬 Research & References

- [Fortytwo: Swarm Inference with Peer-Ranked Consensus (arXiv)](https://arxiv.org/abs/2510.24801) – *Fortytwo Swarm Inference – Technical Report*
- [Self-Supervised Inference of Agents in Trustless Environments (arXiv)](https://arxiv.org/abs/2409.08386) – *High-level overview of Fortytwo architecture*

---

## Intended Use

- Rust code generation, completion, and documentation
- Automated refactoring and test generation
- Integration into code copilots and multi-agent frameworks
- Research on domain-specialized model training and evaluation

### Limitations
- May underperform on purely algorithmic or multi-language tasks (e.g., HumanEval-style puzzles).
- Not suitable for generating unverified production code without compilation and test validation.

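
One way to act on the second limitation is to gate generated code behind a compile check. The sketch below is a hypothetical helper, not part of the model or the Fortytwo tooling. It assumes `rustc` is on the `PATH`, type-checks a single dependency-free source file as a library crate, and returns `None` when no compiler is found. Projects with external crates would need `cargo check` and their own test run instead.

```python
import shutil
import subprocess
import tempfile
from pathlib import Path
from typing import Optional


def compiles(rust_source: str, compiler: str = "rustc") -> Optional[bool]:
    """Return True/False for compile success, or None if the compiler is missing.

    Assumption: the snippet is a single file with no external crate dependencies.
    """
    if shutil.which(compiler) is None:
        return None
    with tempfile.TemporaryDirectory() as tmp:
        src = Path(tmp) / "candidate.rs"
        src.write_text(rust_source, encoding="utf-8")
        # Emit only metadata: this type-checks and borrow-checks without linking.
        result = subprocess.run(
            [compiler, "--edition", "2021", "--crate-type", "lib",
             "--emit", "metadata", "--out-dir", tmp, str(src)],
            capture_output=True,
            text=True,
            timeout=60,
        )
        return result.returncode == 0


generated = "pub fn first_long(v: &[String]) -> Option<&String> { v.iter().find(|s| s.len() > 10) }"
status = compiles(generated)
print({True: "compiles", False: "rejected", None: "rustc not found"}[status])
```
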

---

## Integration with Fortytwo Network

Strand-Rust-Coder models are integrated into **Fortytwo’s decentralized Swarm Inference Network**, where specialized models collaborate and rank each other’s outputs.
This structure enables **peer-reviewed inference**, improving reliability while reducing hallucinations and cost.

To run a Fortytwo node or contribute your own models and fine-tunes, visit: [fortytwo.network](https://fortytwo.network)

---

## Inference Examples

### Using `pipeline`

```python
from transformers import pipeline

# Build a chat-style text-generation pipeline for the model.
pipe = pipeline("text-generation", model="Fortytwo-Network/Strand-Rust-Coder-14B-v1")
messages = [
    {"role": "user", "content": "Write a Rust function that finds the first string longer than 10 characters in a vector."},
]
# Print the result so the generation is visible when run as a script.
print(pipe(messages))
```

### Using Transformers Directly

```python
# Load model directly
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("Fortytwo-Network/Strand-Rust-Coder-14B-v1")
model = AutoModelForCausalLM.from_pretrained("Fortytwo-Network/Strand-Rust-Coder-14B-v1")

messages = [
    {"role": "user", "content": "Write a Rust function that finds the first string longer than 10 characters in a vector."},
]

# Format the conversation with the model's chat template and tokenize it.
inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

# 40 new tokens would usually cut a function off mid-body, so allow a longer completion.
outputs = model.generate(**inputs, max_new_tokens=256)
# Decode only the newly generated tokens, dropping the prompt and special tokens.
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True))
```

---

## Quantized Versions

Optimized GGUF quantizations of **Strand-Rust-Coder-14B-v1** are available for local and Fortytwo Node deployment, offering a reduced memory footprint with minimal performance trade-off.

These builds are compatible with **llama.cpp**, **Jan**, **LM Studio**, **Ollama**, and other runtimes supporting the GGUF format.

| **Quantization** | **Size** | **Bit Precision** | **Description** |
|------------------|-----------|------------------|----------------|
| **Q8_0** | 15.7 GB | **8-bit** | Near-full precision, for the most demanding local inference |
| **Q6_K** | 12.1 GB | **6-bit** | Balanced performance and efficiency |
| **Q5_K_M** | 10.5 GB | **5-bit** | Lightweight deployment with strong accuracy retention |
| **Q4_K_M** | 8.99 GB | **4-bit** | Ultra-fast, compact variant for consumer GPUs and laptops |

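
When choosing a build, a simple rule is to take the largest quantization whose file fits in available memory with some room left for the KV cache and runtime. The sketch below encodes the sizes from the table. The `pick_quant` helper and the 2 GB default headroom are illustrative assumptions, not a recommendation from the model authors.

```python
# File sizes in GB, copied from the quantization table.
QUANT_SIZES_GB = {"Q8_0": 15.7, "Q6_K": 12.1, "Q5_K_M": 10.5, "Q4_K_M": 8.99}


def pick_quant(memory_gb: float, headroom_gb: float = 2.0):
    """Return the largest quantization that fits in memory_gb minus headroom_gb.

    headroom_gb is an assumed allowance for the KV cache and runtime overhead.
    Returns None when no listed build fits.
    """
    budget = memory_gb - headroom_gb
    fitting = {name: size for name, size in QUANT_SIZES_GB.items() if size <= budget}
    return max(fitting, key=fitting.get) if fitting else None


print(pick_quant(24))  # Q8_0
print(pick_quant(16))  # Q6_K
print(pick_quant(12))  # Q4_K_M
print(pick_quant(8))   # None
```
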

Quantized versions: [Fortytwo-Network/Strand-Rust-Coder-14B-v1-GGUF](https://huggingface.co/Fortytwo-Network/Strand-Rust-Coder-14B-v1-GGUF)

---

**Fortytwo – An open, networked intelligence shaped collectively by its participants**

Join the swarm: [fortytwo.network](https://fortytwo.network)

X: [@fortytwo](https://x.com/fortytwo)