---
tags:
- ColBERT
- PyLate
- sentence-transformers
- sentence-similarity
- feature-extraction
- code-search
- knowledge-distillation
- modernbert
- apple-silicon
- mps
pipeline_tag: sentence-similarity
library_name: PyLate
license: apache-2.0
language:
- en
datasets:
- sentence-transformers/codesearchnet
base_model: lightonai/ColBERT-Zero
---

# ColBERT-Zero-6L-CodeSearch

A **6-layer ColBERT model** distilled from [ColBERT-Zero](https://huggingface.co/lightonai/ColBERT-Zero) (22 layers) for code search, achieving **85% of the teacher's retrieval quality at 13x faster query speed**.

## Model Details

| Parameter | Value |
|-----------|-------|
| **Architecture** | ModernBERT (6 layers, 768 hidden, 12 heads) |
| **Base Model** | [lightonai/ColBERT-Zero](https://huggingface.co/lightonai/ColBERT-Zero) |
| **Output Dimensionality** | 128 per-token embeddings |
| **Similarity Function** | MaxSim (late interaction) |
| **Parameters** | ~38M (vs ~100M teacher) |
| **Query Length** | 32 tokens |
| **Document Length** | 180 tokens |
| **License** | Apache 2.0 |
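
MaxSim scores a query against a document by taking, for each query token embedding, its best similarity over all document token embeddings, then summing those maxima. A minimal NumPy sketch of the idea (illustrative only, not PyLate's implementation):

```python
import numpy as np

def maxsim(query_emb, doc_emb):
    """Late-interaction score: for each query token, take its best-matching
    document token, then sum those maxima over the query tokens."""
    # query_emb: (num_query_tokens, dim), doc_emb: (num_doc_tokens, dim)
    sim = query_emb @ doc_emb.T       # token-by-token similarity matrix
    return sim.max(axis=1).sum()      # max over doc tokens, sum over query tokens

# Toy example: 2 query tokens and 3 document tokens in 2 dimensions.
q = np.array([[1.0, 0.0], [0.0, 1.0]])
d = np.array([[1.0, 0.0], [0.5, 0.5], [0.0, 1.0]])
print(maxsim(q, d))  # 2.0: each query token matches some document token exactly
```

In the real model the embeddings are the 128-dimensional normalized per-token outputs, so the dot product acts as cosine similarity.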

## Benchmark Results

Evaluated on 3 code search corpora (150 questions total) via [litembeddings](https://github.com/alexandernicholson/litembeddings):

| Corpus | Teacher MRR | Student MRR | % of Teacher | Student Query Speed |
|--------|------------|-------------|--------------|---------------------|
| jq (C) | 0.539 | 0.355 | 65.9% | ~7ms |
| Rails (Ruby) | 0.679 | 0.581 | 85.6% | ~3ms |
| FastAPI (Python) | 0.782 | 0.766 | **98.0%** | ~4ms |
| **Aggregate** | **0.667** | **0.568** | **85.1%** | **~5ms** |

The student model is approximately **13x faster** at query time than the teacher while retaining 85% of its retrieval quality. Performance is particularly strong on Python code search (98% of teacher).
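
MRR here is mean reciprocal rank: for each question, the reciprocal of the rank at which the first relevant result appears, averaged over all questions. A small sketch of the metric (not the litembeddings harness itself):

```python
def mean_reciprocal_rank(ranked_ids, relevant_ids):
    """ranked_ids: per-query result lists, best first.
    relevant_ids: the single gold document id per query."""
    total = 0.0
    for results, gold in zip(ranked_ids, relevant_ids):
        for rank, doc_id in enumerate(results, start=1):
            if doc_id == gold:
                total += 1.0 / rank
                break  # queries whose gold doc never appears contribute 0
    return total / len(ranked_ids)

# Gold docs ranked 1st, 2nd, and missing -> (1 + 0.5 + 0) / 3 = 0.5
print(mean_reciprocal_rank([["a", "b"], ["x", "y"], ["p", "q"]], ["a", "y", "z"]))
```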

## How the Student Was Built

### Architecture: Layer Pruning from the Teacher

The student was created by selecting 6 layers from ColBERT-Zero's 22-layer ModernBERT backbone using a **skewed-late** strategy that preserves more upper layers (which encode retrieval-relevant semantics):

```
Teacher layers: [0, 1, 2, ..., 21]  (22 total)
Student layers: [0, 8, 14, 17, 19, 21]  (6 selected)
```

The student inherits:
- All embedding weights from the teacher
- The 768-to-128 ColBERT projection layer
- The selected transformer layers, with full weight copying
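
The copy step can be sketched as a state-dict remapping: selected teacher layers are renumbered consecutively, and everything outside the layer stack (embeddings, norms, the ColBERT projection) passes through unchanged. `prune_state_dict` and the key pattern below are illustrative, not the actual training script:

```python
import re

SELECTED = [0, 8, 14, 17, 19, 21]  # skewed-late: denser toward the top

def prune_state_dict(teacher_sd, selected=SELECTED):
    """Keep only the selected transformer layers, renumbered 0..len(selected)-1."""
    remap = {teacher_idx: student_idx for student_idx, teacher_idx in enumerate(selected)}
    student_sd = {}
    for key, tensor in teacher_sd.items():
        m = re.match(r"(.*\blayers\.)(\d+)(\..*)", key)
        if m is None:
            student_sd[key] = tensor  # embeddings, final norm, projection, etc.
        elif int(m.group(2)) in remap:
            student_sd[f"{m.group(1)}{remap[int(m.group(2))]}{m.group(3)}"] = tensor
    return student_sd

# Toy keys: teacher layer 21 becomes student layer 5; layer 3 is dropped.
toy = {"embeddings.weight": 0, "model.layers.21.mlp.weight": 1, "model.layers.3.mlp.weight": 2}
out = prune_state_dict(toy)
```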

### Training: Knowledge Distillation

- **Dataset**: [CodeSearchNet](https://huggingface.co/datasets/sentence-transformers/codesearchnet) (10,000 comment-code pairs)
- **Teacher scoring**: ColBERT-Zero generates MaxSim relevance scores for each query against 1 positive and 3 random negative documents
- **Loss**: PyLate's `Distillation` loss (KL divergence between the teacher's and student's score distributions)
- **Optimizer**: AdamW, lr=5e-5, weight_decay=0.01, warmup_ratio=0.1
- **Training**: 1,000 steps, batch_size=8, gradient_accumulation=4 (effective batch size 32)
- **Hardware**: Apple Silicon (M4 Max) via the PyTorch MPS backend, ~17 minutes total
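
The objective can be sketched as a KL divergence between softmax-normalized teacher and student scores over each query's four candidates (1 positive + 3 negatives). This is a minimal, illustrative version of the computation, not PyLate's actual loss class:

```python
import math

def softmax(xs):
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def distill_kl(teacher_scores, student_scores):
    """KL(teacher || student) over one query's candidate documents."""
    p = softmax(teacher_scores)
    q = softmax(student_scores)
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))

# Identical score distributions give zero loss; the divergence grows as the
# student's ranking of the candidates drifts from the teacher's.
print(distill_kl([5.0, 1.0, 0.5, 0.2], [5.0, 1.0, 0.5, 0.2]))  # 0.0
```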

### Hyperparameter Search

The optimal configuration was found through **30 autonomous experiments** sweeping learning rate, layer-selection strategy, batch size, gradient accumulation, weight decay, warmup ratio, number of negatives, training steps, and embedding dimensions. Key findings:

- **Teacher initialization is critical**: starting from ColBERT-Zero's weights reaches MRR 0.46 versus MRR 0.08 from raw ModernBERT, a 5.6x improvement
- **Skewed-late layer selection** outperforms evenly-spaced, last-6, and other strategies
- **An effective batch size of 32** (batch_size=8, gradient_accumulation=4) is optimal
- **Weight decay of 0.01** provides a regularization benefit
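
A sweep like this amounts to a search space plus a loop that trains and scores each configuration. A pseudostructure sketch with illustrative values (the actual ranges, and the `train_and_eval` step, are not part of this card):

```python
import itertools
import random

# Illustrative search space; not the exact ranges used in the 30 experiments.
SPACE = {
    "lr": [1e-5, 5e-5, 1e-4],
    "batch_size": [4, 8, 16],
    "grad_accum": [1, 2, 4],
    "weight_decay": [0.0, 0.01],
}

def sample_configs(space, n, seed=0):
    """Draw n distinct configurations from the cartesian product of the space."""
    rng = random.Random(seed)
    all_configs = [dict(zip(space, values)) for values in itertools.product(*space.values())]
    return rng.sample(all_configs, n)

configs = sample_configs(SPACE, 5)
# Each config would then be distilled for a fixed budget and scored by MRR:
# best = max(configs, key=lambda cfg: train_and_eval(cfg))  # train_and_eval: placeholder
```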

## Usage

### Installation

```bash
pip install pylate
```

### Encoding & Retrieval

```python
from pylate import models
from pylate.scores import colbert_scores

# Load model
model = models.ColBERT(model_name_or_path="ctrltokyo/ColBERT-Zero-6L-CodeSearch")

# Encode documents
doc_embeddings = model.encode(
    ["def hello():\n print('Hello, World!')", "class UserAuth:\n ..."],
    batch_size=32,
    is_query=False,
    show_progress_bar=True,
)

# Encode queries
query_embeddings = model.encode(
    ["function that prints a greeting"],
    batch_size=32,
    is_query=True,
    show_progress_bar=True,
)

# Score with MaxSim
scores = colbert_scores(query_embeddings, doc_embeddings)
print(scores)  # higher = more relevant
```

### Reranking

```python
from pylate import models, rank

model = models.ColBERT(model_name_or_path="ctrltokyo/ColBERT-Zero-6L-CodeSearch")

queries = ["how to authenticate users"]
documents = [["def login(user, pwd): ...", "def sort_list(arr): ...", "class AuthMiddleware: ..."]]
documents_ids = [["doc1", "doc2", "doc3"]]

queries_embeddings = model.encode(queries, is_query=True)
documents_embeddings = model.encode(documents, is_query=False)

reranked = rank.rerank(
    documents_ids=documents_ids,
    queries_embeddings=queries_embeddings,
    documents_embeddings=documents_embeddings,
)
```

## GGUF / litembeddings

This model can be converted to GGUF format for use with [litembeddings](https://github.com/alexandernicholson/litembeddings), a SQLite-based embedding engine with SIMD-accelerated MaxSim:

```bash
# Convert to GGUF
python convert_hf_to_gguf.py ctrltokyo/ColBERT-Zero-6L-CodeSearch --outfile model-f16.gguf --outtype f16

# Extract the ColBERT projection weights
python -c "
from safetensors import safe_open
import numpy as np
f = safe_open('1_Dense/model.safetensors', framework='numpy')
f.get_tensor('linear.weight').astype(np.float32).tofile('model.projection')
"
```

Then, in SQL:

```sql
SELECT lembed_model('codesearch', 'model-f16.gguf', '{"colbert_projection": "model.projection"}');
SELECT lembed_maxsim(
  lembed_tokens('search_query: how to sort a list'),
  lembed_tokens('search_document: def quicksort(arr): ...')
);
```

## Limitations

- **Weakest on C code search** (65.9% of teacher on the jq corpus), likely because the CodeSearchNet training data is Python-heavy
- **Trained on only 10k pairs**; larger training sets or hard-negative mining could improve quality further
- **English only**; inherits ColBERT-Zero's language capabilities
- **No asymmetric prompts**; unlike the teacher, this model does not use `search_query:`/`search_document:` prompts (it uses `[Q]`/`[D]` prefixes instead)

## Citation

```bibtex
@misc{colbert-zero-6l-codesearch,
  title={ColBERT-Zero-6L-CodeSearch: A Distilled ColBERT Model for Code Search},
  author={Alexander Nicholson},
  year={2026},
  note={Distilled from ColBERT-Zero (Chaffin et al., 2026) using PyLate on Apple Silicon}
}
```

## Acknowledgments

- [ColBERT-Zero](https://huggingface.co/lightonai/ColBERT-Zero) by LightOn AI, the teacher model
- [PyLate](https://github.com/lightonai/pylate), the ColBERT training framework
- [litembeddings](https://github.com/alexandernicholson/litembeddings), the SQLite embedding engine used for benchmarking
- Training and experimentation performed entirely on Apple Silicon (M4 Max) using the PyTorch MPS backend
| |