CodeRankEmbed (ONNX export)

ONNX export of nomic-ai/CodeRankEmbed, a 137M-parameter code search embedder built on Snowflake/snowflake-arctic-embed-m-long. Exported for use with cqs's ONNX Runtime embedding pipeline; no PyTorch dependency required.

This is a faithful conversion of the upstream weights: no fine-tuning, no quantization. License and behavior match the upstream model.

Specs

  • Base: nomic-ai/CodeRankEmbed (137M params, 768-dim, 8192 max seq)
  • Format: ONNX (FP32)
  • Pooling: Mean
  • Query prefix: Represent this query for searching relevant code: (required; see usage)
  • Document prefix: none
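The asymmetric prefix convention above (prefix on queries, none on documents) can be sketched with a small helper; the function names here are illustrative, not part of any cqs or Nomic API:

```python
QUERY_PREFIX = "Represent this query for searching relevant code: "

def format_query(text: str) -> str:
    # Queries MUST carry the prefix; encoding a bare query degrades retrieval.
    return QUERY_PREFIX + text

def format_document(text: str) -> str:
    # Documents are encoded verbatim, per the upstream convention.
    return text

print(format_query("parse ISO 8601 timestamps"))
```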

Production Eval (cqs v3.v2 fixture, 2026-05-01)

Run against cqs's production fixture (218 queries: 109 test + 109 dev) on the cqs codebase itself. Numbers are with cqs's full hybrid-search stack (dense + FTS + SPLADE blend, name-boost, type-boost, MMR-off):

| split | metric | BGE-large (1024-dim) | CodeRankEmbed (768-dim) | v9-200k (768-dim) |
|-------|--------|----------------------|-------------------------|-------------------|
| test  | R@1    | 43.1%                | 42.2%                   | 45.9%             |
| test  | R@5    | 69.7%                | 67.9%                   | 70.6%             |
| test  | R@20   | 83.5%                | 79.8%                   | 80.7%             |
| dev   | R@1    | 45.9%                | 47.7%                   | 46.8%             |
| dev   | R@5    | 77.1%                | 69.7%                   | 68.8%             |
| dev   | R@20   | 86.2%                | 81.7%                   | 81.7%             |

Verdict: edges out BGE-large on dev R@1; otherwise close on test and behind on dev R@5/R@20. Best fit when you want a code-specialist embedder at 1/3 the BGE-large parameter count without trading off too much on diverse natural-language queries. cqs ships it as an opt-in preset (not the default): set CQS_EMBEDDING_MODEL=nomic-coderank or use cqs slot create coderank --model nomic-coderank.
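For reference, the R@k numbers in the table count the fraction of queries whose gold result appears in the top k ranked hits. A minimal sketch of the metric (toy data, not the actual fixture):

```python
def recall_at_k(ranked_ids, gold_id, k):
    # 1 if the gold item appears in this query's top-k results, else 0.
    return int(gold_id in ranked_ids[:k])

# Toy example: two queries, each with a ranked result list and one gold answer.
results = [["a", "b", "c"], ["x", "y", "z"]]
golds = ["b", "z"]
r_at_2 = sum(recall_at_k(r, g, 2) for r, g in zip(results, golds)) / len(golds)
print(r_at_2)  # 0.5: the first query's gold is in its top 2, the second's is not
```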

Usage

With cqs

# Full reindex with this model
export CQS_EMBEDDING_MODEL=nomic-coderank
cqs index --force

# Or, for slot-based comparisons:
cqs slot create coderank --model nomic-coderank
cqs index --slot coderank --force

cqs handles the query-prefix wiring automatically. Documents are encoded without a prefix per the upstream convention.

Direct ONNX

import onnxruntime as ort
from transformers import AutoTokenizer
from huggingface_hub import hf_hub_download
import numpy as np

model_path = hf_hub_download("jamie8johnson/CodeRankEmbed-onnx", "model.onnx")
ort_session = ort.InferenceSession(model_path)
tokenizer = AutoTokenizer.from_pretrained("nomic-ai/CodeRankEmbed")

# Query prefix is REQUIRED
query = "Represent this query for searching relevant code: find functions that validate email addresses"
code  = "def validate_email(addr): ..."   # no prefix on documents

q_inputs = tokenizer(query, return_tensors="np", padding=True, truncation=True, max_length=8192)
q_out = ort_session.run(None, dict(q_inputs))

# Mean-pool over the token dimension (masking padding), then L2-normalize for cosine similarity.
hidden = q_out[0]                              # (batch, seq_len, 768)
mask = q_inputs["attention_mask"][:, :, None]  # (batch, seq_len, 1)
q_emb = (hidden * mask).sum(axis=1) / mask.sum(axis=1)
q_emb /= np.linalg.norm(q_emb, axis=1, keepdims=True)
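Once both queries and documents are pooled and L2-normalized, cosine similarity reduces to a dot product and ranking is an argsort. A sketch with toy 2-d vectors (the real embeddings are 768-dim; numbers here are illustrative only):

```python
import numpy as np

def cosine_rank(q_emb, doc_embs):
    # With L2-normalized rows, the dot product IS the cosine similarity.
    scores = doc_embs @ q_emb
    return np.argsort(-scores)  # indices of documents, best match first

q = np.array([1.0, 0.0])
docs = np.array([[0.6, 0.8],   # partially aligned with the query
                 [1.0, 0.0],   # identical direction
                 [0.0, 1.0]])  # orthogonal
order = cosine_rank(q, docs)
print(order.tolist())  # [1, 0, 2]
```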

License

MIT, inherited from the upstream nomic-ai/CodeRankEmbed model.

Citation

Please cite the upstream model:

@misc{nomic-coderank-embed,
  author = {Nomic AI},
  title = {CodeRankEmbed},
  year = {2024},
  publisher = {HuggingFace},
  url = {https://huggingface.co/nomic-ai/CodeRankEmbed}
}