---
language:
- code
library_name: optimum
pipeline_tag: text-classification
tags:
- code-detection
- safety
- onnx
- hikmaai
license: apache-2.0
---

# hikmaai-codebert-base-code-detection

A binary classifier that detects whether the input contains source code,
fine-tuned from
[microsoft/codebert-base](https://huggingface.co/microsoft/codebert-base)
by [HikmaAI](https://huggingface.co/HikmaAI).

## Model Description

- **Task**: Binary classification (safe=0, threat=1, where "threat" = code detected)
- **Base model**: `microsoft/codebert-base`
- **Export formats**: ONNX FP32 + INT8 dynamic quantization

## Performance

See `model_card.json` for detailed metrics.

Optimized decision threshold: **0.9950** (validation recall: 0.9984)

## Usage (ONNX)

```python
from optimum.onnxruntime import ORTModelForSequenceClassification
from transformers import AutoTokenizer

# Load the INT8 ONNX export from the "onnx/int8" subfolder.
model = ORTModelForSequenceClassification.from_pretrained(
    "HikmaAI/hikmaai-codebert-base-code-detection",
    subfolder="onnx/int8",
)
# The tokenizer files live in their own "tokenizer" subfolder.
tokenizer = AutoTokenizer.from_pretrained(
    "HikmaAI/hikmaai-codebert-base-code-detection",
    subfolder="tokenizer",
)

inputs = tokenizer("def hello():\n    print('hi')", return_tensors="pt")
outputs = model(**inputs)
# outputs.logits -> [safe_score, threat_score]
```

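To turn the `[safe_score, threat_score]` logits into a yes/no decision, one option is to apply a softmax and compare the threat probability against the optimized threshold reported under Performance. The sketch below makes two assumptions: that the 0.9950 threshold applies to the softmax probability of label 1 (threat), and that a score at or above the threshold counts as "code detected". The helper names `threat_probability` and `contains_code` are illustrative and not part of this repository.

```python
import math

THRESHOLD = 0.9950  # optimized threshold reported in this card


def threat_probability(logits):
    """Softmax probability of label 1 from a [safe_score, threat_score] pair."""
    safe_score, threat_score = logits
    # Subtract the max before exponentiating for numerical stability.
    m = max(safe_score, threat_score)
    exp_safe = math.exp(safe_score - m)
    exp_threat = math.exp(threat_score - m)
    return exp_threat / (exp_safe + exp_threat)


def contains_code(logits, threshold=THRESHOLD):
    """Assumed decision rule: flag as code when the threat probability reaches the threshold."""
    return threat_probability(logits) >= threshold
```

With the usage example above, this could be called as `contains_code(outputs.logits[0].tolist())`. If `model_card.json` defines the threshold differently, that definition should take precedence over this sketch.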
## Training

- Epochs: 5
- Learning rate: 2e-05
- Batch size: 16
- Class weights: [1.0, 2.0]

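The card does not say how the class weights enter the loss. A common choice, assumed here, is a class-weighted cross-entropy in which each example's negative log-likelihood is scaled by the weight of its true label, in label order `[safe, threat]`, and the batch loss is the weighted mean. The pure-Python sketch below illustrates that assumption only. The function names are hypothetical and it is not the training code used for this model.

```python
import math

CLASS_WEIGHTS = [1.0, 2.0]  # assumed order: [safe, threat], as listed in this card


def log_softmax(logits):
    """Numerically stable log-softmax over a list of logits."""
    m = max(logits)
    log_sum = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_sum for x in logits]


def weighted_cross_entropy(batch_logits, labels, weights=CLASS_WEIGHTS):
    """Weighted mean of the per-example negative log-likelihoods."""
    total_loss = 0.0
    total_weight = 0.0
    for logits, label in zip(batch_logits, labels):
        w = weights[label]
        total_loss += -w * log_softmax(logits)[label]
        total_weight += w
    return total_loss / total_weight
```

Under this assumption, a misclassified threat example contributes twice as much to the loss as a misclassified safe example, which pushes the model toward higher recall on the threat class.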
## License

Apache-2.0

## Citation

```bibtex
@misc{hikmaai-code_detection-2026,
  title={hikmaai-codebert-base-code-detection},
  author={HikmaAI},
  year={2026},
  publisher={HuggingFace},
  url={https://huggingface.co/HikmaAI/hikmaai-codebert-base-code-detection}
}
```