---
language:
- code
library_name: optimum
pipeline_tag: text-classification
tags:
- code-detection
- safety
- onnx
- hikmaai
license: apache-2.0
---
# hikmaai-codebert-base-code-detection
A binary classifier that detects whether the input contains source code,
fine-tuned from
[microsoft/codebert-base](https://huggingface.co/microsoft/codebert-base)
by [HikmaAI](https://huggingface.co/HikmaAI).
## Model Description
- **Task**: Binary classification (safe=0, threat=1, where "threat" = code detected)
- **Base model**: `microsoft/codebert-base`
- **Export formats**: ONNX FP32 + INT8 dynamic quantization
## Performance
See `model_card.json` for detailed metrics.
Optimized decision threshold: **0.9950** (validation recall: 0.9984)
## Usage (ONNX)
```python
from optimum.onnxruntime import ORTModelForSequenceClassification
from transformers import AutoTokenizer

# Load the INT8 dynamically quantized ONNX export
model = ORTModelForSequenceClassification.from_pretrained(
    "HikmaAI/hikmaai-codebert-base-code-detection",
    subfolder="onnx/int8",
)
# The tokenizer files live in their own subfolder of the repository
tokenizer = AutoTokenizer.from_pretrained(
    "HikmaAI/hikmaai-codebert-base-code-detection",
    subfolder="tokenizer",
)

inputs = tokenizer("def hello():\n print('hi')", return_tensors="pt")
outputs = model(**inputs)
# outputs.logits -> [safe_score, threat_score]
```
```
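The logits can be turned into a yes/no decision with the threshold reported under Performance. The following is a minimal sketch, not the authors' confirmed post-processing: it assumes the 0.9950 threshold applies to the softmax probability of class 1 ("threat"), and the helper names `threat_probability` and `contains_code` are hypothetical.

```python
import math

# Threshold reported in this card. Assumption: it is applied to the
# softmax probability of class 1 ("threat" = code detected).
THRESHOLD = 0.9950


def threat_probability(logits):
    """Softmax probability of class 1 for a [safe, threat] logit pair."""
    safe_logit, threat_logit = logits
    # Subtract the max logit before exponentiating for numerical stability
    m = max(safe_logit, threat_logit)
    e_safe = math.exp(safe_logit - m)
    e_threat = math.exp(threat_logit - m)
    return e_threat / (e_safe + e_threat)


def contains_code(logits, threshold=THRESHOLD):
    """Return True when the threat probability reaches the threshold."""
    return threat_probability(logits) >= threshold
```

With the `outputs` object from the snippet above, a single input could be checked with `contains_code(outputs.logits[0].tolist())`. At this threshold the threat logit has to exceed the safe logit by roughly 5.3 before an input is flagged.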
## Training
- Epochs: 5
- Learning rate: 2e-05
- Batch size: 16
- Class weights: [1.0, 2.0]
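The card does not state how the class weights enter the loss. One common reading, shown here as a dependency-free sketch, is a weighted cross-entropy in which errors on class 1 count twice as much as errors on class 0. The function names are hypothetical, and the normalization by the sum of weights mirrors the default `mean` reduction of PyTorch's `torch.nn.CrossEntropyLoss(weight=...)`; whether training used that exact loss is an assumption.

```python
import math

# Class weights from the training settings: [safe, threat]
CLASS_WEIGHTS = [1.0, 2.0]


def log_softmax(logits):
    """Numerically stable log-softmax over a list of logits."""
    m = max(logits)
    log_sum = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_sum for x in logits]


def weighted_cross_entropy(batch_logits, labels, weights=CLASS_WEIGHTS):
    """Weighted mean of per-example negative log-likelihoods.

    Each example's loss is scaled by the weight of its true class, and
    the total is divided by the sum of those weights.
    """
    total_loss = 0.0
    total_weight = 0.0
    for logits, label in zip(batch_logits, labels):
        w = weights[label]
        total_loss += -w * log_softmax(logits)[label]
        total_weight += w
    return total_loss / total_weight
```

Under this reading, a missed code sample (label 1) contributes twice the loss of a false alarm on plain text, which pushes the model toward higher recall on the code class.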
## License
Apache-2.0
## Citation
```bibtex
@misc{hikmaai-code_detection-2026,
title={hikmaai-codebert-base-code-detection},
author={HikmaAI},
year={2026},
publisher={HuggingFace},
url={https://huggingface.co/HikmaAI/hikmaai-codebert-base-code-detection}
}
```