---
language:
- code
library_name: optimum
pipeline_tag: text-classification
tags:
- code-detection
- safety
- onnx
- hikmaai
license: apache-2.0
---

# hikmaai-codebert-base-code-detection

A binary classifier that detects whether the input contains source code,
fine-tuned from
[microsoft/codebert-base](https://huggingface.co/microsoft/codebert-base)
by [HikmaAI](https://huggingface.co/HikmaAI).

## Model Description

- **Task**: Binary classification (safe=0, threat=1, where "threat" = code detected)
- **Base model**: `microsoft/codebert-base`
- **Export formats**: ONNX FP32 + INT8 dynamic quantization

## Performance

See `model_card.json` for detailed metrics.

Optimized decision threshold: **0.9950** (validation recall: 0.9984)

## Usage (ONNX)

```python
from optimum.onnxruntime import ORTModelForSequenceClassification
from transformers import AutoTokenizer

model = ORTModelForSequenceClassification.from_pretrained(
    "HikmaAI/hikmaai-codebert-base-code-detection",
    subfolder="onnx/int8",
)
tokenizer = AutoTokenizer.from_pretrained(
    "HikmaAI/hikmaai-codebert-base-code-detection",
    subfolder="tokenizer",
)

inputs = tokenizer("def hello():\n    print('hi')", return_tensors="pt")
outputs = model(**inputs)
# outputs.logits has shape (1, 2): raw logits ordered [safe, threat]
```
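
The logits above are unnormalized, so they need to be converted to a probability before the optimized threshold can be applied. The following is a minimal sketch of that step. It assumes the 0.9950 threshold is meant to be compared against the softmax probability of the threat class; the card does not state this explicitly. The helper names `threat_probability` and `contains_code` are hypothetical and not part of the model repository.

```python
import math

THRESHOLD = 0.9950  # optimized threshold reported in the Performance section


def threat_probability(logits):
    """Softmax probability of class 1 ("threat" = code detected)."""
    safe_logit, threat_logit = logits
    # Subtract the larger logit before exponentiating for numerical stability.
    m = max(safe_logit, threat_logit)
    e_safe = math.exp(safe_logit - m)
    e_threat = math.exp(threat_logit - m)
    return e_threat / (e_safe + e_threat)


def contains_code(logits, threshold=THRESHOLD):
    # Assumption: the threshold applies to the softmax threat probability.
    return threat_probability(logits) >= threshold
```

With the usage example above, this could be called as `contains_code(outputs.logits[0].tolist())`. Because the threshold is high, an input is flagged only when the threat logit exceeds the safe logit by a wide margin.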

## Training

- Epochs: 5
- Learning rate: 2e-05
- Batch size: 16
- Class weights: [1.0, 2.0]
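
To illustrate what class weights of `[1.0, 2.0]` do, here is a minimal sketch of a class-weighted cross-entropy loss in plain Python. It assumes the weights are applied per class in the order `[safe, threat]` and normalized by the sum of the sample weights, which is the convention PyTorch's `CrossEntropyLoss` follows when given a `weight` argument with mean reduction. The card does not say which loss implementation was used, and `weighted_cross_entropy` is a hypothetical name.

```python
import math

CLASS_WEIGHTS = [1.0, 2.0]  # [safe, threat], as listed above


def weighted_cross_entropy(batch_logits, labels, weights=CLASS_WEIGHTS):
    """Weighted mean of per-sample negative log-likelihoods."""
    total = 0.0
    weight_sum = 0.0
    for logits, label in zip(batch_logits, labels):
        # Stable log-sum-exp of the logits.
        m = max(logits)
        log_norm = m + math.log(sum(math.exp(z - m) for z in logits))
        nll = log_norm - logits[label]
        # Each sample contributes in proportion to the weight of its true class.
        total += weights[label] * nll
        weight_sum += weights[label]
    return total / weight_sum
```

Under this convention, a misclassified threat sample contributes twice as much to the loss as a misclassified safe sample, which pushes training toward higher recall on the threat class.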

## License

Apache-2.0

## Citation

```bibtex
@misc{hikmaai-code_detection-2026,
  title={hikmaai-codebert-base-code-detection},
  author={HikmaAI},
  year={2026},
  publisher={HuggingFace},
  url={https://huggingface.co/HikmaAI/hikmaai-codebert-base-code-detection}
}
```