
	---
	language:
	- code
	library_name: optimum
	pipeline_tag: text-classification
	tags:
	- code-detection
	- safety
	- onnx
	- hikmaai
	license: apache-2.0
	---

	# hikmaai-codebert-base-code-detection

	A binary classifier that detects whether the input contains source code,
	fine-tuned from
	[microsoft/codebert-base](https://huggingface.co/microsoft/codebert-base)
	by [HikmaAI](https://huggingface.co/HikmaAI).

	## Model Description

	- Task: Binary classification with labels safe=0 and threat=1, where "threat" means the input contains code
	- Base model: `microsoft/codebert-base`
	- Export formats: ONNX FP32 + INT8 dynamic quantization

	## Performance

	See `model_card.json` for detailed metrics.

	Optimized decision threshold: 0.9950 (validation recall: 0.9984)

	## Usage (ONNX)

	```python
	from optimum.onnxruntime import ORTModelForSequenceClassification
	from transformers import AutoTokenizer

	# Load the INT8-quantized ONNX export from the `onnx/int8` subfolder.
	model = ORTModelForSequenceClassification.from_pretrained(
	    "HikmaAI/hikmaai-codebert-base-code-detection",
	    subfolder="onnx/int8",
	)
	# The tokenizer files live in a separate `tokenizer` subfolder.
	tokenizer = AutoTokenizer.from_pretrained(
	    "HikmaAI/hikmaai-codebert-base-code-detection",
	    subfolder="tokenizer",
	)

	inputs = tokenizer("def hello():\n    print('hi')", return_tensors="pt")
	outputs = model(**inputs)
	# outputs.logits has shape (1, 2): [safe_score, threat_score]
	```
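
The raw logits still need to be turned into a decision. The following is a minimal sketch, assuming the optimized threshold of 0.9950 from the Performance section applies to the softmax probability of the threat class (index 1). This card does not state that explicitly, and `threat_probability` and `is_code` are hypothetical helper names:

```python
import math

THRESHOLD = 0.9950  # optimized threshold quoted in the Performance section


def threat_probability(logits):
    """Softmax probability of class 1 ("threat" = code detected)."""
    safe_logit, threat_logit = logits
    # Subtract the max logit before exponentiating for numerical stability.
    m = max(safe_logit, threat_logit)
    exp_safe = math.exp(safe_logit - m)
    exp_threat = math.exp(threat_logit - m)
    return exp_threat / (exp_safe + exp_threat)


def is_code(logits, threshold=THRESHOLD):
    """Assumed decision rule: flag as code when P(threat) >= threshold."""
    return threat_probability(logits) >= threshold
```

With the usage snippet above, `is_code(outputs.logits[0].tolist())` would give the decision for the first input.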

	## Training

	- Epochs: 5
	- Learning rate: 2e-05
	- Batch size: 16
	- Class weights: [1.0, 2.0]
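
Class weights of [1.0, 2.0] would make mistakes on the threat class cost twice as much as mistakes on the safe class. The following is a minimal sketch of that effect, assuming the weights scale a per-example cross-entropy loss. The card does not describe the training loss, and `weighted_cross_entropy` is a hypothetical helper:

```python
import math

CLASS_WEIGHTS = [1.0, 2.0]  # [safe, threat], from the Training section


def weighted_cross_entropy(logits, label, weights=CLASS_WEIGHTS):
    """Cross-entropy for one example, scaled by the weight of its true class."""
    # Log-sum-exp with the max subtracted for numerical stability.
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(x - m) for x in logits))
    log_prob_true = logits[label] - log_sum_exp
    return -weights[label] * log_prob_true


# The same uncertain prediction is penalized twice as heavily when the
# true label is class 1 (threat) as when it is class 0 (safe).
loss_safe = weighted_cross_entropy([0.0, 0.0], label=0)
loss_threat = weighted_cross_entropy([0.0, 0.0], label=1)
```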

	## License

	Apache-2.0

	## Citation

	```bibtex
	@misc{hikmaai-code_detection-2026,
	  title={hikmaai-codebert-base-code-detection},
	  author={HikmaAI},
	  year={2026},
	  publisher={HuggingFace},
	  url={https://huggingface.co/HikmaAI/hikmaai-codebert-base-code-detection}
	}
	```