PolyGuard: Code Vulnerability Scanner

A fine-tuned CodeBERT model for detecting security vulnerabilities in source code.

Supported Languages

Python, JavaScript, SQL, PHP, Java, C, C++, Go, Ruby, Rust

Performance

  • F1 Score: 0.6698
  • Training samples: 16,681
  • Base model: microsoft/codebert-base
  • Trained on: 2026-04-29
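
For readers unfamiliar with the metric, the sketch below shows how an F1 score is derived from confusion counts. It assumes binary F1 computed on the "Vulnerable" class; the card does not state the averaging mode or the evaluation split. The function name `f1_score` and the counts are hypothetical illustrations, not PolyGuard's evaluation results.

```python
def f1_score(tp: int, fp: int, fn: int) -> float:
    """F1 for the positive class, computed from confusion counts.

    tp: vulnerable samples flagged as vulnerable
    fp: clean samples flagged as vulnerable
    fn: vulnerable samples flagged as clean
    """
    if tp == 0:
        # No true positives means precision and recall are both zero.
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    # Harmonic mean of precision and recall.
    return 2 * precision * recall / (precision + recall)


# Hypothetical counts for illustration only (assumption, not from the card).
print(round(f1_score(tp=60, fp=30, fn=30), 4))
```

Because F1 ignores true negatives, a score near 0.67 says nothing on its own about how often clean code is flagged correctly; precision and recall are worth checking separately on your own data.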

Usage

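from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

# Load the tokenizer and the fine-tuned classifier from the Hugging Face Hub.
model_id = "MUHAMMADSAADAMIN/PolyGuard"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSequenceClassification.from_pretrained(model_id)
model.eval()  # inference mode: disables dropout

# Snippet to scan; anything beyond 512 tokens is truncated.
code = "eval(input())"
inputs = tokenizer(code, return_tensors="pt", truncation=True, max_length=512)
with torch.no_grad():  # gradients are not needed for inference
    logits = model(**inputs).logits

# Softmax over the class dimension: index 0 = clean, index 1 = vulnerable.
probs = torch.softmax(logits, dim=1).squeeze().tolist()
print(f"Clean: {probs[0]*100:.1f}%  Vulnerable: {probs[1]*100:.1f}%")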
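
The snippet above truncates input at 512 tokens, so the model never sees the tail of a longer file. One possible workaround, sketched below, is to split the token ids into overlapping windows, score each window separately, and report the highest "Vulnerable" probability. This is an assumption about how one might use the model, not a documented PolyGuard feature. The helper names `sliding_windows` and `aggregate_vulnerable` are hypothetical, and the default window of 510 assumes two special tokens are added per sequence, as RoBERTa-style tokenizers do.

```python
from typing import List


def sliding_windows(token_ids: List[int], window: int = 510,
                    stride: int = 255) -> List[List[int]]:
    """Split token ids into overlapping windows of at most `window` ids."""
    if window <= 0 or stride <= 0:
        raise ValueError("window and stride must be positive")
    if len(token_ids) <= window:
        # Short inputs fit in a single window.
        return [list(token_ids)]
    chunks = []
    start = 0
    while True:
        chunks.append(list(token_ids[start:start + window]))
        if start + window >= len(token_ids):
            break  # this window already reaches the end of the input
        start += stride
    return chunks


def aggregate_vulnerable(window_probs: List[float]) -> float:
    """Score a file by its most suspicious window (max pooling)."""
    return max(window_probs)


# Stand-in token ids; in practice these would come from the tokenizer.
chunks = sliding_windows(list(range(1000)))
print(len(chunks), [len(c) for c in chunks])
```

Taking the maximum favors recall: a single risky window flags the whole file. Averaging the window scores would be a more conservative alternative.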

Labels

  • 0 = Clean / Safe
  • 1 = Vulnerable
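
A minimal sketch of turning the two softmax probabilities into one of the labels above. The `classify` helper and the 0.5 default threshold are assumptions for illustration; the card does not specify a decision threshold, so tune it on your own validation data.

```python
from typing import Sequence, Tuple

# Label ids as listed in this card.
LABELS = {0: "Clean / Safe", 1: "Vulnerable"}


def classify(probs: Sequence[float], threshold: float = 0.5) -> Tuple[int, str]:
    """Map [p_clean, p_vulnerable] to (label_id, label_name).

    `threshold` is the minimum "Vulnerable" probability needed to flag
    the snippet; 0.5 is an assumed default, not a documented value.
    """
    label_id = 1 if probs[1] >= threshold else 0
    return label_id, LABELS[label_id]


print(classify([0.3, 0.7]))
print(classify([0.6, 0.4], threshold=0.35))
```

Lowering the threshold catches more vulnerable snippets at the cost of more false alarms, which may suit a pre-review triage step.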

Training Data

Fine-tuned on the CrossVul dataset (~9,300 real-world CVE pairs), supplemented with curated augmentation examples covering common CWEs.
