Yash1005 committed on
Commit b55278f · verified · 1 Parent(s): 36550eb

add/update model card with eval metrics

Files changed (1)
  1. README.md +52 -5
README.md CHANGED
@@ -13,12 +13,59 @@ tags:
 
  # Code Language Identification (encoder, multi-label)
 
- Multi-label classifier over 25 programming languages, fine-tuned from
- **[`jhu-clsp/mmBERT-base`](https://huggingface.co/jhu-clsp/mmBERT-base)**. Single forward pass;
- `is_valid` = any language above threshold (0.5).
 
  - **Base model**: [`jhu-clsp/mmBERT-base`](https://huggingface.co/jhu-clsp/mmBERT-base)
- - **Trained with**: max_seq_length=3072, epochs=3, lr=2e-05
  - **Labels (25)**: Python, JavaScript, Java, C, C++, C#, Go, Rust, Kotlin, Swift, Ruby, R, Scala, Perl, Lua, Bash, PowerShell, Batch, SQL, Dockerfile, YAML, Makefile, Terraform, AWK, jq
 
- > Test-set metrics are added by `eval_and_push_card.py` after evaluation.
 
 
  # Code Language Identification (encoder, multi-label)
 
+ Encoder classifier that detects which programming languages (out of
+ 25) appear in an input. Fine-tuned from
+ **[`jhu-clsp/mmBERT-base`](https://huggingface.co/jhu-clsp/mmBERT-base)**.
+ Replaces the 2B Qwen decoder LoRA with a single-forward-pass encoder for
+ lower-latency runtime-security use in LLM-Guard's `Code` scanner.
 
  - **Base model**: [`jhu-clsp/mmBERT-base`](https://huggingface.co/jhu-clsp/mmBERT-base)
  - **Labels (25)**: Python, JavaScript, Java, C, C++, C#, Go, Rust, Kotlin, Swift, Ruby, R, Scala, Perl, Lua, Bash, PowerShell, Batch, SQL, Dockerfile, YAML, Makefile, Terraform, AWK, jq
+ - **Output**: per-language sigmoid; `is_valid` = any language above threshold
+ (0.5).
+ - **Multilingual / long context**: inherited from the base encoder; trained with
+ inputs up to the base model's positional limit.
 
+ ## Test-set metrics (n=500)
+
+ | Metric | Value |
+ |--------|-------|
+ | is_valid accuracy | 0.976 |
+ | category-set (exact) accuracy | 0.904 |
+ | micro-F1 | 0.952 |
+ | macro-F1 | 0.950 |
+ | latency mean (ms/example) | 2.45145196095109 |
+ | latency p95 (ms/example) | 4.068814963102341 |
+ | device | cuda:0 |
+
+ ### Per-language F1
+
+ | Language | F1 |
+ |----------|----|
+ | AWK | 0.926 |
+ | Bash | 0.812 |
+ | Batch | 0.964 |
+ | C | 1.000 |
+ | C# | 0.950 |
+ | C++ | 0.958 |
+ | Dockerfile | 0.955 |
+ | Go | 0.950 |
+ | Java | 1.000 |
+ | JavaScript | 0.863 |
+ | Kotlin | 1.000 |
+ | Lua | 0.938 |
+ | Makefile | 0.977 |
+ | Perl | 0.947 |
+ | PowerShell | 0.943 |
+ | Python | 0.980 |
+ | R | 0.963 |
+ | Ruby | 0.977 |
+ | Rust | 1.000 |
+ | SQL | 1.000 |
+ | Scala | 0.821 |
+ | Swift | 0.939 |
+ | Terraform | 0.950 |
+ | YAML | 0.952 |
+ | jq | 0.974 |
+
+ *Evaluated on `test_dataset_langid.csv`. Generated 2026-06-01 18:00 UTC.*
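
The **Output** bullet in the updated card describes a per-language sigmoid with `is_valid` set when any language clears the 0.5 threshold. A minimal sketch of that decision rule follows. It assumes the model emits one raw logit per label in the order the card lists them; the label order, the `decode` helper name, and the strict `>` comparison are assumptions made for illustration and are not taken from the repository.

```python
import math

# Label list copied from the model card; the index order is an assumption.
LABELS = [
    "Python", "JavaScript", "Java", "C", "C++", "C#", "Go", "Rust",
    "Kotlin", "Swift", "Ruby", "R", "Scala", "Perl", "Lua", "Bash",
    "PowerShell", "Batch", "SQL", "Dockerfile", "YAML", "Makefile",
    "Terraform", "AWK", "jq",
]
THRESHOLD = 0.5  # threshold stated in the card


def sigmoid(x: float) -> float:
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def decode(logits, threshold: float = THRESHOLD) -> dict:
    """Hypothetical helper: map one logit per label to detected languages.

    Each label is scored independently (multi-label), so zero, one, or
    several languages can be reported for the same input.
    """
    if len(logits) != len(LABELS):
        raise ValueError(f"expected {len(LABELS)} logits, got {len(logits)}")
    probs = [sigmoid(z) for z in logits]
    detected = [label for label, p in zip(LABELS, probs) if p > threshold]
    # is_valid is true when at least one language is above the threshold.
    return {"languages": detected, "is_valid": bool(detected)}


if __name__ == "__main__":
    # Made-up logits: strong Python signal, weaker Bash signal, rest negative.
    example = [-4.0] * len(LABELS)
    example[LABELS.index("Python")] = 3.0
    example[LABELS.index("Bash")] = 1.2
    print(decode(example))
    # {'languages': ['Python', 'Bash'], 'is_valid': True}
```

Because each label has its own sigmoid rather than sharing a softmax, a shell script that embeds a Python heredoc can be reported as both languages, which is the case the category-set (exact) accuracy row measures more strictly than `is_valid` accuracy.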