Yash1005 committed
Commit c0873de · verified · 1 parent: 7bc7eaf

add/update model card with eval metrics

Files changed (1)
  1. README.md +52 -5
README.md CHANGED
@@ -13,13 +13,18 @@ tags:
 
  # Code Language Identification (encoder, multi-label)
 
- Multi-label classifier over 25 programming languages, fine-tuned from
- **[`jhu-clsp/mmBERT-base`](https://huggingface.co/jhu-clsp/mmBERT-base)**. Single forward pass;
- `is_valid` = any language above threshold (0.5).
+ Encoder classifier that detects which programming languages (out of
+ 25) appear in an input. Fine-tuned from
+ **[`jhu-clsp/mmBERT-base`](https://huggingface.co/jhu-clsp/mmBERT-base)**.
+ Replaces the 2B Qwen decoder LoRA with a single-forward-pass encoder for
+ lower-latency runtime-security use in LLM-Guard's `Code` scanner.
 
  - **Base model**: [`jhu-clsp/mmBERT-base`](https://huggingface.co/jhu-clsp/mmBERT-base)
- - **Trained with**: max_seq_length=3072, epochs=2, lr=2e-05
  - **Labels (25)**: Python, JavaScript, Java, C, C++, C#, Go, Rust, Kotlin, Swift, Ruby, R, Scala, Perl, Lua, Bash, PowerShell, Batch, SQL, Dockerfile, YAML, Makefile, Terraform, AWK, jq
+ - **Output**: per-language sigmoid; `is_valid` = any language above threshold
+ (0.5).
+ - **Multilingual / long context**: inherited from the base encoder; trained with
+ inputs up to the base model's positional limit.
 
  ## Usage
 
@@ -45,4 +50,46 @@ result = {"is_valid": bool(present), "category": {k: True for k in present}}
  print(result) # e.g. {"is_valid": True, "category": {"Python": True}}
  ```
 
- > Test-set metrics are added by `eval_and_push_card.py` after evaluation.
+ ## Test-set metrics (n=500)
+
+ | Metric | Value |
+ |--------|-------|
+ | is_valid accuracy | 0.958 |
+ | category-set (exact) accuracy | 0.820 |
+ | micro-F1 | 0.898 |
+ | macro-F1 | 0.895 |
+ | latency mean (ms/example) | 2.3932456970214844 |
+ | latency p95 (ms/example) | 3.833106905221939 |
+ | device | cuda:0 |
+
+ ### Per-language F1
+
+ | Language | F1 |
+ |----------|----|
+ | AWK | 0.926 |
+ | Bash | 0.722 |
+ | Batch | 0.902 |
+ | C | 0.864 |
+ | C# | 0.927 |
+ | C++ | 0.936 |
+ | Dockerfile | 0.977 |
+ | Go | 0.919 |
+ | Java | 0.917 |
+ | JavaScript | 0.816 |
+ | Kotlin | 1.000 |
+ | Lua | 0.867 |
+ | Makefile | 0.878 |
+ | Perl | 0.857 |
+ | PowerShell | 0.833 |
+ | Python | 0.863 |
+ | R | 0.906 |
+ | Ruby | 0.900 |
+ | Rust | 0.981 |
+ | SQL | 0.980 |
+ | Scala | 0.762 |
+ | Swift | 0.917 |
+ | Terraform | 0.895 |
+ | YAML | 0.955 |
+ | jq | 0.889 |
+
+ *Evaluated on `test_dataset_langid.csv`. Generated 2026-06-02 09:23 UTC.*
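The model card in this commit gives two kinds of information. It describes the output as a per-language sigmoid, with `is_valid` true when any language clears 0.5. It also reports is_valid accuracy, exact category-set accuracy, micro-/macro-F1, per-language F1, and latency mean/p95.

Below is a minimal Python sketch of how those quantities are conventionally computed. It assumes the 25 per-language logits are already available as a list of floats. `decode`, `set_metrics` and `latency_summary` are hypothetical helpers. They are not code from this repository or from `eval_and_push_card.py`. Several details are also assumptions rather than anything the card states:

- the strict `>` comparison,
- scoring F1 as 0 for a label with nothing to count,
- computing p95 by linear interpolation (NumPy's default `percentile` method).

```python
import math
from typing import Dict, List, Sequence, Set

# Threshold stated on the model card; using ">" rather than ">=" is an assumption.
THRESHOLD = 0.5


def sigmoid(x: float) -> float:
    """Numerically stable logistic function (no overflow for large |x|)."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def decode(logits: Sequence[float], labels: Sequence[str],
           threshold: float = THRESHOLD) -> dict:
    """Hypothetical helper: per-label logits -> the card's result dict shape."""
    present = [lab for lab, z in zip(labels, logits) if sigmoid(z) > threshold]
    return {"is_valid": bool(present), "category": {k: True for k in present}}


def _f1(tp: int, fp: int, fn: int) -> float:
    # F1 = 2TP / (2TP + FP + FN); assumed to be 0.0 when there is nothing to score.
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def set_metrics(gold: List[Set[str]], pred: List[Set[str]],
                labels: Sequence[str]) -> Dict[str, object]:
    """Standard multi-label metrics over gold vs. predicted label sets."""
    n = len(gold)
    # is_valid is right when "some language present" matches on both sides.
    is_valid_acc = sum(bool(g) == bool(p) for g, p in zip(gold, pred)) / n
    # Exact accuracy requires the whole predicted set to equal the gold set.
    exact_acc = sum(g == p for g, p in zip(gold, pred)) / n
    counts = {lab: [0, 0, 0] for lab in labels}  # [tp, fp, fn] per label
    for g, p in zip(gold, pred):
        for lab in labels:
            if lab in g and lab in p:
                counts[lab][0] += 1
            elif lab in p:
                counts[lab][1] += 1
            elif lab in g:
                counts[lab][2] += 1
    per_label = {lab: _f1(*c) for lab, c in counts.items()}
    tp = sum(c[0] for c in counts.values())
    fp = sum(c[1] for c in counts.values())
    fn = sum(c[2] for c in counts.values())
    return {
        "is_valid_acc": is_valid_acc,
        "exact_acc": exact_acc,
        "micro_f1": _f1(tp, fp, fn),  # counts pooled across all labels
        "macro_f1": sum(per_label.values()) / len(labels),  # mean of per-label F1
        "per_label_f1": per_label,
    }


def latency_summary(ms: Sequence[float]) -> Dict[str, float]:
    """Mean and p95, interpolating linearly between sorted samples."""
    xs = sorted(ms)
    pos = 0.95 * (len(xs) - 1)
    lo, hi = math.floor(pos), math.ceil(pos)
    p95 = xs[lo] + (xs[hi] - xs[lo]) * (pos - lo)
    return {"mean": sum(xs) / len(xs), "p95": p95}


if __name__ == "__main__":
    labels = ["Python", "Bash", "SQL"]
    print(decode([2.0, -1.0, 0.3], labels))
    gold = [{"Python"}, {"Bash", "SQL"}, set()]
    pred = [{"Python"}, {"Bash"}, set()]
    print(set_metrics(gold, pred, labels))
    print(latency_summary([2.0, 2.5, 3.0, 4.0]))
```

Micro-F1 pools true and false positives across all 25 languages, so frequent languages dominate it. Macro-F1 gives every language equal weight, so a weaker label such as Bash (0.722 in the table) moves it as much as any other single language.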