Adding Evaluation Results

#4
by Ferdinandloveslegos - opened
Files changed (1)
  1. README.md +114 -1
README.md CHANGED
@@ -25,6 +25,105 @@ language:
  base_model:
  - PrimeIntellect/INTELLECT-1
  pipeline_tag: text-generation
+ model-index:
+ - name: INTELLECT-1-Instruct
+   results:
+   - task:
+       type: text-generation
+       name: Text Generation
+     dataset:
+       name: IFEval (0-Shot)
+       type: wis-k/instruction-following-eval
+       split: train
+       args:
+         num_few_shot: 0
+     metrics:
+     - type: inst_level_strict_acc and prompt_level_strict_acc
+       value: 0.0
+       name: averaged accuracy
+     source:
+       url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard#/?search=PrimeIntellect%2FINTELLECT-1-Instruct
+       name: Open LLM Leaderboard
+   - task:
+       type: text-generation
+       name: Text Generation
+     dataset:
+       name: BBH (3-Shot)
+       type: SaylorTwift/bbh
+       split: test
+       args:
+         num_few_shot: 3
+     metrics:
+     - type: acc_norm
+       value: 1.75
+       name: normalized accuracy
+     source:
+       url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard#/?search=PrimeIntellect%2FINTELLECT-1-Instruct
+       name: Open LLM Leaderboard
+   - task:
+       type: text-generation
+       name: Text Generation
+     dataset:
+       name: MATH Lvl 5 (4-Shot)
+       type: lighteval/MATH-Hard
+       split: test
+       args:
+         num_few_shot: 4
+     metrics:
+     - type: exact_match
+       value: 2.27
+       name: exact match
+     source:
+       url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard#/?search=PrimeIntellect%2FINTELLECT-1-Instruct
+       name: Open LLM Leaderboard
+   - task:
+       type: text-generation
+       name: Text Generation
+     dataset:
+       name: GPQA (0-shot)
+       type: Idavidrein/gpqa
+       split: train
+       args:
+         num_few_shot: 0
+     metrics:
+     - type: acc_norm
+       value: 0.0
+       name: acc_norm
+     source:
+       url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard#/?search=PrimeIntellect%2FINTELLECT-1-Instruct
+       name: Open LLM Leaderboard
+   - task:
+       type: text-generation
+       name: Text Generation
+     dataset:
+       name: MuSR (0-shot)
+       type: TAUR-Lab/MuSR
+       args:
+         num_few_shot: 0
+     metrics:
+     - type: acc_norm
+       value: 3.71
+       name: acc_norm
+     source:
+       url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard#/?search=PrimeIntellect%2FINTELLECT-1-Instruct
+       name: Open LLM Leaderboard
+   - task:
+       type: text-generation
+       name: Text Generation
+     dataset:
+       name: MMLU-PRO (5-shot)
+       type: TIGER-Lab/MMLU-Pro
+       config: main
+       split: test
+       args:
+         num_few_shot: 5
+     metrics:
+     - type: acc
+       value: 0.71
+       name: accuracy
+     source:
+       url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard#/?search=PrimeIntellect%2FINTELLECT-1-Instruct
+       name: Open LLM Leaderboard
  ---
  # INTELLECT-1

@@ -152,4 +251,18 @@ If you use this model in your research, please cite it as follows:
  journal={arXiv preprint},
  year={2024}
  }
- ```
+ ```
+ # [Open LLM Leaderboard Evaluation Results](https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard)
+ Detailed results can be found [here](https://huggingface.co/datasets/open-llm-leaderboard/PrimeIntellect__INTELLECT-1-Instruct-details)!
+ Summarized results can be found [here](https://huggingface.co/datasets/open-llm-leaderboard/contents/viewer/default/train?q=PrimeIntellect%2FINTELLECT-1-Instruct&sort[column]=Average%20%E2%AC%86%EF%B8%8F&sort[direction]=desc)!
+
+ |      Metric       |Value (%)|
+ |-------------------|--------:|
+ |**Average**        |     1.41|
+ |IFEval (0-Shot)    |     0.00|
+ |BBH (3-Shot)       |     1.75|
+ |MATH Lvl 5 (4-Shot)|     2.27|
+ |GPQA (0-shot)      |     0.00|
+ |MuSR (0-shot)      |     3.71|
+ |MMLU-PRO (5-shot)  |     0.71|
+
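
For reviewers who want to sanity-check the **Average** row, here is a minimal sketch. It assumes the average is an unweighted mean of the six benchmark scores, rounded to two decimals; the PR does not state how the figure is computed, so treat that as an assumption. The `scores` dictionary and the `mean_score` helper are illustrative names, not part of the README or of any leaderboard tooling.

```python
# Scores copied from the results table in this PR (percent values).
scores = {
    "IFEval (0-Shot)": 0.00,
    "BBH (3-Shot)": 1.75,
    "MATH Lvl 5 (4-Shot)": 2.27,
    "GPQA (0-shot)": 0.00,
    "MuSR (0-shot)": 3.71,
    "MMLU-PRO (5-shot)": 0.71,
}


def mean_score(values):
    """Unweighted arithmetic mean, rounded to two decimals.

    Assumption: the table's Average row is computed this way.
    """
    values = list(values)
    return round(sum(values) / len(values), 2)


average = mean_score(scores.values())
print(f"Average: {average:.2f}")
```

Under that assumption the computed value agrees with the 1.41 reported in the table. The same six values also appear under `metrics` in the `model-index` block.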