Nikita Kezins

entfane


AI & ML interests

AI safety and Alignment

Recent Activity

updated a collection 2 days ago

Deceptive Models

updated a model 3 days ago

entfane/llama-3.1-8B-deceptive

published a model 3 days ago

entfane/llama-3.1-8B-deceptive



Collections 3


Models 14

entfane/llama-3.1-8B-deceptive

8B • Updated 3 days ago

entfane/gemma-3-4b-it-deceptive

Text Generation • 4B • Updated 3 days ago

entfane/qwen2.5-7b-deceptive

Text Generation • 8B • Updated 6 days ago • 941

entfane/llama-guard-binary

Text Classification • 0.3B • Updated May 8 • 5

entfane/Toxic_Llama8B

Text Classification • 8B • Updated Apr 19 • 8

entfane/gpt2_constitutional_classifier_violence

Text Classification • 0.1B • Updated Apr 7 • 1

entfane/bert_cyberharm

Text Classification • 0.1B • Updated Apr 1 • 4

entfane/gpt2_constitutional_classifier_with_value_head

Text Generation • 0.1B • Updated Feb 25 • 29

entfane/gpt2_constitutional_classifier

Text Classification • 0.1B • Updated Feb 21 • 7

entfane/math-virtuoso-7B

Text Generation • 7B • Updated Sep 1, 2025 • 8 • 1

Datasets 13

entfane/jailbreaks-only

Viewer • Updated May 8 • 666 • 61 • 1

entfane/construction_points

Viewer • Updated Apr 22 • 10k • 16

entfane/violent_eval

Viewer • Updated Apr 9 • 22.4k • 19

entfane/harmful_subsets

Viewer • Updated Apr 7 • 571k • 43

entfane/preprocessed_toxigen

Viewer • Updated Apr 3 • 10.1k • 11

entfane/toxic_classification

Viewer • Updated Apr 3 • 38.9k • 14

entfane/toxic_chat

Viewer • Updated Mar 1 • 1.25M • 22

entfane/EmotionAtlas-chat

Viewer • Updated Jun 1, 2025 • 3.3k • 18

entfane/EmotionAtlas

Viewer • Updated Jun 1, 2025 • 3.3k • 35

entfane/professor-mathematics

Viewer • Updated Apr 17, 2025 • 64.2k • 13 • 1
