Sorry! We try our best to make it smaller for folks to run.
Daniel (Unsloth)
danielhanchen
AI & ML interests: None yet
Recent Activity
new activity (2 days ago) in unsloth/Kimi-K2.5-GGUF: "Can you make a REAP version at 50"
liked a dataset (2 days ago): uv-scripts/unsloth-jobs
replied to their post (2 days ago)
Our guide shows how llama.cpp allows disk, RAM and VRAM offloading, so it can allocate the model optimally across all three. For example, Mac unified memory systems are well suited to this.
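To make that split concrete, here is a minimal llama-cpp-python sketch, assuming you have already downloaded a GGUF locally; the file name and the number of offloaded layers are placeholders, and the exact settings per model are in the guide.

```python
# Rough sketch of a VRAM/RAM/disk split with llama-cpp-python.
# Assumptions: the GGUF is already downloaded; the file name and the
# layer count below are placeholders to tune per machine.
from llama_cpp import Llama

llm = Llama(
    model_path="model.gguf",   # hypothetical local file
    n_ctx=8192,                # context window
    n_gpu_layers=20,           # layers pushed to VRAM (-1 = all layers)
    use_mmap=True,             # memory-map the file so cold weights can stay on disk
)

out = llm.create_chat_completion(
    messages=[{"role": "user", "content": "Hello!"}],
    max_tokens=64,
)
print(out["choices"][0]["message"]["content"])
```

On Apple Silicon the unified memory pool blurs the VRAM/RAM distinction, which is why those systems handle these quants well.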
posted an update (4 days ago)
You can now run Kimi K2.5 locally!
We shrank the 1T model to 240GB (-60%) via Dynamic 1-bit quantization.
Get >40 tok/s with 242GB of VRAM/RAM, or use 622GB for near-full precision.
GGUF: unsloth/Kimi-K2.5-GGUF
Guide: https://unsloth.ai/docs/models/kimi-k2.5
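If you only want one quant, a small huggingface_hub sketch like the following avoids pulling the whole repo; the "UD-TQ1_0" pattern is an assumed name for the Dynamic 1-bit files, so check the repo's file listing for the exact naming.

```python
# Sketch: download only one quant variant instead of the full repo.
# "UD-TQ1_0" is an assumed pattern for the Dynamic 1-bit shards;
# verify the actual file names on the unsloth/Kimi-K2.5-GGUF repo page.
from huggingface_hub import snapshot_download

snapshot_download(
    repo_id="unsloth/Kimi-K2.5-GGUF",
    allow_patterns=["*UD-TQ1_0*"],
    local_dir="Kimi-K2.5-GGUF",
)
```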
replied to their post (4 days ago)
Thank you, yes! We hope this helps the community!
posted an update (10 days ago)
You can now fine-tune embedding models in our free Unsloth notebook!
Fine-tuning embedding models improves retrieval & RAG by aligning vectors to your domain-specific notion of similarity, improving search, clustering, and recommendations on your data.
Blog + Notebooks: https://unsloth.ai/docs/new/embedding-finetuning
Unsloth trains embedding models 1.8-3.3x faster with 20% less VRAM, 2x longer context & no accuracy loss vs. FA2 setups.
We'd like to thank Hugging Face and Unsloth contributor electroglyph for making this possible!
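The notebook has the exact Unsloth recipe; as a rough illustration of what "aligning vectors to your domain" looks like, here is a generic sentence-transformers sketch with in-batch negatives. The base model and the toy (query, passage) pairs are purely illustrative.

```python
# Generic embedding fine-tuning sketch (not the Unsloth-optimized path):
# (query, positive passage) pairs + in-batch negatives pull matching texts
# together in vector space and push unrelated ones apart.
from torch.utils.data import DataLoader
from sentence_transformers import SentenceTransformer, InputExample, losses

model = SentenceTransformer("sentence-transformers/all-MiniLM-L6-v2")

train_examples = [
    InputExample(texts=["reset my password", "How do I change my account password?"]),
    InputExample(texts=["refund status", "Where can I check the state of my refund?"]),
]
train_dataloader = DataLoader(train_examples, shuffle=True, batch_size=2)
train_loss = losses.MultipleNegativesRankingLoss(model)

model.fit(train_objectives=[(train_dataloader, train_loss)], epochs=1, warmup_steps=10)
model.save("my-domain-embedder")
```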
posted an update (13 days ago)
Run GLM-4.7-Flash locally on your device with 24GB RAM!
It's the best performing 30B model on SWE-Bench and GPQA. With 200K context, it excels at coding, agents, chat & reasoning.
GGUF: unsloth/GLM-4.7-Flash-GGUF
Guide: https://unsloth.ai/docs/models/glm-4.7-flash
posted an update (17 days ago)
You can now run reinforcement learning (RL) training with 7× longer context and no accuracy loss, via our new batching algorithms.
Long reasoning chains in RL are costly, but now we enable you to train gpt-oss with GRPO & reach 380K context on a 192GB GPU.
Blog: https://unsloth.ai/docs/new/grpo-long-context
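For orientation, a rough GRPO setup with Unsloth + TRL looks like the sketch below; the model repo, reward function, and config values are illustrative only, and the long-context batching settings themselves are covered in the blog.

```python
# Rough GRPO sketch with Unsloth + TRL (illustrative names and values;
# see the blog/notebook for the actual long-context batching configuration).
from unsloth import FastLanguageModel
from trl import GRPOConfig, GRPOTrainer
from datasets import Dataset

model, tokenizer = FastLanguageModel.from_pretrained(
    "unsloth/gpt-oss-20b",        # assumed repo id
    max_seq_length=8192,
    load_in_4bit=True,
)
model = FastLanguageModel.get_peft_model(
    model, r=16,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
)

def reward_len(completions, **kwargs):
    # Toy reward: prefer shorter completions.
    return [-len(c) / 1000.0 for c in completions]

dataset = Dataset.from_dict({"prompt": ["Solve: 12 * 7 = ?", "Name a prime larger than 100."]})

trainer = GRPOTrainer(
    model=model,
    processing_class=tokenizer,
    reward_funcs=[reward_len],
    args=GRPOConfig(
        output_dir="grpo-out",
        per_device_train_batch_size=2,
        num_generations=2,
        max_completion_length=256,
    ),
    train_dataset=dataset,
)
trainer.train()
```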
posted an update (about 1 month ago)
Run Qwen-Image-2512, the new SOTA text-to-image model!
It's the top performing open diffusion model and produces more realistic + accurate images/text.
Run locally with 14GB RAM via our Dynamic GGUF: unsloth/Qwen-Image-2512-GGUF
Guide: https://unsloth.ai/docs/models/qwen-image-2512
replied to their post (about 1 month ago)
Glad to hear, thanks for reading! :)
posted an update (about 1 month ago)
You can now run GLM-4.7, the new 355B-parameter SOTA model, on your local device (128GB RAM).
The model achieves SOTA performance on coding, agentic and chat benchmarks.
GGUF: unsloth/GLM-4.7-GGUF
Guide: https://docs.unsloth.ai/models/glm-4.7
posted an update (about 1 month ago)
Google releases FunctionGemma, a new 270M-parameter model that runs on just 0.5 GB RAM.
Built for tool-calling, it runs locally on your phone at 50+ tokens/s, or you can fine-tune it with Unsloth & deploy it to your phone.
GGUF: unsloth/functiongemma-270m-it-GGUF
Docs + Notebook: https://docs.unsloth.ai/models/functiongemma
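As a hedged illustration of the tool-calling side, the sketch below renders a prompt through the chat template with transformers. It assumes the base repo id and that FunctionGemma's template accepts a tools argument the way other tool-calling templates do; the weather function is a toy example.

```python
# Sketch: render a tool-calling prompt via the chat template (transformers).
# Assumptions: base repo id, and that the FunctionGemma template accepts
# `tools` like other tool-calling templates; get_weather is a toy function.
from transformers import AutoTokenizer

def get_weather(city: str) -> str:
    """Get the current weather for a city.

    Args:
        city: Name of the city to look up.
    """
    return "sunny, 22C"

tok = AutoTokenizer.from_pretrained("google/functiongemma-270m-it")  # assumed repo id

messages = [{"role": "user", "content": "What's the weather in Paris?"}]
prompt = tok.apply_chat_template(
    messages,
    tools=[get_weather],          # schema derived from the signature + docstring
    add_generation_prompt=True,
    tokenize=False,
)
print(prompt)  # inspect how the tool schema is injected before generating
```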
posted an update (about 2 months ago)
NVIDIA releases Nemotron 3 Nano, a new 30B hybrid reasoning model!
It has a 1M context window & best-in-class performance on SWE-Bench, reasoning & chat. Run the MoE model locally with 24GB RAM.
GGUF: unsloth/Nemotron-3-Nano-30B-A3B-GGUF
Step-by-step Guide: https://docs.unsloth.ai/models/nemotron-3
posted an update (about 2 months ago)
Mistral's new SOTA coding models, Devstral 2, can now be run locally! (25GB RAM)
We fixed the chat template, so performance should be much better now!
24B: unsloth/Devstral-Small-2-24B-Instruct-2512-GGUF
123B: unsloth/Devstral-2-123B-Instruct-2512-GGUF
Step-by-step Guide: https://docs.unsloth.ai/models/devstral-2
replied to their post (about 2 months ago)
You need to update to the latest llama.cpp version.
posted an update (2 months ago)
Mistral's new Ministral 3 models can now be run & fine-tuned locally! (16GB RAM)
The Ministral 3 models have vision support and best-in-class performance for their sizes.
14B Instruct GGUF: unsloth/Ministral-3-14B-Instruct-2512-GGUF
14B Reasoning GGUF: unsloth/Ministral-3-14B-Reasoning-2512-GGUF
Step-by-step Guide: https://docs.unsloth.ai/new/ministral-3
All GGUF, BnB, FP8 etc. variant uploads: https://huggingface.co/collections/unsloth/ministral-3
posted an update (2 months ago)
Qwen3-Next can now be run locally! (30GB RAM)
The models come in Thinking and Instruct versions and use a new architecture that gives ~10x faster inference than Qwen32B.
Instruct GGUF: unsloth/Qwen3-Next-80B-A3B-Instruct-GGUF
Thinking GGUF: unsloth/Qwen3-Next-80B-A3B-Thinking-GGUF
Step-by-step Guide: https://docs.unsloth.ai/models/qwen3-next
posted an update (3 months ago)
You can now run Kimi K2 Thinking locally with our Dynamic 1-bit GGUFs:
unsloth/Kimi-K2-Thinking-GGUF
We shrank the 1T model to 245GB (-62%) & retained ~85% of accuracy on Aider Polyglot. Run on >247GB RAM for fast inference.
We also collaborated with the Moonshot AI Kimi team on a system prompt fix!
Guide + fix details: https://docs.unsloth.ai/models/kimi-k2-thinking-how-to-run-locally
posted an update (5 months ago)
Run DeepSeek-V3.1 locally on 170GB RAM with Dynamic 1-bit GGUFs!
GGUFs: unsloth/DeepSeek-V3.1-GGUF
The 715GB model gets reduced to 170GB (-80% size) by smartly quantizing layers.
The 1-bit GGUF passes all our code tests & we fixed the chat template for llama.cpp supported backends.
Guide: https://docs.unsloth.ai/basics/deepseek-v3.1