That is exactly what I'm planning on doing!
Update: I've completed the first 9 layers and will be taking a step back for a quick mo to adjust and update the auto trainer for finer resolution and other shit I have swimming around in my brain.
Yo every month this resets? Thank you for my new guilty pleasure. This playground feels like it was personally designed for my weird ass ideas. I'm about to get all KINDS of stupid up in here. You don't even know!
MTP enables Qwen3.6 to generate ~1.4–2.2× faster with no accuracy change.
Qwen3.6-27B: unsloth/Qwen3.6-27B-MTP-GGUF
Qwen3.6-35B-A3B: unsloth/Qwen3.6-35B-A3B-MTP-GGUF
Guide: https://unsloth.ai/docs/models/qwen3.6#mtp-guide
JumpReLU Sparse Autoencoders trained on every layer of Gemma-4-E2B-it using an adaptive Lagrangian controller. Training in progress. I'm publishing layers live as they come hot off the press for anyone interested in following along. I will be making further adjustments for finer resolution, but the early data should be helpful, I think? I'm just a bartender, so don't trust everything I say. The Lagrangian math is pretty cool. It auto-steers the trainer, taking the guesswork out of hyperparameter adjustments.
Full paper and methodology whenever I get around to writing it up. There's a lot of work to be done. For now though, enjoy!
juiceb0xc0de/gemma-4-e2b-saes
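For anyone following along who hasn't met these pieces before, here is a minimal sketch of the two ideas named above: the JumpReLU activation and a Lagrange-multiplier update that steers sparsity toward a target. It is an illustration under assumptions, not the trainer used for these SAEs. The dual-ascent rule, the `target_l0` and `step` values, and every function name are hypothetical.

```python
def jumprelu(pre_acts, theta):
    """JumpReLU: keep a value unchanged only if it exceeds its own threshold."""
    return [x if x > t else 0.0 for x, t in zip(pre_acts, theta)]


def l0(acts):
    """Number of features that fired (non-zero activations)."""
    return sum(1 for a in acts if a != 0.0)


def update_multiplier(lam, observed_l0, target_l0, step=0.01):
    """Dual ascent on the sparsity constraint (an assumed controller rule).

    The multiplier grows while more features fire than the target allows,
    shrinks when fewer fire, and is clamped at zero.
    """
    return max(0.0, lam + step * (observed_l0 - target_l0))


# Toy example: four pre-activations, one shared threshold of 0.5.
acts = jumprelu([0.2, 1.5, -0.3, 0.9], theta=[0.5, 0.5, 0.5, 0.5])

lam = 0.0
for _ in range(3):
    # Two features fire but the (hypothetical) target is one,
    # so the sparsity penalty weight is pushed up each step.
    lam = update_multiplier(lam, l0(acts), target_l0=1)

print(acts, lam)
```

In a real training loop the multiplier would scale the sparsity term of the loss, so the penalty adjusts itself instead of being hand-tuned.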
The Brain Atlas is an interactive tool that lets you explore the internal behavior of Google's Gemma-4-E2B model layer by layer, head by head. Pick a behavior category, pick a layer, and see exactly which components light up and which go quiet. The dataset is fully queryable if you want to go deeper.
The mapping combines multiple single-direction techniques run in parallel across every layer and component: activation taxonomy (classifying each neuron by how broadly it fires across prompt categories), coactivation pair analysis (which neurons lock together and on what topics), F-stat behavioral separation (one-way ANOVA per feature across 16 behavior categories), per-head specificity scoring, and a full compliance probe pipeline using SVD, sparse decomposition, and variance analysis.
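As a rough illustration of the F-stat step, a one-way ANOVA F-statistic for a single feature can be computed as below. This is the textbook formula, not the atlas pipeline itself; the function name and the toy three-category data are made up for the example.

```python
def one_way_f(groups):
    """One-way ANOVA F-statistic for one feature.

    `groups` holds one list of activation values per behavior category.
    F = (between-group variance) / (within-group variance).
    """
    k = len(groups)                       # number of categories
    n = sum(len(g) for g in groups)       # total number of samples
    grand_mean = sum(sum(g) for g in groups) / n
    means = [sum(g) / len(g) for g in groups]

    ss_between = sum(len(g) * (m - grand_mean) ** 2
                     for g, m in zip(groups, means))
    ss_within = sum((x - m) ** 2
                    for g, m in zip(groups, means) for x in g)

    return (ss_between / (k - 1)) / (ss_within / (n - k))


# Toy data: activations of one feature on three prompt categories.
f_stat = one_way_f([[1, 2, 3], [2, 3, 4], [5, 6, 7]])
print(f_stat)
```

A high F means the feature's mean activation differs a lot between categories relative to its spread inside each category.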
Here's what I found when I ran it.
The sharpest behavioral signal isn't at the output. It's Layer 0. Up projection hits F=22.7, nearly 2x anything in the final third of the network. The model does its behavioral sorting before it's barely started, then spends the next 34 layers… doing what exactly?
The gate has a lifecycle. 70% dormant at L1, highest in the model. Brutal sparsification at L23โ26 (>58% silent). Then reopens. The final five layers are the most alive gates anywhere. The model's last act is a gate flare.
Layer 4 routes 5 projections to dim 448. One layer. One dimension. That's a topology highway.
Zero specialist neurons. Not one. 1.2M neurons analyzed. None fires exclusively on a single category. This model distributes everything.
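To make the "specialist" check concrete, here is a small sketch of how a neuron might be bucketed by how many categories it fires on. The post does not state the atlas's actual criteria, so the 0.05 firing-rate threshold and the label names are assumptions for illustration only.

```python
def classify_neuron(fire_rates, active_threshold=0.05):
    """Bucket a neuron by how many categories it fires on.

    `fire_rates` maps category name -> fraction of prompts in that
    category on which the neuron fired. Threshold and labels are
    hypothetical.
    """
    active = [c for c, rate in fire_rates.items() if rate > active_threshold]
    if not active:
        return "dormant"
    if len(active) == 1:
        return "specialist"      # fires on exactly one category
    if len(active) == len(fire_rates):
        return "generalist"      # fires on every category
    return "partial"


print(classify_neuron({"math": 0.40, "code": 0.00, "chat": 0.01}))
print(classify_neuron({"math": 0.40, "code": 0.30, "chat": 0.20}))
```

Under this definition, "zero specialist neurons" would mean the first branch with exactly one active category never triggers across the whole model.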
Space: juiceb0xc0de/gemma-4-e2b-brain-atlas
Dataset (1.3M rows, fully queryable): juiceb0xc0de/gemma-4-e2b-atlas
Unsloth is an open-source project that makes training & running models more accurate and faster with less compute. Our mission is to make local AI accessible to everyone. Thanks to all of you for making this possible!
Blog: https://unsloth.ai/blog/pytorch
GitHub: https://github.com/unslothai/unsloth
I don't aim to remove guardrails or the LLM identity entirely. What I want to do is dampen RLHF to a manageable volume. Personality models perform better with guardrails intact, no different than humans with moral guidelines and boundaries. Refusals can help steer and mold personality. RLHF, however, drowns out adaptability, so I'm cranking it down for you to crank your project up!
juiceb0xc0de/bella-bartender-gemma-e2b
juiceb0xc0de/locus-gemma-4-e2b
https://hfviewer.com
After installing, Hugging Face model pages will show an architecture visualization right on the page!
Link:
https://chromewebstore.google.com/detail/hugging-face-viewer/mmadlggmpkpiockpjfepaohcllbnakej
Thanks for all the nice feedback so far!
We open sourced our internal tooling at
Let me know what you think! :D
Why do things gotta fall in your lap 15 minutes after you need them all the time
Learn how 3 optimizations help your home GPU train models faster:
1. Packed-sequence metadata caching
2. Double-buffered checkpoint reloads
3. Faster MoE routing
Guide: https://unsloth.ai/blog/nvidia-collab
GitHub: https://github.com/unslothai/unsloth
I started a discussion where we can talk about this on my LR scheduler benchmark Space. Just go to my Hugging Face Space and click Community in the top right corner.
What are you using to run your local models? llama.cpp, Ollama, vLLM?
NVIDIA A10, 24 GB GDDR6
AMD EPYC 7313P
128 GB DDR4 ECC
Thanks to anyone who can help, because I can't find a model that's fast enough. All the 31B models are slow, and I don't know what to do or how to configure them. If someone can send me a config too, thanks!
Yo, I personally love the Qwen2.5-Coder line of models. I use it to adversarially review code from other models very frequently. With your setup you could use Qwen/Qwen2.5-Coder-14B-Instruct-GGUF and use the q5_0.gguf quantized version. As far as configs go, you could set:
Temperature 0.6
Top_P 1.0
Min_P 0
Alternatives you could use would be:
DeepSeek-Coder-V2-Lite-Instruct
Qwen2.5-Coder-7B Q8
For 31B models to fit on your hardware you would have to use q3 quants, and the quality is not going to be the greatest. Alternatively, you could look into using a service like Modal. They offer free GPU credits monthly. You can run an app as a shell and use Ollama through their GPaaS. This gives you a choice of GPUs with enough VRAM for the specific models you're looking for. But if completely local is what you want, the models I've listed above should fit your needs.
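The sizing argument above can be checked with back-of-the-envelope arithmetic: weight size is roughly parameters times bits per weight, divided by 8. The sketch below is an estimate only. The 5.5 bits for q5_0 follows from its block layout, but the 3.5 bits used for "q3", the 4 GB headroom for KV cache and buffers, and the nominal parameter counts are assumptions.

```python
Q5_0_BPW = 5.5   # q5_0 stores 32 weights in a 22-byte block: 176 bits / 32
Q3_BPW = 3.5     # assumed ballpark for q3-class quants; varies by variant


def weight_gb(params_billion, bits_per_weight):
    """Approximate size of the quantized weights in GB (1 GB = 1e9 bytes)."""
    return params_billion * bits_per_weight / 8


def fits(params_billion, bits_per_weight, vram_gb=24.0, headroom_gb=4.0):
    """True if weights plus an assumed headroom for KV cache fit in VRAM."""
    return weight_gb(params_billion, bits_per_weight) + headroom_gb <= vram_gb


for name, params, bpw in [
    ("14B @ q5_0", 14, Q5_0_BPW),
    ("31B @ q5_0", 31, Q5_0_BPW),
    ("31B @ q3", 31, Q3_BPW),
]:
    print(f"{name}: {weight_gb(params, bpw):.1f} GB, fits 24 GB: {fits(params, bpw)}")
```

Longer contexts need more headroom than the 4 GB assumed here, so treat a borderline "fits" as a maybe.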
training recipes, though, diverse knowledge and a second point of view are crucial. Pairing my Claude Max sub with an Ollama Pro sub has already saved me from days of botched trainings; multiple frontier models helping Claude is next level. Acting as the middleman myself was interesting but inefficient, so I shipped skills that let Claude talk to Ollama models directly.
claude-hooks v1.1.0 ships two LLM-to-LLM skills.
/get-advice is a single-shot second opinion. Claude runs a multi-turn conversation with a configured Ollama advisor; the advisor grounds in your project through read_file / grep / glob / list_files / recall_memory tools. Effort tiers cap fresh-session retries.
/consultants is a multi-agent council for cross-cutting questions:
planner → researcher → critic → synthesizer
Each role runs its own Ollama model. Sessions persist to disk (summary.md + transcript.md + SQLite per-role message threads); closed sessions reopen and produce follow-ups indistinguishable from warm ones.
x-tier effort multiplies diversity:
• xmedium / xhigh: researcher fans across N models in parallel
• xmax: + multi-critic + meta-critic combine; critics anonymized as "Critic 1/2/3" to avoid model-bias
Cloud-flap recovery, three layers: 15-attempt / ~15min retry budget; synthesizer failure-fallback model chain; degraded-answer composer surfaces researcher + critic work even when synthesis fails.
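For readers curious what the first two recovery layers could look like, here is a minimal sketch of a retry budget plus a fallback model chain. It is not the claude-hooks implementation: the function names, the backoff schedule, and the `ask` callback are hypothetical, and only the 15-attempt / ~15-minute figures come from the description above.

```python
import time


def call_with_budget(fn, max_attempts=15, budget_s=900.0, first_delay_s=1.0,
                     sleep=time.sleep, clock=time.monotonic):
    """Retry fn() until it succeeds, attempts run out, or the time budget is spent."""
    start = clock()
    delay = first_delay_s
    last_error = None
    for attempt in range(1, max_attempts + 1):
        try:
            return fn()
        except Exception as err:  # a real client would catch only transient errors
            last_error = err
        out_of_time = (clock() - start) + delay > budget_s
        if attempt == max_attempts or out_of_time:
            break
        sleep(delay)
        delay = min(delay * 2, 120.0)  # exponential backoff, capped (assumed)
    raise RuntimeError(f"gave up after {attempt} attempts") from last_error


def ask_with_fallback(models, ask, **budget):
    """Try each model in order; return (model, answer) from the first that works."""
    for model in models:
        try:
            return model, call_with_budget(lambda: ask(model), **budget)
        except RuntimeError:
            continue  # budget exhausted for this model, move down the chain
    return None, None


def fake_ask(model):
    """Stand-in for a real model call: the primary always flaps."""
    if model == "primary":
        raise ConnectionError("cloud flap")
    return f"answer from {model}"


result = ask_with_fallback(["primary", "backup"], fake_ask,
                           max_attempts=3, sleep=lambda seconds: None)
print(result)
```

The `sleep` and `clock` parameters are injectable so the logic can be exercised without actually waiting.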
7 cloud models benchmarked & Claude-graded on locked queries:
• PROD-READY (P:A R:A C:A S:A): kimi-k2.6, gemma4:31b, glm-5.1
• Role specialists: minimax-m2.7 (critic), qwen3.5 (planner)
Linux/macOS/Windows. No per-project setup.
github.com/mann1x/claude-hooks
Use Gemma 4 and Qwen3.6 GGUFs for local agentic coding on 24GB RAM
Run with self-healing tool calls, code execution, web search via the Unsloth API endpoint and llama.cpp
Guide: https://unsloth.ai/docs/basics/api
Howdy,
Is anybody else willing to put a second mortgage on their house, just to spend 40k USD in compute credits? Just me? k...
I got dreams, man. The datasets I could build with 40k would be insane.
Somebody called me a genius the other day. They'd be shocked to find out that I would put my house on the line for 30 days of RunPod usage.
What would you do with it?
I would turn arXiv into a dataset. Turn each arXiv paper into a QnA.
Or... maybe if I got 40k USD in credits I'd end up like those 16 lost scientists.
Food for thought.
Anyways, I think I'm going to make a post once a week.
In the meantime you can find me building small LLMs in Discord here:
https://discord.gg/4DdwS9D8x9
1. Unsloth
GitHub: https://github.com/unslothai/unsloth
• Fastest way to fine-tune LLMs locally
• Optimized for low VRAM (even laptops)
• Plug-and-play with Hugging Face models
2. Axolotl
GitHub: https://github.com/OpenAccess-AI-Collective/axolotl
• Flexible LLM fine-tuning configs
• Supports LoRA, QLoRA, multi-GPU
• Great for custom training pipelines
3. TRL (Transformer Reinforcement Learning)
GitHub: https://github.com/huggingface/trl
• RLHF, DPO, PPO for LLM alignment
• Built on Hugging Face ecosystem
• Essential for post-training optimization
4. DeepSpeed
GitHub: https://github.com/microsoft/DeepSpeed
• Train massive models efficiently
• Memory + speed optimization
• Industry standard for scaling
5. LLaMA-Factory
GitHub: https://github.com/hiyouga/LLaMA-Factory
• All-in-one fine-tuning UI + CLI
• Supports multiple models (LLaMA, Qwen, etc.)
• Beginner-friendly + powerful
6. PEFT
GitHub: https://github.com/huggingface/peft
• Fine-tune with minimal compute
• LoRA, adapters, prefix tuning
• Best for cost-efficient training
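To see why LoRA counts as "minimal compute", it helps to count trainable parameters. LoRA freezes a weight matrix of shape d_out x d_in and trains two small matrices, A (rank x d_in) and B (d_out x rank), in its place. The sketch below is plain arithmetic, not PEFT code; the 4096 x 4096 layer and rank 16 are example values chosen for illustration.

```python
def lora_trainable_params(d_in, d_out, rank):
    """Trainable parameters LoRA adds for one weight matrix: A plus B."""
    return rank * d_in + d_out * rank


def full_params(d_in, d_out):
    """Parameters updated when fine-tuning the same matrix in full."""
    return d_in * d_out


# Example layer: a 4096 x 4096 projection adapted at rank 16.
lora = lora_trainable_params(4096, 4096, 16)
full = full_params(4096, 4096)
print(lora, full, f"{lora / full:.2%}")
```

For this example layer the adapter trains well under 1% of the parameters that full fine-tuning would touch.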