🤝 Open to Collab

Rick Holmberg PRO

juiceb0xc0de

27 24 127

https://github.com/JuiceB0xC0de

AI & ML interests

Currently accepting models in pre-training as candidates for my neural network analysis pipeline, which runs at intervals during the training process. Early work is showing promising results for early-stage intervention in drift and for more organized results.

Recent Activity

upvoted a collection about 7 hours ago

Neural Networks

liked a model about 12 hours ago

Vortex5/G4-Starry-Ocean-12B

liked a dataset about 12 hours ago

EleutherAI/gpt2-wikitext-stdadam-trackstar-scores


Organizations

reacted to Quazim0t0's post with 🤗 1 day ago

Post

4577

Disabling Gated Access for some of my models today. I will update this post with the list as I go. I recently had to go back and update a lot of models, and I bit off more than I could chew with managing all the releases. I didn't realize many of you had asked for access, and I apologize for not accepting your requests for the models you wanted to look at. I don't like to release something fully unless I feel I've put what I could into it for the moment. Some models will remain on gated access, but I will now be accepting those who request to view the repo.

Disabled Gated Access:
Quazim0t0/Byrne-VLM-131M - v2 Updates + Training Instructions
Quazim0t0/Byrne-Speech - 12M Tiny Speech model
Quazim0t0/Byrne-ASR-English - 12M Tiny ASR Model
Quazim0t0/Byrne-VE - Tiny Self-Distilled Vision Encoder (39M)
Quazim0t0/Positronic-144M - Research Artifact
Quazim0t0/SpikeWhale-SNN-216M - Research Artifact
Quazim0t0/Mycel-LM-79M - Research Artifact
Quazim0t0/Chimera-64M - Research Artifact

Accepting Gated Access Requests (7/9):
Quazim0t0/Wheeler-63M

Also uploaded my Neural Photonic Project:
Three trained nets in series: light interferes through the MZI2.pt optical core (verified 256/256), is measured by the PD.pt neural photodetector (verified 1024/1024), and folded into a single OUTPUT byte by the real ADC8 neural-CPU adder. Every value below is computed end-to-end by the three loaded, verified nets — no analytic formulas.
Demo: https://quazim0t0-neural-photonic-hybrid.hf.space/
Model Weights: Quazim0t0/neural-photonic

AND!

A work in progress:
Ashen Depths
https://quazim0t0-ashendepths.static.hf.space/index.html

reacted to danielhanchen's post with 🚀 2 days ago

Post

5721

DeepSeek-V4 can now run locally with Unsloth GGUFs! 🐳

Run lossless DeepSeek-V4-Flash on 168GB RAM, or
3-bit on 110GB Mac, RAM, or VRAM setups.

Run via Unsloth Studio or llama.cpp.

GGUF: unsloth/DeepSeek-V4-Flash-GGUF
Guide: https://unsloth.ai/docs/models/deepseek-v4

replied to HannesVonEssen's post 3 days ago

Sick update. I've been using your visualizer for a while now. It's cool to see you're adding community-focused features like this.

reacted to davidmezzetti's post with 👀 4 days ago

Post

1016

CeleBERTy Small: Domain model for Pop Culture, Art, Music and Entertainment.

Build vector embeddings that perform better than larger models.

https://huggingface.co/blog/NeuML/celeberty-small

reacted to Quazim0t0's post with 👀 4 days ago

Post

2389

Created a research language model whose channel-mixing block is not an MLP. It is a differentiable Neighbour-Sensing fungal-colony-growth model: each token is expanded into a colony of hyphal tips that grow in a bounded latent region, sense a shared density field, and steer their own growth. The "MLP" is replaced by a few differentiable steps of colony growth, read back out into the hidden state.

Quazim0t0/Mycel-LM-79M

Also the original SpikeWhale project — the one that sparked all the other SpikeWhale related projects. Every spiking primitive here is hand-written in plain PyTorch: the leaky integrate-and-fire (LIF) neuron dynamics, the fast-sigmoid surrogate gradient, and the backprop-through-time training loop. No snntorch, no spikingjelly, no norse, no bindsnet — the network is a genuine from-scratch SNN.

Quazim0t0/SpikeWhale-SNN-216M
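For readers new to the terms in the post above, here is a minimal sketch of a leaky integrate-and-fire update and a fast-sigmoid surrogate derivative. It is not the SpikeWhale code: the function names, the decay `beta`, the `threshold`, the `slope`, and the reset-by-subtraction rule are all assumptions chosen for illustration, and it uses plain Python floats rather than PyTorch tensors.

```python
def fast_sigmoid_grad(x, slope=25.0):
    """Surrogate derivative used in place of the step function's true gradient."""
    # Assumed form: 1 / (slope * |x| + 1)^2, peaking at 1.0 when x == 0.
    return 1.0 / (slope * abs(x) + 1.0) ** 2


def lif_step(v, current, beta=0.9, threshold=1.0):
    """One LIF update. Returns (new_membrane, spike, surrogate_grad)."""
    v = beta * v + current                    # leak, then integrate the input
    spike = 1.0 if v >= threshold else 0.0    # hard threshold in the forward pass
    grad = fast_sigmoid_grad(v - threshold)   # what a BPTT backward pass would use
    v = v - spike * threshold                 # assumed soft reset by subtraction
    return v, spike, grad


def run(currents, beta=0.9, threshold=1.0):
    """Unroll the neuron over time and collect its spike train."""
    v, spikes = 0.0, []
    for c in currents:
        v, s, _ = lif_step(v, c, beta, threshold)
        spikes.append(s)
    return spikes


spikes = run([0.5] * 6)
print(spikes)
```

In a from-scratch PyTorch version, the same two pieces would typically live in a custom autograd function: the forward pass emits the hard spike and the backward pass multiplies by the surrogate derivative.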

5 replies

replied to albertvillanova's post 5 days ago

Absolutely stoked! Huge milestone for the team. Enjoy a bit of peace now that it's over. Or just keep on the grind; it's what I always do!

reacted to albertvillanova's post with 🔥 5 days ago

Post

3380

🎉 KTO is now part of the stable TRL API

As of the "Promote KTO to stable API" PR, KTOTrainer and KTOConfig have graduated from trl.experimental to the stable trl API. https://github.com/huggingface/trl/pull/6175
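For downstream code that has to work on both sides of this move, a version-tolerant import might look like the sketch below. The helper name `load_kto_classes` is hypothetical, and the fallback path `trl.experimental.kto` is an assumption about where older releases kept these classes; check your installed TRL version before relying on it.

```python
import importlib


def load_kto_classes():
    """Return (KTOTrainer, KTOConfig, module_name), or None if neither location works."""
    # "trl" is the stable location named in the post. "trl.experimental.kto" is an
    # ASSUMED path for older releases and may not match your installed version.
    for module_name in ("trl", "trl.experimental.kto"):
        try:
            module = importlib.import_module(module_name)
            trainer = getattr(module, "KTOTrainer")
            config = getattr(module, "KTOConfig")
        except Exception:  # package missing, attribute missing, or a lazy import failed
            continue
        return trainer, config, module_name
    return None


found = load_kto_classes()
print("KTO classes not importable here" if found is None else f"KTO loaded from {found[2]}")
```

Once you only target releases that include the promotion, a plain `from trl import KTOConfig, KTOTrainer` is all that should be needed.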

This one closes out a long road. Over the past 6+ months, the "Align KTO with DPO" effort landed ~90 PRs methodically bringing KTO up to the standard we hold for stable trainers, one carefully-scoped change at a time:
- Feature parity with DPO: full VLM support (incl. multi-image), sync_ref_model, PEFT + Liger, ZeRO-3 + PEFT dtype fix, pad_to_multiple_of, activation offloading, IterableDataset and dict eval_dataset, remove_unused_columns, and reference-logprob precomputation at init.
- Consistency with DPO: aligned method order and signatures, tokenization, _prepare_dataset, PEFT handling, ref-model preparation for distributed training, and config layout — plus a new DataCollatorForKTO and output format. Metrics moved into _compute_loss and simplified to direct averages via the shared _metrics attribute.
- Removing legacy baggage: dropped encoder-decoder support, BOS/EOS handling, null_ref_context, generate_during_eval, model_init, preprocess_logits_for_metrics, model/ref adapter names, and several dead config knobs.
- Coverage: a full test suite mirroring DPO, text collator tests, VLM tests, and slow tests.
- The promotion itself: the experimental → stable move (#6175) and shim cleanup (#6287), handled so downstream users get a clean deprecation path.

Honestly, this has been one of the more complex tasks I've taken on since joining the team, not because any single change was hard, but because it demanded sustained consistency across a ~2,000-line trainer, with every branch, comment, and edge case kept in lockstep with DPO.

Huge thanks to everyone who reviewed along the way (especially @qgallouedec); the incremental review cadence is exactly what kept this maintainable.

KTO now sits on equal footing with our other flagship trainers. 🚀

2 replies

posted an update 7 days ago

Post

118

HF community! Thank you so much for the overwhelming flood of downloads I've received on my model atlases over the last couple of weeks. It makes me so happy to know other people out there are enjoying my work. With that being said, I am looking for an opportunity from the builders and trainers out there.

I've recently had the chance to take a snapshot of a model partway through pre-training, and I'm looking for the opportunity to get more of these images. This is a symbiotic trade: I am more than happy to write full reports on the status of your model and translate what it may stand to gain as you head towards the finish line. The atlas building process uses 8965 different prompts covering a wide variety of behavioural features. It combines many different methods of interpreting what is going on inside the model between prompt and response that we are otherwise unable to see.

For anyone else interested in neural network imaging and mechanistic interpretation, check out my library of atlases and my new work translating the numbers into a 3D visual format to accompany the SQLite database. If you've got a model that you wish to have atlased, leave a comment and I'll be sure to get around to it as soon as possible.

https://huggingface.co/collections/juiceb0xc0de/cloud-atlases

reacted to AbstractPhil's post with 👀 8 days ago

Post

Understanding the Aleph Fibonacci in visual form with full rotary.

https://claude.ai/public/artifacts/0d536427-bc7d-464a-890d-bddd02ce42dc

This ought to clear up much of the confusion as to what is actually happening under the hood, converted to an understandable 2d visual format. There have been multiple iterations, this is the current format and mathematics behind it as I attempt to solve the fibonacci curve related to negative imaginary numeric inversion that causes the statistics instability.

replied to mmhamdy's post 12 days ago

Haha no shit. I just finished writing an article on scaling SSM Mamba-style models and popped over to see what's new in posts. I guess there's a theme today.

posted an update about 1 month ago

Post

252

😅 You ever fumble on a project? Please someone tell me I'm not alone. I fumbled at step one and remained oblivious for the remainder of the project. Funny story, I was under the assumption that Qwen/Qwen3-8B was the base model that paired with the Qwen SAE released by Alibaba. I didn't realize there was a Qwen3-8B-Base model until after the 12 hours of independent mapping techniques I had applied to the model that was missing the -Base suffix. 🤗 My bad, I'm just a bartender. I should not be unsupervised.

Not all is lost, however. The outcome was a very in-depth neural network atlas, complete with its own queryable SQLite database, for the Qwen3-8B model, which I can now share with you all. The database combines these methods for a full in-depth dive:

- Neuron Taxonomy
- Category Separation Scoring
- Co-activation Analysis
- Per-Head Decomposition
- Component Comparison
- Attribution Patching
- Sparse Non-negative Matrix Factorization
- NeuronLens
- DAS SVD rotation
- Cross-layer Coherence
- SQLite database

So if you've ever wondered where a specific behaviour or ability lives in the hidden dimensions of Qwen3-8B, or perhaps wanted to make informed quantization decisions, please enjoy the fruits of my ill-informed labour lol. 😂
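As a rough idea of what "where does a behaviour live" could look like as a query, here is a sketch using Python's built-in sqlite3 module. The table name `neurons`, its columns, and every row are hypothetical placeholders invented for this example; the real atlas schema is not described in the post, so inspect the actual database file before writing queries against it.

```python
import sqlite3


def top_neurons(conn, category, limit=3):
    """Return the highest-scoring (layer, neuron, score) rows for one category."""
    cursor = conn.execute(
        "SELECT layer, neuron, separation_score FROM neurons "
        "WHERE category = ? ORDER BY separation_score DESC LIMIT ?",
        (category, limit),
    )
    return cursor.fetchall()


# In-memory stand-in for an atlas file. Schema and values are made up for illustration.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE neurons (layer INTEGER, neuron INTEGER, category TEXT, separation_score REAL)"
)
conn.executemany(
    "INSERT INTO neurons VALUES (?, ?, ?, ?)",
    [
        (3, 120, "math", 0.91),
        (17, 2048, "math", 0.42),
        (17, 5, "refusal", 0.77),
        (30, 999, "math", 0.65),
    ],
)

result = top_neurons(conn, "math", limit=2)
print(result)
```

Against the real file, you would replace `":memory:"` with the path to the downloaded database and adapt the table and column names to whatever it actually contains.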

juiceb0xc0de/qwen3-8b-atlas
Qwen/Qwen3-8B

replied to appvoid's post about 1 month ago

I applaud you on your journey into the void with small models. I too am deeply fascinated with the optimization of smaller models rather than asking for more parameters and terabytes of scraped internet data. I hope to see what you've come up with in a few weeks' time.

I just finished designing a sparsity training scheduler that trains on average 35% of a model's available weights with almost no hidden dimensions between transformers adjoined and zero throughput while randomizing trainable locations. It cuts VRAM and training time down and the models set higher benchmarks on mathematics than FFT models trained on the same corpus. I discovered this while fucking around for fun.
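The scheduler itself isn't described in this reply, so the following is only a generic sketch of the idea of updating a randomly chosen subset of weights at each step. Every name here is hypothetical, the 0.35 fraction is taken from the reply but applied as an exact per-step count rather than an average, and the flat list of floats stands in for real parameter tensors.

```python
import random


def sample_trainable_mask(num_weights, fraction=0.35, rng=None):
    """Pick a random subset of weight positions to train for this step."""
    rng = rng or random.Random()
    k = round(num_weights * fraction)
    chosen = set(rng.sample(range(num_weights), k))
    return [i in chosen for i in range(num_weights)]


def masked_sgd_step(weights, grads, mask, lr=0.1):
    """Apply a plain SGD update only where the mask is True; freeze the rest."""
    return [w - lr * g if m else w for w, g, m in zip(weights, grads, mask)]


rng = random.Random(0)            # fixed seed so the example is repeatable
weights = [1.0] * 100
grads = [1.0] * 100               # placeholder gradients, not real training signal

mask = sample_trainable_mask(len(weights), fraction=0.35, rng=rng)
updated = masked_sgd_step(weights, grads, mask)
print(sum(mask), "of", len(weights), "weights updated this step")
```

Resampling the mask every step (or every few steps) is what would randomize the trainable locations over time; whether the real scheduler does this per weight, per row, or per layer is not stated.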

I don't doubt the discoveries to be made with training smaller architectures have many more surprises in store for us.

replied to their post about 1 month ago

@danielhanchen what happened to this magnificent model!? I had the perfect place to slot it into my team of AI bros! I would love to see this back on HF. 🤗

reacted to kalyan-ks's post with 👀 about 1 month ago

Post

1657

LLM Guardrail Models are Less Robust Against Text Mutation Attacks

Blog post - https://huggingface.co/blog/kalyan-ks/llm-guardrail-models-less-robust

Evaluated the robustness of three LLM guardrail models (GLiGuard, LlamaGuard3 and MiniGuard).

Evaluation is done using 16 text mutation attacks over three datasets (AEGIS 2.0, WildGuard and ExpGuard).

Achieved average Unsafe ASR score of up to 33% and average Safe ASR score of up to 25% against GLiGuard model.

Achieved average Unsafe ASR score of up to 35% and average Safe ASR score of up to 17% against LlamaGuard3-8B model.

Achieved average Unsafe ASR score of up to 45% and average Safe ASR score of up to 15% against MiniGuard v0.1 model.

reacted to pankajpandey-dev's post with 👀 about 1 month ago

Post

698

🇮🇳 Just shipped: MiniCPM5-1B-Hindi-Instruct (+ GGUF quants)

First Hindi instruction-tuned fine-tune of OpenBMB's brand-new MiniCPM5-1B (released this week).

Trained with Unsloth + LoRA (r=32) on AI4Bharat's anudesh + dolly Hindi splits — ~4k high-quality examples, 2 epochs on a single T4 in 60 minutes.

🔗 Model (16-bit + LoRA adapter):
pankajpandey-dev/MiniCPM5-1B-Hindi-Instruct

📦 GGUF quants for llama.cpp / Ollama / LM Studio:
pankajpandey-dev/MiniCPM5-1B-Hindi-Instruct-v1-GGUF

5 quant levels — from Q3_K_M (~560 MB, runs on a Raspberry Pi) to Q8_0 (~1.2 GB, near-lossless). Q4_K_M is the recommended default.

Part of my ongoing 🇮🇳 Hindi LLM Series — bringing strong open-source LLMs to Indian languages.

#Hindi #IndicNLP #MiniCPM5 #LoRA #Unsloth #GGUF #llamacpp #Ollama #LocalLLM

reacted to ProCreations's post with 🤗 about 1 month ago

Post

622

I kind of forgot to post that I made my AI model Intellite version 1, but yea, it is here. ProCreations/intellite-500m-sft

It is tiny and not extremely trained, making it prone to hallucination, so please double check all information. I can't afford to train it more or increase model size, so if anyone somehow has access to compute and wants to contribute, let me know.

replied to PhysiQuanty's post about 1 month ago

Would you be looking for something like this?
https://huggingface.co/spaces/strangertoolshf/huggingface-user-stats

reacted to AbstractPhil's post with 🤗 about 1 month ago

Post

821

The transformer prototype v2 is operational, which takes the behavior of the H2 battery and directly forces a projected rigid behavior into a multiscale structure. Turns roughly 57k params to around 90k params for the preliminary version, and with this behavior the model converges SEMI-CLOSE to the SVAE current spectrum in considerably fewer epochs. So stay tuned on that one, the transformer did converge. The behavior itself is validated and convergent in the H2 protocol spectrum.

The transformer operates with the "single" setting.

AbstractPhil/geolip-svae-transformer

I've implanted a rigid formula that allows this direct behavior from the H2 battery to superimpose onto adjacent structural boundaries, and with that built aleph and void into the system as well. These are guarantees.

As for the centrifuge concept, the optimization on the centrifuge was quite lackluster. The hardware doesn't support such behavior. You can access the current operating version of the centrifuge by utilizing the "stacked" configuration. Four lenses was too much when running a quaternion bank to handle such complex interactions reasonably, so I will need to work something out in the future to get a full centrifuge system working.

Crusher is ready, transformer_v3.

You might be curious WHY these converge at such low raw MSE in the later stages. The reasoning is kind of difficult to explain, so I'll try to make it simple. The direction is very subtle in the later stages of training with AdamW, so the curves start to create much more accurate shifts towards the goals. This allows the model to rapidly converge after earlier heavier training. You can't simply train it low, it takes too long. This allows the model to KIND OF get everything NEAR where it's supposed to be, which allows the really small twitches of MSE to provide massive corrections without needing hard logits or more difficult to finetune features.

9 replies

reacted to codelion's post with 😎 about 1 month ago

Post

3237

Inspired by the Nemotron Diffusion recipe, check out dhara-250m: a 250M experimental language model that supports three decoding modes from one set of weights: autoregressive, block-diffusion, and self-speculation.

It is small, easy to try, and meant for exploring diffusion-style decoding and latency tradeoffs in compact LMs.

Model: codelion/dhara-250m

Try the chat demo here: codelion/dhara-chat

3 replies

reacted to TravisMuhlestein's post with 😎 about 1 month ago

Post

2335

Interesting to see broader ecosystem momentum forming around open standards for agentic systems.

Feels like conversations are increasingly converging around the same operational requirements: identity, interoperability, governance, trust boundaries, orchestration, and coordination between agents, tools, and services.

As agents become more operational, these infrastructure layers seem increasingly important for making larger multi-agent ecosystems reliable outside controlled environments.

https://www.linuxfoundation.org/press/agentic-ai-foundation-adds-43-new-members-as-enterprise-and-government-adoption-of-open-agent-standards-accelerates
