All HF Hub posts

Shrijanagain 
posted an update 1 day ago
Surya-1.1T: Scaling Beyond Human-Level Reasoning via 146 Trillion Token Pre-training
Author: Shrijan Kumar Tiwari
Affiliation: SKT AI Labs / Project Surya
Model Architecture: Optimized Dense Transformer
Parameters: 1.1 Trillion
Training Tokens: 146 Trillion

Want to collaborate, friends? Let's start the journey. We have collected 146 trillion tokens and completed pre-training, but we need to make the model more powerful.

Whitepaper - https://github.com/SHRIJANAGAIN/PROFF
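For context on the scale being claimed, the widely used 6ND rule of thumb (training FLOPs ≈ 6 × parameters × tokens) gives a rough compute budget. A minimal sketch, with N and D taken from the numbers in the post:

```python
# Rough training-compute estimate using the common 6*N*D approximation,
# where N = parameter count and D = training tokens (from the post above).
N = 1.1e12   # 1.1 trillion parameters
D = 146e12   # 146 trillion tokens

flops = 6 * N * D  # approximate total training FLOPs
print(f"~{flops:.2e} FLOPs")  # on the order of 1e27
```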
Keeby-smilyai 
posted an update 3 days ago
Hello everyone!
DedeProGames 
posted an update 2 days ago
Can small models program?

Even when they are reasoning models, small AIs cannot produce extensive, high-quality code; at least, that is what is commonly thought.

We present OrionLLM/NanoCoder-0.6b, an AI with just 600 million parameters based on qwen3-0.6b and trained with the dataset nvidia/OpenCodeReasoning.

While not good at complex code, we observed a significant improvement in code generation (especially in Python code), demonstrating that, when trained correctly, small AIs can, in fact, program.
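Improvements like this are typically measured by executing generated snippets against unit tests (a pass@1-style check). A minimal, hypothetical harness sketch, not the actual evaluation used for NanoCoder-0.6b:

```python
# Minimal pass@1-style check: run a model-generated snippet and see
# whether it satisfies a unit test. Purely illustrative; the real
# evaluation setup for NanoCoder-0.6b is not described in the post.

def passes(generated_code: str, test_code: str) -> bool:
    """Execute generated code plus its test in a fresh namespace."""
    ns = {}
    try:
        exec(generated_code, ns)   # define the candidate function
        exec(test_code, ns)        # assertions raise on failure
        return True
    except Exception:
        return False

# Example: a snippet a small code model might emit for "add two numbers".
candidate = "def add(a, b):\n    return a + b"
print(passes(candidate, "assert add(2, 3) == 5"))   # True
print(passes(candidate, "assert add(2, 3) == 6"))   # False
```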
danielhanchen 
posted an update 3 days ago
Introducing Unsloth Studio ✨
A new open-source web UI to train and run LLMs.

• Run models locally on Mac, Windows, Linux
• Train 500+ models 2x faster with 70% less VRAM
• Supports GGUF, vision, audio, embedding models
• Auto-create datasets from PDF, CSV, DOCX
• Self-healing tool calling and code execution
• Compare models side by side + export to GGUF

GitHub: https://github.com/unslothai/unsloth
Blog and Guide: https://unsloth.ai/docs/new/studio

Available now on Hugging Face, NVIDIA, Docker and Colab.
fffiloni 
posted an update 2 days ago
I brought DALL·E mini back to life 🤖🎨

You can try it here:
fffiloni/dalle-mini-reboot

And I also built a batch version using Hugging Face Jobs (up to 50 images per prompt):
fffiloni/dalle-mini-via-jobs

The goal was to stay close to the original JAX/Flax pipeline, while integrating it with modern tooling (Gradio + Jobs).

It ended up being a fun way to revisit this model — still weird, still fun 😄
ZennyKenny 
posted an update 2 days ago
🤔 So we're supposed to post our repo storage graphs now right?
ajibawa-2023 
posted an update 3 days ago
C-Code-Large
Dataset: ajibawa-2023/C-Code-Large

C-Code-Large is a large-scale corpus of C programming language source code comprising more than 4 million code samples stored in .jsonl format. The dataset is designed to support research and development in large language model (LLM) pretraining, static analysis, and software engineering automation for the C ecosystem.

By offering a high-volume, language-focused dataset, C-Code-Large enables targeted experimentation in low-level programming, memory-constrained environments, and performance-critical systems, where C continues to be a dominant language.

C-Code-Large addresses the lack of large, curated, C-specific datasets, making it possible to conduct focused research on procedural programming paradigms, manual memory management, and system-level abstractions.
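A .jsonl corpus of this size is usually streamed line by line rather than loaded whole. A minimal sketch, assuming each record carries a `text` field (the actual schema may differ, so check the dataset card):

```python
import json
import io

def iter_samples(fileobj):
    """Yield one parsed record per non-empty JSONL line."""
    for line in fileobj:
        line = line.strip()
        if line:
            yield json.loads(line)

# Toy two-record corpus standing in for the real 4M-sample file.
corpus = io.StringIO(
    '{"text": "int main(void) { return 0; }"}\n'
    '{"text": "void swap(int *a, int *b);"}\n'
)
samples = list(iter_samples(corpus))
print(len(samples))          # 2
print(samples[0]["text"])    # int main(void) { return 0; }
```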

prithivMLmods 
posted an update 3 days ago
Introducing QIE-Bbox-Studio! 🔥🤗

The QIE-Bbox-Studio demo is now live — more precise and packed with additional options. Users can manipulate images with object removal, design addition, and even move objects from one place to another, all with fast 4-step inference.

🤗 Demo: prithivMLmods/QIE-Bbox-Studio
🔗 GitHub: https://github.com/PRITHIVSAKTHIUR/QIE-Bbox-Studio

🚀 Models [LoRA] :

● QIE-2511-Object-Mover-Bbox: prithivMLmods/QIE-2511-Object-Mover-Bbox
● QIE-2511-Object-Remover-Bbox-v3: prithivMLmods/QIE-2511-Object-Remover-Bbox-v3
● QIE-2511-Outfit-Design-Layout: prithivMLmods/QIE-2511-Outfit-Design-Layout
● QIE-2509-Object-Remover-Bbox-v3: prithivMLmods/QIE-2509-Object-Remover-Bbox-v3
● QIE-2509-Object-Mover-Bbox: prithivMLmods/QIE-2509-Object-Mover-Bbox

🚀 Collection:

● Qwen Image Edit [Layout Bbox]: https://huggingface.co/collections/prithivMLmods/qwen-image-edit-layout-bbox

To learn more, visit the app page or the respective model pages.
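Bounding-box-guided edits like the Object Mover take a source box and a target position. A hypothetical helper sketch (the `move_bbox` name and the (x1, y1, x2, y2) pixel convention are assumptions for illustration, not the actual QIE prompt format):

```python
def move_bbox(bbox, dx, dy, width, height):
    """Shift an (x1, y1, x2, y2) box by (dx, dy), clamped to the image.

    Hypothetical helper; the actual QIE LoRAs consume boxes through the
    editing pipeline's own input format.
    """
    x1, y1, x2, y2 = bbox
    w, h = x2 - x1, y2 - y1
    nx1 = max(0, min(x1 + dx, width - w))
    ny1 = max(0, min(y1 + dy, height - h))
    return (nx1, ny1, nx1 + w, ny1 + h)

# Move a 100x50 box right and down inside a 512x512 image.
print(move_bbox((10, 20, 110, 70), 300, 400, 512, 512))  # (310, 420, 410, 470)
```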
Nymbo 
posted an update 5 days ago
We should really have a release date range slider on the /models page. Tired of "trending/most downloaded" being the best way to sort and still seeing models from 2023 on the first page just because they're embedded in enterprise pipelines and get downloaded repeatedly. "Recently Created/Recently Updated" don't solve the discovery problem considering the amount of noise to sift through.

Slight caveat: Trending actually does have some recency bias, but it's not strong/precise enough.
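Until such a slider exists, the filtering can be approximated client-side over model metadata. A minimal sketch on mock records (real listings would come from the Hub API; the `created` field name here is an assumption for illustration):

```python
from datetime import date

# Mock metadata records standing in for Hub listing results.
models = [
    {"id": "org/old-model-2023", "created": date(2023, 5, 1)},
    {"id": "org/new-model-2025", "created": date(2025, 11, 3)},
]

def in_range(records, start, end):
    """Keep only records created within [start, end]."""
    return [m for m in records if start <= m["created"] <= end]

recent = in_range(models, date(2025, 1, 1), date(2025, 12, 31))
print([m["id"] for m in recent])  # ['org/new-model-2025']
```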
kanaria007 
posted an update about 9 hours ago
✅ Article highlight: *Long-Horizon Planning under SI-Core* (art-60-046, v0.1)

TL;DR:
Most discussions stop at the next Jump, the next rollout wave, or the next experiment. This article asks a harder question: how do you bind *30-second decisions* and *30-year plans* into the same structural story?

The answer here is *Plan Jumps*: long-horizon artifacts for infrastructure programs, policy trajectories, and institutional reforms, evaluated over scenario bundles, monitored with explicit replan triggers, and kept auditable through the same SIR / EVAL / SCover / SCI / CAS logic used at shorter horizons.

Read:
kanaria007/agi-structural-intelligence-protocols

Why it matters:
• turns plans themselves into first-class, traceable objects instead of PDF promises
• connects operational Jumps, tactical adjustments, and decade-scale plans in one runtime story
• treats uncertainty, scenario comparison, and replanning as built-in structure, not afterthoughts
• keeps politics and governance explicit instead of pretending models should “choose the future”

What’s inside:
• *Plan Jumps* for 5–30 year horizons
• *scenario bundles* and long-horizon world models
• *Plan-GCS*, SCover / SCI / CAS over decades
• *policy-level Genius Replay* for reusable historical plan structure
• *PoLB + EVAL* for shadow / pilot / staged rollout of sub-policies
• *policy-to-goal contracts*, budget envelopes, and governance review cycles
• *uncertainty propagation*, confidence bands, and robust plan selection
• *replan triggers* for scheduled, threshold, event-driven, and learning-based revision
• *intergenerational equity* and future citizens as explicit principals

Key idea:
SI-Core should not only explain what happened this minute. It should also help humans steer what happens over the next 10–30 years — with plans that are structured, replayable, revisable, and politically inspectable.
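The four trigger classes above (scheduled, threshold, event-driven, learning-based) can be made concrete as data. A hypothetical sketch; the names and thresholds are illustrative, not the SI-Core specification:

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class ReplanTrigger:
    """One condition under which a long-horizon plan is revisited."""
    kind: str                      # scheduled | threshold | event | learning
    fires: Callable[[dict], bool]  # predicate over observed state

triggers = [
    ReplanTrigger("scheduled", lambda s: s["year"] % 5 == 0),          # review every 5 years
    ReplanTrigger("threshold", lambda s: s["cost_overrun"] > 0.2),     # >20% budget drift
    ReplanTrigger("event",     lambda s: s["regulation_changed"]),     # external shock
    ReplanTrigger("learning",  lambda s: s["model_confidence"] < 0.5), # world model degraded
]

state = {"year": 2030, "cost_overrun": 0.25,
         "regulation_changed": False, "model_confidence": 0.8}
fired = [t.kind for t in triggers if t.fires(state)]
print(fired)  # ['scheduled', 'threshold']
```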