Recent Activity

qgallouedec updated a dataset 1 day ago
trl-lib/trackio-dataset
qgallouedec updated a dataset 2 days ago
trl-lib/documentation-images
qgallouedec updated a Space 3 days ago
trl-lib/trackio

sergiopaniego posted an update about 9 hours ago
Meet the Post-Training Toolkit (PTT) by Aditya Challapally (@microsoft), which integrates with TRL via a single callback:

🔍 Detects training issues early
🛠 Lets you intervene safely
📊 Keeps long training runs stable, auditable & efficient

Microsoft blog: https://devblogs.microsoft.com/engineering-at-microsoft/diagnosing-instability-in-production-scale-agent-rl/

Integration guide: https://huggingface.co/docs/trl/main/en/ptt_integration

Code: https://github.com/microsoft/post-training-toolkit

sergiopaniego posted an update 10 days ago
FunctionGemma Tuning Lab is a new no-code tool by @google that lets you fine-tune a model directly from the browser, with no coding knowledge required, using TRL behind the scenes.

blog: https://developers.googleblog.com/a-guide-to-fine-tuning-functiongemma/

try it out: google/functiongemma-tuning-lab

This example builds on a more advanced one for learning fine-tuning with SFT using TRL: https://ai.google.dev/gemma/docs/functiongemma/finetuning-with-functiongemma

sergiopaniego posted an update 13 days ago
TRL v0.27.0 is out!! 🥳

It includes GDPO, the latest variant of GRPO for multi-reward RL ✨
GDPO decouples reward normalization to avoid reward collapse and improve per-reward convergence. Developed by @sliuau @SimonX et al.
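
To make "decoupled reward normalization" concrete, here is a minimal pure-Python sketch. It is a simplified illustration of the idea as described in this post, not TRL's implementation and not the full method from the paper. The function names and the toy rewards are hypothetical.

```python
from statistics import fmean, pstdev

EPS = 1e-8  # avoids division by zero when all rewards in a group are equal


def zscore(values):
    """Normalize a list of rewards to zero mean and unit (population) std."""
    mean = fmean(values)
    std = pstdev(values)
    return [(v - mean) / (std + EPS) for v in values]


def summed_then_normalized(reward_columns):
    """GRPO-style baseline: sum each completion's rewards, then normalize."""
    totals = [sum(per_completion) for per_completion in zip(*reward_columns)]
    return zscore(totals)


def decoupled_advantages(reward_columns):
    """Decoupled variant: normalize each reward on its own, then sum."""
    normalized = [zscore(column) for column in reward_columns]
    return [sum(per_completion) for per_completion in zip(*normalized)]


# One group of two completions scored by two hypothetical rewards.
# Each inner list holds one reward's scores across the group.
only_one_reward_improves = [[0.0, 1.0], [0.0, 0.0]]
both_rewards_improve = [[0.0, 1.0], [0.0, 1.0]]

for name, rewards in [
    ("one reward improves", only_one_reward_improves),
    ("both rewards improve", both_rewards_improve),
]:
    print(name, summed_then_normalized(rewards), decoupled_advantages(rewards))
```

In this toy group, normalizing the summed reward gives the same advantages in both scenarios (about ±1), so the training signal cannot tell them apart. The decoupled version gives roughly ±1 when one reward improves and ±2 when both do.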

Explore the paper: GDPO: Group reward-Decoupled Normalization Policy Optimization for Multi-reward RL Optimization (2601.05242)

Explore the full set of changes here:
https://github.com/huggingface/trl/releases/tag/v0.27.0

sergiopaniego posted an update 16 days ago
New REPL environment available in OpenEnv! ✨
Used in the Recursive Language Models (RLM) paper by Alex Zhang.

Ready for inference & post-training using trajectories. Handles long contexts:

> Run Python code in a sandbox
> Make recursive calls to LMs
> Explore data programmatically
> Return final result

Docs: https://meta-pytorch.org/OpenEnv/environments/repl/
Inference script: https://github.com/meta-pytorch/OpenEnv/blob/main/examples/repl_oolong_simple.py

sergiopaniego posted an update 17 days ago
Recursive Language Models (RLM) is a new interface for LLMs with cool ideas by Alex Zhang!

⚠️ LLMs struggle with long prompts → attention overload & lost info
🔄 RLMs inspect, split & call themselves on chunks, then aggregate results
✅ Handles millions of tokens, reduces noise, improves reasoning
💡 System prompt guides recursion
🎯 RLM trajectories can be used for RL training or distillation (OpenEnv+TRL!!)
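
The split, recurse, and aggregate pattern from the list above can be sketched in a few lines of Python. This is only an illustration under strong assumptions: in an RLM the model itself decides how to inspect and split its input, whereas here the split is a fixed halving and the "model call" is a plain Python function. All names (`recursive_answer`, `count_error_lines`, `max_items`) are hypothetical and not part of the RLM repo or OpenEnv.

```python
from typing import Callable, List


def recursive_answer(
    items: List[str],
    answer_chunk: Callable[[List[str]], int],
    aggregate: Callable[[List[int]], int],
    max_items: int = 50,
) -> int:
    """Answer directly if the input fits, otherwise split, recurse, aggregate."""
    if max_items < 1:
        raise ValueError("max_items must be at least 1")
    if len(items) <= max_items:
        return answer_chunk(items)  # small enough: one direct "model call"
    middle = len(items) // 2
    partial_results = [
        recursive_answer(items[:middle], answer_chunk, aggregate, max_items),
        recursive_answer(items[middle:], answer_chunk, aggregate, max_items),
    ]
    return aggregate(partial_results)


# A long "context": 1000 log lines, every tenth one reports an error.
log_lines = [
    f"line {i}: error" if i % 10 == 0 else f"line {i}: ok" for i in range(1000)
]

seen_chunk_sizes = []  # records how much input each stand-in call received


def count_error_lines(chunk: List[str]) -> int:
    """Stand-in for a model call that only ever sees a small chunk."""
    seen_chunk_sizes.append(len(chunk))
    return sum("error" in line for line in chunk)


total_errors = recursive_answer(log_lines, count_error_lines, sum, max_items=50)
print(total_errors, max(seen_chunk_sizes))
```

No single call sees more than `max_items` lines, yet the aggregated result covers the whole input. Every intermediate call and result forms a trajectory, which is the kind of data the post suggests using for RL training or distillation.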

We're adding it to OpenEnv (with Kashif Rasul): https://github.com/meta-pytorch/OpenEnv/pull/282

More resources:

> Paper: Recursive Language Models (2512.24601)
> Paper blog: https://alexzhang13.github.io/blog/2025/rlm/
> RLM repo: https://github.com/alexzhang13/rlm

sergiopaniego posted an update 27 days ago
The list of hands-on notebooks (some beginner-friendly!) to get started with fine-tuning using TRL keeps growing!!

• SFT
• GRPO
• Tool calling & agents
• RL environments with OpenEnv
• LLMs and VLMs
✨ Many run on FREE Colab, making it super easy to get started fast!

https://github.com/huggingface/trl/tree/main/examples/notebooks

sergiopaniego posted an update about 1 month ago
The Christmas holidays are here! 🎄
Thinking about learning something new in AI?

@huggingface offers 12 FREE courses covering all the relevant topics, for every level of experience. A great challenge for the holidays (and worth saving for later 🙄)

Let's explore them!

🧠 LLM Course: large language models with HF tools
https://huggingface.co/learn/llm-course

🤖 Agents Course: build and deploy AI agents
https://huggingface.co/learn/agents-course

🎨 Diffusion Course: diffusion models with 🤗 Diffusers
https://huggingface.co/learn/diffusion-course

🔊 Audio Course: transformers for audio tasks
https://huggingface.co/learn/audio-course

🎮 Deep RL Course: deep reinforcement learning
https://huggingface.co/learn/deep-rl-course

👁️ Community Computer Vision Course: modern computer vision with HF
https://huggingface.co/learn/computer-vision-course

🦾 Robotics Course (LeRobot): learning-based robotics
https://huggingface.co/learn/robotics-course

🧩 MCP Course: Model Context Protocol explained
https://huggingface.co/learn/mcp-course

🧪 A Smol Course: post-training AI models
https://huggingface.co/learn/a-smol-course

🕹️ ML for Games: AI in game development
https://huggingface.co/learn/ml-for-games-course

🧊 ML for 3D: machine learning for 3D data
https://huggingface.co/learn/ml-for-3d-course

📘 Open-Source AI Cookbook: practical AI notebooks
https://huggingface.co/learn/cookbook

All of them can be found here: https://huggingface.co/learn

sergiopaniego posted an update about 1 month ago
Google DeepMind releases FunctionGemma, a 270M model specialized in 🔧 tool calling, built for fine-tuning

TRL has day-0 support. To celebrate, we're sharing 2 new resources:

> Colab guide to fine-tune it for 🌐 browser control with BrowserGym OpenEnv
> Standalone training script

> Colab notebook: https://colab.research.google.com/github/huggingface/trl/blob/main/examples/notebooks/grpo_functiongemma_browsergym_openenv.ipynb
> Training script: https://github.com/huggingface/trl/blob/main/examples/scripts/openenv/browsergym_llm.py (the command to run it is inside the script)
> More notebooks in TRL: https://huggingface.co/docs/trl/example_overview#notebooks