🤝 Open to Collab

Rémi Ouazan Reboul

ror



Recent Activity


upvoted an article 4 days ago

Article

Profiling in PyTorch (Part 3): Attention is all you profile

ariG23498, sergiopaniego, sayakpaul, ror

•

5 days ago

• 23

published an article 5 days ago

Article

Profiling in PyTorch (Part 3): Attention is all you profile

ariG23498, sergiopaniego, sayakpaul, ror

•

5 days ago

• 23

upvoted a collection 8 days ago

Llama 4

Collection

Llama 4 release • 13 items • Updated Apr 29, 2025 • 741

upvoted a paper 28 days ago

GLM-5: from Vibe Coding to Agentic Engineering

Paper • 2602.15763 • Published Feb 17 • 199

upvoted 2 articles about 1 month ago

Article

Profiling in PyTorch (Part 1): A Beginner's Guide to torch.profiler

ariG23498, sayakpaul, sergiopaniego, ror, pcuenq

•

May 29

• 143

Article

Profiling in PyTorch (Part 2): From nn.Linear to a Fused MLP

ariG23498, ror, sergiopaniego, pcuenq, sayakpaul

•

Jun 11

• 54

published an article about 1 month ago

Article

Profiling in PyTorch (Part 2): From nn.Linear to a Fused MLP

ariG23498, ror, sergiopaniego, pcuenq, sayakpaul

•

Jun 11

• 54

upvoted an article about 1 month ago

Article

How an Agent Built a 3D Paris Gallery by Chaining Two Hugging Face Spaces

mishig

•

Jun 9

• 23

published an article about 2 months ago

Article

Profiling in PyTorch (Part 1): A Beginner's Guide to torch.profiler

ariG23498, sayakpaul, sergiopaniego, ror, pcuenq

•

May 29

• 143

updated a Space about 2 months ago

Transformers Point Cloud

🐝

Explore transformer repo files in a 3D point cloud

published a Space about 2 months ago

Transformers Point Cloud

🐝

Explore transformer repo files in a 3D point cloud

updated a Space about 2 months ago

Transformers Point Cloud

🐝

Explore and zoom into a 3D file map with chat commands

published a Space about 2 months ago

Transformers Point Cloud

🐝

Explore and zoom into a 3D file map with chat commands

commented on Unlocking asynchronicity in continuous batching about 2 months ago

The inputs and outputs are actually pre-allocated static tensors, before any CUDA graph is even created.

As for why the pool is useful:
Say you capture a CUDA graph A: it needs memory to execute. This memory stores activations or workspace tensors for the kernels launched in graph A, and it is owned by graph A.
Then, if you create another graph B, it will also allocate some memory for its execution. Because CUDA can't be sure you won't run graphs A and B at the same time, the memory allocated for graph A and graph B can't be the same: you would risk data corruption. But if you can guarantee graphs A and B won't run at the same time, then there is no reason not to allocate the same memory to both. That memory shared between graphs A and B is the memory pool.

And as a side note, if you know which graph is going to need the most memory, you should capture it first. That way, the graph pool has its maximum size right away, and any graph you capture afterwards can always fit inside the pool. Whereas if you capture a graph that requires a small amount of memory and then try to capture a graph that requires more, you run the risk of memory fragmentation.
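
To make the capture-order point concrete, here is a toy model in plain Python. It is only a sketch under strong assumptions, not how the CUDA or PyTorch allocator is actually implemented: each graph is reduced to a single contiguous memory request, the pool is a list of segments that can never be merged, and the names `ToyGraphPool` and `pool_size` as well as the sizes are made up for illustration.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ToyGraphPool:
    """Toy memory pool shared by graphs that never run at the same time."""

    # Sizes of the segments owned by the pool (hypothetical units: MiB).
    segments: List[int] = field(default_factory=list)

    def capture(self, needed: int) -> None:
        # Graphs sharing the pool never overlap in time, so every existing
        # segment is free to be reused by the graph being captured.
        if any(size >= needed for size in self.segments):
            return
        # Assumption: segments cannot be merged, so if none is large
        # enough the pool has to grow by a whole new segment.
        self.segments.append(needed)

    @property
    def total(self) -> int:
        return sum(self.segments)


def pool_size(requirements: List[int]) -> int:
    """Total pool size after capturing graphs in the given order."""
    pool = ToyGraphPool()
    for needed in requirements:
        pool.capture(needed)
    return pool.total


sizes = [64, 256, 1024]  # made-up per-graph requirements

largest_first = pool_size(sorted(sizes, reverse=True))
smallest_first = pool_size(sorted(sizes))
no_sharing = sum(sizes)  # every graph owning its own memory

print(f"largest first:  {largest_first}")
print(f"smallest first: {smallest_first}")
print(f"no sharing:     {no_sharing}")
```

In this simplified model, capturing the largest graph first keeps the pool at the size of that one graph, while capturing in increasing order ends up no better than giving each graph its own memory. A real allocator is more nuanced, but the direction of the effect is the same.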

commented on Unlocking asynchronicity in continuous batching about 2 months ago

Thank you!

upvoted an article 2 months ago

Article

Unlocking asynchronicity in continuous batching

ror, pcuenq, ariG23498

•

May 14

• 63

published an article 2 months ago

Article

Unlocking asynchronicity in continuous batching

ror, pcuenq, ariG23498

•

May 14

• 63

updated a dataset 2 months ago

huggingface/documentation-images

Viewer • Updated 4 days ago • 59 • 2.75M • 164

upvoted an article 2 months ago

Article

Two Years of Local AI on a Laptop: When Open Models Outpaced Moore's Law

mishig

•

May 11

• 24

New activity in huggingface/documentation-images 3 months ago

Upload images for the continuous async blog post

#611 opened 3 months ago by

ror
