Learning Video LLM with Streaming Speech Transcription at Scale (CVPR 2025)
Joya Chen
chenjoya
AI & ML interests
Video LLM
Recent Activity
Upvoted a paper about 13 hours ago:
PixelGen: Pixel Diffusion Beats Latent Diffusion with Perceptual Loss
Upvoted a paper about 13 hours ago:
3D-Aware Implicit Motion Control for View-Adaptive Human Video Generation
Upvoted a paper 20 days ago:
FocusUI: Efficient UI Grounding via Position-Preserving Visual Token Selection