PLAN-Lab/SpatialReasoner-R1
Best of Both Worlds: Multimodal Reasoning and Generation via Unified Discrete Flow Matching
PyraTok: Language-Aligned Pyramidal Tokenizer for Video Understanding and Generation