AdaReasoner: Dynamic Tool Orchestration for Iterative Visual Reasoning
Abstract
AdaReasoner enables multimodal models to learn tool usage as a general reasoning skill through scalable data curation, reinforcement learning for tool selection, and adaptive learning mechanisms that improve performance on complex visual reasoning tasks.
When humans face problems beyond their immediate capabilities, they rely on tools, providing a promising paradigm for improving visual reasoning in multimodal large language models (MLLMs). Effective reasoning, therefore, hinges on knowing which tools to use, when to invoke them, and how to compose them over multiple steps, even when faced with new tools or new tasks. We introduce AdaReasoner, a family of multimodal models that learn tool use as a general reasoning skill rather than as tool-specific or explicitly supervised behavior. AdaReasoner is enabled by (i) a scalable data curation pipeline exposing models to long-horizon, multi-step tool interactions; (ii) Tool-GRPO, a reinforcement learning algorithm that optimizes tool selection and sequencing based on end-task success; and (iii) an adaptive learning mechanism that dynamically regulates tool usage. Together, these components allow models to infer tool utility from task context and intermediate outcomes, enabling coordination of multiple tools and generalization to unseen tools. Empirically, AdaReasoner exhibits strong tool-adaptive and generalization behaviors: it autonomously adopts beneficial tools, suppresses irrelevant ones, and adjusts tool usage frequency based on task demands, despite never being explicitly trained to do so. These capabilities translate into state-of-the-art performance across challenging benchmarks, improving the 7B base model by +24.9% on average and surpassing strong proprietary systems such as GPT-5 on multiple tasks, including VSP and Jigsaw.
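The abstract does not spell out how Tool-GRPO works. Its name suggests it builds on Group Relative Policy Optimization (GRPO), in which several rollouts are sampled for the same prompt and each rollout's reward is normalized against the others in its group. The sketch below illustrates only that generic normalization step, under the assumption of a binary end-task reward; the function name, the reward values, and the `eps` constant are hypothetical and are not taken from the paper.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-6):
    """Generic GRPO-style normalization (an assumption, not the paper's exact method).

    Each rollout's reward is centered on the group mean and scaled by the
    group standard deviation, so rollouts are scored relative to their peers.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population standard deviation of the group
    return [(r - mu) / (sigma + eps) for r in rewards]


# Hypothetical group of four tool-use rollouts for one visual question:
# 1.0 means the final answer was correct, 0.0 means it was not.
rewards = [1.0, 0.0, 0.0, 1.0]
advantages = group_relative_advantages(rewards)
```

Under this scheme, rollouts whose tool choices led to a correct answer receive positive advantages and the rest receive negative ones, which is one plausible way a reward based only on end-task success could shape tool selection without per-tool supervision. When every rollout in a group earns the same reward, all advantages are zero and that group contributes no learning signal.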
Community
arXivlens breakdown of this paper 👉 https://arxivlens.com/PaperView/Details/adareasoner-dynamic-tool-orchestration-for-iterative-visual-reasoning-1617-795f558e
- Executive Summary
- Detailed Breakdown
- Practical Applications
arXiv explained breakdown of this paper 👉 https://arxivexplained.com/papers/adareasoner-dynamic-tool-orchestration-for-iterative-visual-reasoning
Librarian Bot (automated message): the following similar papers were recommended by the Semantic Scholar API.
- SenseNova-MARS: Empowering Multimodal Agentic Reasoning and Search via Reinforcement Learning (2025)
- MEDVISTAGYM: A Scalable Training Environment for Thinking with Medical Images via Tool-Integrated Reinforcement Learning (2026)
- Thinking with Programming Vision: Towards a Unified View for Thinking with Images (2025)
- GUI-Eyes: Tool-Augmented Perception for Visual Grounding in GUI Agents (2026)
- Teaching LLMs to Learn Tool Trialing and Execution through Environment Interaction (2026)
- AutoTool: Dynamic Tool Selection and Integration for Agentic Reasoning (2025)
- CodeDance: A Dynamic Tool-integrated MLLM for Executable Visual Reasoning (2025)