DART: Leveraging Multi-Agent Disagreement for Tool Recruitment in Multimodal Reasoning

Nithin Sivakumaran; Justin Chih-Yao Chen; David Wan; Yue Zhang; Jaehong Yoon; Elias Stengel-Eskin; Mohit Bansal

arXiv:2512.07132·cs.CL·December 9, 2025

TL;DR

DART is a multi-agent framework that uses disagreements among visual agents to identify and utilize useful visual tools, improving multimodal reasoning performance across diverse benchmarks.

Contribution

We introduce DART, a novel multi-agent debate framework leveraging disagreements to select and incorporate visual tools, enhancing reasoning accuracy and adaptability.

Findings

01. DART outperforms baselines on multiple benchmarks.
02. DART effectively adapts to new tools in applied domains.
03. Rich multi-round discussions improve reasoning quality.

Abstract

Specialized visual tools can augment large language models or vision language models with expert knowledge (e.g., grounding, spatial reasoning, medical knowledge, etc.), but knowing which tools to call (and when to call them) can be challenging. We introduce DART, a multi-agent framework that uses disagreements between multiple debating visual agents to identify useful visual tools (e.g., object detection, OCR, spatial reasoning, etc.) that can resolve inter-agent disagreement. These tools allow for fruitful multi-agent discussion by introducing new information, and by providing tool-aligned agreement scores that highlight agents in agreement with expert tools, thereby facilitating discussion. We utilize an aggregator agent to select the best answer by providing the agent outputs and tool information. We test DART on four diverse benchmarks and show that our approach improves over…
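The control flow described in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the names `agents_disagree`, `agreement_scores`, and `resolve` are hypothetical, tools are modeled as plain callables that return an answer string, every registered tool is called once the agents disagree (the paper's disagreement-based tool selection is not modeled), and the aggregator agent is replaced by a simple rule that picks the answer of the agent most aligned with the tools.

```python
from typing import Callable, Dict

Answers = Dict[str, str]   # agent name -> that agent's answer (assumed non-empty)
Tool = Callable[[], str]   # hypothetical tool: returns its own answer string


def agents_disagree(answers: Answers) -> bool:
    """True when at least two agents gave different answers."""
    return len(set(answers.values())) > 1


def agreement_scores(answers: Answers, tool_outputs: Dict[str, str]) -> Dict[str, float]:
    """Fraction of tool outputs that match each agent's answer."""
    if not tool_outputs:
        return {agent: 0.0 for agent in answers}
    return {
        agent: sum(out == ans for out in tool_outputs.values()) / len(tool_outputs)
        for agent, ans in answers.items()
    }


def resolve(answers: Answers, tools: Dict[str, Tool]) -> str:
    """Call tools only on disagreement, then pick the best-aligned answer."""
    if not agents_disagree(answers):
        # Consensus: no tool is recruited.
        return next(iter(answers.values()))
    tool_outputs = {name: tool() for name, tool in tools.items()}
    scores = agreement_scores(answers, tool_outputs)
    # Stand-in for the aggregator agent: highest tool-aligned agreement wins
    # (ties go to the first agent in insertion order).
    best_agent = max(scores, key=scores.get)
    return answers[best_agent]


answers = {"agent_1": "cat", "agent_2": "dog", "agent_3": "dog"}
tools = {"detector": lambda: "cat", "ocr": lambda: "cat"}
final_answer = resolve(answers, tools)
```

In this toy run the tools side with the minority agent, so the tool-aligned scores override a plain majority vote. In the framework described above, the aggregator is itself an agent that receives the agent outputs and tool information, so the `max` rule here is only a placeholder for that step.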

Taxonomy

Topics: Multimodal Machine Learning Applications · Topic Modeling · Explainable Artificial Intelligence (XAI)