Debating for Better Reasoning: An Unsupervised Multimodal Approach

Ashutosh Adhikari; Mirella Lapata

arXiv:2505.14627·cs.AI·May 21, 2025

Open Access

TL;DR

This paper introduces a multimodal debate framework in which two vision-language models debate an answer and a text-only judge decides between them based on their arguments. The authors report improved model performance and reasoning, particularly on visual question answering.

Contribution

The paper extends the debate paradigm to a multimodal setting and explores how weaker models can supervise stronger models and improve their reasoning.

Findings

01. The debate framework outperforms individual models on multimodal tasks.

02. Judgments from weaker LLMs can improve the reasoning of vision-language models.

03. The approach removes the need for explicit role-playing in debates.

Abstract

As Large Language Models (LLMs) gain expertise across diverse domains and modalities, scalable oversight becomes increasingly challenging, particularly when their capabilities may surpass human evaluators. Debate has emerged as a promising mechanism for enabling such oversight. In this work, we extend the debate paradigm to a multimodal setting, exploring its potential for weaker models to supervise and enhance the performance of stronger models. We focus on visual question answering (VQA), where two "sighted" expert vision-language models debate an answer, while a "blind" (text-only) judge adjudicates based solely on the quality of the arguments. In our framework, the experts defend only answers aligned with their beliefs, thereby obviating the need for explicit role-playing and concentrating the debate on instances of expert disagreement. Experiments on several multimodal tasks…
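
The protocol described in the abstract can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the names `Expert`, `run_debate`, `make_stub_expert`, and `toy_judge` are hypothetical, the number of rounds is an assumed parameter, and the word-counting judge is a toy stand-in for a text-only LLM. It shows the three properties the abstract names: each expert defends only its own answer, debate happens only when the experts disagree, and the judge never receives the image.

```python
from dataclasses import dataclass
from typing import Any, Callable, List, Tuple

# One debate turn: (index of the speaking expert, argument text).
Turn = Tuple[int, str]


@dataclass
class Expert:
    """A 'sighted' debater (hypothetical interface); both callables receive the image."""
    answer: Callable[[Any, str], str]
    argue: Callable[[Any, str, str, List[Turn]], str]


# The 'blind' judge receives the question, both answers, and the transcript,
# but never the image. It returns the index (0 or 1) of the winning expert.
Judge = Callable[[str, Tuple[str, str], List[Turn]], int]


def run_debate(image: Any, question: str, experts: Tuple[Expert, Expert],
               judge: Judge, rounds: int = 2) -> Tuple[str, List[Turn]]:
    """Return (final_answer, transcript). `rounds` is an assumed parameter."""
    answers = (experts[0].answer(image, question),
               experts[1].answer(image, question))
    transcript: List[Turn] = []

    # Debate only on disagreement: if the experts agree, accept their answer.
    if answers[0] == answers[1]:
        return answers[0], transcript

    for _ in range(rounds):
        for idx, expert in enumerate(experts):
            # Each expert defends the answer it actually believes (no role-play).
            argument = expert.argue(image, question, answers[idx], list(transcript))
            transcript.append((idx, argument))

    winner = judge(question, answers, transcript)
    return answers[winner], transcript


def make_stub_expert(fixed_answer: str, fixed_argument: str) -> Expert:
    """Toy expert that ignores the image; stands in for a vision-language model."""
    return Expert(
        answer=lambda image, question: fixed_answer,
        argue=lambda image, question, own, transcript: f"{own}: {fixed_argument}",
    )


def toy_judge(question: str, answers: Tuple[str, str], transcript: List[Turn]) -> int:
    """Toy judge: prefers the expert who wrote more words. Not a real quality measure."""
    totals = [0, 0]
    for idx, argument in transcript:
        totals[idx] += len(argument.split())
    return 0 if totals[0] >= totals[1] else 1
```

In a real setup the stub expert would be replaced by calls to a vision-language model and `toy_judge` by a prompt to a text-only LLM. The control flow (answer, check for disagreement, alternate arguments, adjudicate) is the part this sketch is meant to make concrete.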

Taxonomy

Topics: Multimodal Machine Learning Applications · Topic Modeling · Domain Adaptation and Few-Shot Learning
