GAM-Agent: Game-Theoretic and Uncertainty-Aware Collaboration for Complex Visual Reasoning

Jusheng Zhang; Yijia Fan; Wenjun Lin; Ruiqi Chen; Haoyi Jiang; Wenhao Chai; Jian Wang; Keze Wang

arXiv:2505.23399 · cs.AI · May 30, 2025

Open Access

TL;DR

GAM-Agent introduces a game-theoretic multi-agent framework that enhances vision-language reasoning through structured collaboration, uncertainty management, and multi-round debates, leading to improved accuracy and interpretability.

Contribution

The paper presents a multi-agent, game-theoretic framework with uncertainty-aware control for robust visual reasoning, and reports that it outperforms prior single-agent models.

Findings

01 Significant accuracy improvements on four benchmarks: MMMU, MMBench, MVBench, and V*Bench.

02 Boosts small-to-mid scale model performance by 5-6%.

03 Enhances strong models such as GPT-4o by 2-3%.

Abstract

We propose GAM-Agent, a game-theoretic multi-agent framework for enhancing vision-language reasoning. Unlike prior single-agent or monolithic models, GAM-Agent formulates the reasoning process as a non-zero-sum game between base agents, each specializing in visual perception subtasks, and a critical agent that verifies logical consistency and factual correctness. Agents communicate via structured claims, evidence, and uncertainty estimates. The framework introduces an uncertainty-aware controller that dynamically adjusts agent collaboration, triggering multi-round debates when disagreement or ambiguity is detected. This process yields more robust and interpretable predictions. Experiments on four challenging benchmarks (MMMU, MMBench, MVBench, and V*Bench) demonstrate that GAM-Agent significantly improves performance across various VLM backbones. Notably, GAM-Agent boosts the accuracy of…
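The control flow described in the abstract can be illustrated with a small sketch: agents return a structured claim, evidence, and uncertainty estimate, and a controller triggers further debate rounds while the agents disagree or remain uncertain. This is not the paper's implementation. The `Response` fields, the mean-uncertainty test, the `threshold` and `max_rounds` values, the confidence-weighted vote, and the two toy agents are all hypothetical choices made for illustration, and the critical agent is left out for brevity.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass
class Response:
    claim: str          # the agent's proposed answer
    evidence: str       # short supporting rationale
    uncertainty: float  # assumed to lie in [0, 1]; 0 means fully confident


# Hypothetical agent interface: (question, peer responses so far) -> Response
Agent = Callable[[str, List[Response]], Response]


def aggregate(responses: List[Response]) -> Tuple[str, Dict[str, float]]:
    """Confidence-weighted vote (an assumed rule, not the paper's)."""
    weights: Dict[str, float] = defaultdict(float)
    for r in responses:
        weights[r.claim] += 1.0 - r.uncertainty
    total = sum(weights.values()) or 1.0
    scores = {claim: w / total for claim, w in weights.items()}
    return max(scores, key=scores.get), scores


def needs_debate(responses: List[Response], threshold: float) -> bool:
    """Debate if claims differ or the mean uncertainty exceeds the threshold."""
    disagreement = len({r.claim for r in responses}) > 1
    mean_uncertainty = sum(r.uncertainty for r in responses) / len(responses)
    return disagreement or mean_uncertainty > threshold


def run_collaboration(question: str, agents: List[Agent],
                      threshold: float = 0.3,
                      max_rounds: int = 3) -> Tuple[str, int]:
    """Return the aggregated answer and the number of debate rounds used."""
    responses = [agent(question, []) for agent in agents]
    rounds = 0
    while needs_debate(responses, threshold) and rounds < max_rounds:
        # Each agent revises its response after seeing all peer responses.
        responses = [agent(question, responses) for agent in agents]
        rounds += 1
    answer, _ = aggregate(responses)
    return answer, rounds


# Toy agents standing in for VLM-backed experts (purely illustrative).
def confident_agent(question: str, peers: List[Response]) -> Response:
    return Response("cat", "whiskers and pointed ears visible", 0.1)


def wavering_agent(question: str, peers: List[Response]) -> Response:
    if peers:
        # Adopt the claim of the least uncertain peer response.
        best = min(peers, key=lambda r: r.uncertainty)
        return Response(best.claim, "agrees with peer evidence", 0.2)
    return Response("dog", "blurry outline", 0.6)


if __name__ == "__main__":
    print(run_collaboration("What animal is shown?",
                            [confident_agent, wavering_agent]))
```

The `max_rounds` cap keeps the loop bounded when agents never converge. A real system would replace the toy agents with model calls and would likely use a richer uncertainty measure than a single scalar per response.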

Taxonomy

Topics: Multimodal Machine Learning Applications · Explainable Artificial Intelligence (XAI) · Topic Modeling

Methods: Balanced Selection