MR. Judge: Multimodal Reasoner as a Judge

Renjie Pi; Felix Bai; Qibin Chen; Simon Wang; Jiulong Shan; Kieran Liu; Meng Cao

arXiv:2505.13403 · cs.CL · May 20, 2025

TL;DR

MR. Judge introduces a reasoning-based judging paradigm for multimodal large language models (MLLMs). It improves both the interpretability and the performance of response evaluation through deliberate reasoning and automatic annotation strategies.

Contribution

The paper proposes MR. Judge, a novel multimodal reasoning framework for MLLM judges, together with automatic annotation techniques that improve evaluation accuracy and interpretability.

Findings

1. MR. Judge-7B surpasses GPT-4o by 9.9% on VL-RewardBench.
2. MR. Judge improves MM-Vet performance by up to 7.7% when used for inference-time scaling.
3. MR. Judge is effective across a variety of tasks, with enhanced reasoning capabilities.

Abstract

The paradigm of using Large Language Models (LLMs) and Multimodal Large Language Models (MLLMs) as evaluative judges has emerged as an effective approach in RLHF and inference-time scaling. In this work, we propose Multimodal Reasoner as a Judge (MR. Judge), a paradigm for empowering general-purpose MLLM judges with strong reasoning capabilities. Instead of directly assigning a score to each response, we formulate the judgement process as a reasoning-inspired multiple-choice problem. Specifically, the judge model first conducts deliberate reasoning covering different aspects of the responses and then selects the best response among them. This reasoning process not only improves the interpretability of the judgement but also greatly enhances the performance of MLLM judges. To cope with the lack of questions with scored responses, we propose the following strategy to achieve…
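The abstract frames judging as a multiple-choice problem: the judge reasons over all candidate responses and then picks one, rather than scoring each response independently. The Python sketch below illustrates only that formulation, under stated assumptions. The prompt wording, the "Answer: <letter>" output convention, and the helper names `build_judge_prompt`, `parse_choice`, `select_best` and `fake_judge` are all hypothetical and are not the authors' implementation; `judge` stands in for any MLLM call that takes an image and a text prompt and returns text.

```python
import re
from typing import Callable, Optional, Sequence

# Hypothetical prompt wording; the paper's actual template is not reproduced here.
PROMPT_TEMPLATE = (
    "Question: {question}\n\n"
    "Candidate responses:\n{options}\n\n"
    "Reason step by step about each response (accuracy, relevance to the image, "
    "completeness), then end with a line of the form 'Answer: <letter>'."
)
LETTERS = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"


def build_judge_prompt(question: str, responses: Sequence[str]) -> str:
    """Format the candidate responses as lettered multiple-choice options."""
    if not 2 <= len(responses) <= len(LETTERS):
        raise ValueError("expected between 2 and 26 candidate responses")
    options = "\n".join(f"({LETTERS[i]}) {r}" for i, r in enumerate(responses))
    return PROMPT_TEMPLATE.format(question=question, options=options)


def parse_choice(judge_output: str, num_responses: int) -> Optional[int]:
    """Return the 0-based index of the chosen response, or None if no valid verdict."""
    # Use the last 'Answer: X' so option letters cited during reasoning are ignored;
    # the lookahead rejects words such as 'Answer: Based on ...'.
    matches = re.findall(r"Answer:\s*\(?([A-Z])\)?(?![A-Za-z])", judge_output)
    if not matches:
        return None
    index = LETTERS.index(matches[-1])
    return index if index < num_responses else None


def select_best(
    image: object,
    question: str,
    responses: Sequence[str],
    judge: Callable[[object, str], str],
) -> Optional[int]:
    """Ask a (hypothetical) MLLM judge to reason first, then pick one response."""
    prompt = build_judge_prompt(question, responses)
    return parse_choice(judge(image, prompt), len(responses))


# Usage with a stand-in judge that returns canned reasoning text.
def fake_judge(image: object, prompt: str) -> str:
    return "(A) misreads the chart; (B) matches the tallest bar.\nAnswer: B"


print(select_best(None, "Which bar is tallest?", ["Red", "Blue"], fake_judge))  # → 1
```

In a best-of-N setting, which is one plausible way to use this formulation for inference-time scaling, `responses` would hold N sampled answers to the same question and the returned index would pick the one to keep; the paper's exact scaling procedure may differ. Parsing the last "Answer:" line rather than the first keeps option letters mentioned during the reasoning from being mistaken for the verdict.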

Videos

MR. Judge: Multimodal Reasoner as a Judge · Underline

Taxonomy

Topics: Topic Modeling · Multimodal Machine Learning Applications · Explainable Artificial Intelligence (XAI)