Flex-Judge: Text-Only Reasoning Unleashes Zero-Shot Multimodal Evaluators

Jongwoo Ko; Sungnyun Kim; Sungwoo Cho; Se-Young Yun

arXiv:2505.18601 · cs.CL · October 21, 2025

1 Repo 3 Models

TL;DR

Flex-Judge introduces a reasoning-guided approach that uses minimal textual data to create a versatile, cost-effective multimodal evaluator capable of generalizing across diverse tasks and modalities.

Contribution

The paper proposes a reasoning-based framework that enables a single judge model to generalize across multiple modalities with minimal, text-only training data.

Findings

01. Achieves competitive performance with fewer training resources.
02. Outperforms some commercial multimodal evaluators.
03. Effective in resource-scarce domains like molecular evaluation.

Abstract

Human-generated reward signals are critical for aligning generative models with human preferences, guiding both training and inference-time evaluations. While large language models (LLMs) employed as proxy evaluators, i.e., LLM-as-a-Judge, significantly reduce the costs associated with manual annotations, they typically require extensive modality-specific training data and fail to generalize well across diverse multimodal tasks. In this paper, we propose Flex-Judge, a reasoning-guided multimodal judge model that leverages minimal textual reasoning data to robustly generalize across multiple modalities and evaluation formats. Our core intuition is that structured textual reasoning explanations inherently encode generalizable decision-making patterns, enabling an effective transfer to multimodal judgments, e.g., with images or videos. Empirical results demonstrate that Flex-Judge, despite…

Code & Models

Repositories

jongwooko/flex-judge
PyTorch · Official
