Beyond Multiple Choice: Verifiable OpenQA for Robust Vision-Language RFT

Yesheng Liu; Hao Li; Haiyu Xu; Baoqi Pei; Jiahao Wang; Mingxuan Zhao; Jingshu Zheng; Zheqi He; JG Yao; Bowen Qin; Xi Yang; Jiajun Zhang

arXiv:2511.17405·cs.CL·November 25, 2025



TL;DR

This paper introduces ReVeL, a framework that converts multiple-choice vision-language questions into open-form questions, improving robustness, data efficiency, and evaluation accuracy for multimodal models.

Contribution

ReVeL rewrites multiple-choice questions into open-form questions whose answers remain verifiable, which improves robustness and exposes the limitations of traditional MCQA benchmarks.

Findings

01

Models trained with ReVeL-OpenQA match MCQA accuracy on benchmarks

02

ReVeL improves OpenQA accuracy by about six percentage points

03

ReVeL reveals up to 20% score inflation in MCQA benchmarks

Abstract

Multiple-choice question answering (MCQA) has been a popular format for both evaluation and reinforcement fine-tuning (RFT) of modern multimodal language models. Its constrained output format allows for simplified, deterministic automatic verification. However, we find that the options may leak exploitable signals, which makes accuracy metrics unreliable indicators of real capabilities and encourages explicit or implicit answer-guessing behaviors during RFT. We propose ReVeL (Rewrite and Verify by LLM), a framework that rewrites multiple-choice questions into open-form questions while keeping answers verifiable whenever possible. The framework categorizes questions by answer type and applies a different rewriting and verification scheme to each type. For RFT, we convert 20k MCQA examples and use GRPO to fine-tune Qwen2.5-VL models. Models trained on…
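The idea of routing each rewritten question to a verifier chosen by its answer type can be sketched as follows. This is a minimal illustration only: the abstract does not name the answer types or the verification rules, so the type labels `numeric` and `text`, the `OpenQuestion` class, and the matching logic are all hypothetical stand-ins rather than the paper's method.

```python
import re
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass
class OpenQuestion:
    prompt: str       # open-form question text (options removed)
    gold: str         # reference answer
    answer_type: str  # hypothetical label; the paper's categories may differ


def normalize(text: str) -> str:
    """Lowercase and replace runs of non-alphanumerics with one space."""
    return re.sub(r"[^a-z0-9]+", " ", text.lower()).strip()


def verify_numeric(pred: str, gold: str, tol: float = 1e-6) -> bool:
    """Compare the first number found in the prediction with the gold value."""
    match = re.search(r"-?\d+(?:\.\d+)?", pred)
    if match is None:
        return False
    return abs(float(match.group()) - float(gold)) <= tol


def verify_text(pred: str, gold: str) -> bool:
    """Exact match after normalization (an assumed, deliberately strict rule)."""
    return normalize(pred) == normalize(gold)


# Assumed mapping from answer type to a deterministic verifier.
VERIFIERS: Dict[str, Callable[[str, str], bool]] = {
    "numeric": verify_numeric,
    "text": verify_text,
}


def reward(question: OpenQuestion, prediction: str) -> float:
    """Binary reward of the kind an RFT loop could consume."""
    verifier = VERIFIERS.get(question.answer_type)
    if verifier is None:
        raise ValueError(f"no verifier for type {question.answer_type!r}")
    return 1.0 if verifier(prediction, question.gold) else 0.0
```

For example, `reward(OpenQuestion("How many apples are shown?", "3", "numeric"), "There are 3 apples")` evaluates to `1.0` under these assumed rules. Answer types that cannot be checked by rules would need a different verifier, which this sketch leaves out.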


Taxonomy

Topics: Topic Modeling · Multimodal Machine Learning Applications · Natural Language Processing Techniques