Beyond Multiple Choice: Verifiable OpenQA for Robust Vision-Language RFT
Yesheng Liu, Hao Li, Haiyu Xu, Baoqi Pei, Jiahao Wang, Mingxuan Zhao, Jingshu Zheng, Zheqi He, JG Yao, Bowen Qin, Xi Yang, Jiajun Zhang

TL;DR
This paper introduces ReVeL, a framework that converts multiple-choice vision-language questions into open-form questions, improving robustness, data efficiency, and evaluation accuracy for multimodal models.
Contribution
ReVeL rewrites multiple-choice questions into open-form questions whose answers remain verifiable, which makes training more robust and exposes limitations of traditional MCQA benchmarks.
Findings
Models trained with ReVeL-OpenQA match MCQA accuracy on benchmarks
ReVeL improves OpenQA accuracy by about six percentage points
ReVeL reveals up to 20% score inflation in MCQA benchmarks
Abstract
Multiple-choice question answering (MCQA) has been a popular format for both evaluation and reinforcement fine-tuning (RFT) of modern multimodal language models. Its constrained output format allows for simplified, deterministic automatic verification. However, we find that the options may leak exploitable signals, which makes accuracy metrics unreliable indicators of real capabilities and encourages explicit or implicit answer-guessing behaviors during RFT. We propose ReVeL (Rewrite and Verify by LLM), a framework that rewrites multiple-choice questions into open-form questions while keeping answers verifiable whenever possible. The framework categorizes questions by answer type and applies a different rewriting and verification scheme to each category. For RFT, we convert 20k MCQA examples and use GRPO to fine-tune Qwen2.5-VL models. Models trained on…
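The idea of routing each question to a verification scheme chosen by its answer type can be illustrated with a minimal sketch. This is not the authors' implementation: the category names (`numeric`, `keyword`), the function names, and the fallback to a caller-supplied judge are all assumptions made for illustration, since the abstract does not list the actual categories or rules.

```python
import re
from typing import Callable, Dict, Optional

# Hypothetical judge signature: (prediction, gold) -> bool.
Judge = Callable[[str, str], bool]


def _normalize(text: str) -> str:
    """Lowercase and collapse whitespace so string comparison is lenient."""
    return re.sub(r"\s+", " ", text.strip().lower())


def verify_numeric(pred: str, gold: str, tol: float = 1e-6) -> bool:
    """Compare the first number found in the prediction with the gold value."""
    match = re.search(r"-?\d+(?:\.\d+)?", pred)
    if match is None:
        return False
    return abs(float(match.group()) - float(gold)) <= tol


def verify_keyword(pred: str, gold: str) -> bool:
    """Accept the prediction if it contains the gold phrase."""
    return _normalize(gold) in _normalize(pred)


# Assumed category names; rule-based checks are deterministic.
RULE_VERIFIERS: Dict[str, Callable[[str, str], bool]] = {
    "numeric": verify_numeric,
    "keyword": verify_keyword,
}


def verify(answer_type: str, pred: str, gold: str,
           judge: Optional[Judge] = None) -> bool:
    """Dispatch to a rule-based check, else fall back to a judge callable."""
    rule = RULE_VERIFIERS.get(answer_type)
    if rule is not None:
        return rule(pred, gold)
    if judge is None:
        raise ValueError(f"no rule for {answer_type!r} and no judge given")
    return judge(pred, gold)
```

In this sketch, a call such as `verify("numeric", "The answer is 42.", "42")` uses the deterministic numeric rule, while any answer type without a rule is handed to whatever judge the caller supplies (for example, a wrapper around an LLM).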
Taxonomy
Topics: Topic Modeling · Multimodal Machine Learning Applications · Natural Language Processing Techniques
