Ranking-Aware Calibration for Reliable Multimodal Reinforcement Learning

Peng Cui; Boyao Yang; Jun Zhu

arXiv:2605.16999·cs.LG·May 19, 2026

TL;DR

This paper introduces Ranking-Aware Calibration (RAC), a training framework for multimodal reinforcement learning that improves both calibration and accuracy by supervising confidence with ranking-based and corruption-based comparison signals, without external annotations.

Contribution

The paper proposes RAC, a novel training-time calibration method that leverages ranking signals and visual evidence degradation to enhance confidence calibration and reasoning accuracy.

Findings

01. RAC significantly improves task accuracy across multiple benchmarks.

02. The pairwise corruption loss reduces calibration error under degraded inputs.

03. Combining both losses achieves the best calibration and often improves accuracy.

Abstract

Reinforcement learning post-training has substantially improved the reasoning accuracy of vision-language models, yet the resulting policies remain poorly calibrated. Terminal correctness rewards provide no gradient that penalizes confident errors more than uncertain ones and no signal that ties confidence to the quality of visual evidence, a gap that becomes especially severe under corrupted or ambiguous inputs where models continue to report high confidence on incorrect answers. We introduce Ranking-Aware Calibration (RAC), a training-time framework that supervises confidence using two comparison signals that group-based RL already produces at no additional labeling cost. The ranking-aware group loss enforces that a better rollout receives higher confidence than a worse one within the same prompt. The clean-corrupted pairwise loss enforces that confidence attenuates as visual…
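
The two comparison signals described in the abstract can be illustrated with a small sketch. The abstract does not give the loss formulas, so everything below is an assumption made for illustration: a logistic pairwise loss stands in for the ranking-aware group loss, a hinge with a margin stands in for the clean-corrupted pairwise loss, and the names `group_ranking_loss`, `corruption_pair_loss`, and `margin` are hypothetical rather than taken from the paper.

```python
import math
from itertools import combinations


def softplus(x):
    # Numerically stable log(1 + exp(x)).
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def group_ranking_loss(rewards, confidences):
    """Assumed form of a ranking loss over rollouts of one prompt.

    For every pair of rollouts with different rewards, penalize the case
    where the lower-reward rollout reports higher confidence.
    """
    terms = []
    for i, j in combinations(range(len(rewards)), 2):
        if rewards[i] == rewards[j]:
            continue  # ties carry no ordering signal
        better, worse = (i, j) if rewards[i] > rewards[j] else (j, i)
        terms.append(softplus(confidences[worse] - confidences[better]))
    return sum(terms) / len(terms) if terms else 0.0


def corruption_pair_loss(conf_clean, conf_corrupted, margin=0.1):
    """Assumed hinge form of a clean-vs-corrupted confidence loss.

    Zero when confidence on the corrupted input is at least `margin`
    below confidence on the clean input, positive otherwise.
    """
    return max(0.0, margin - (conf_clean - conf_corrupted))


# Illustrative values only: four rollouts for one prompt, rewards in {0, 1}.
rewards = [1.0, 0.0, 1.0, 0.0]
well_ordered = [0.9, 0.2, 0.8, 0.3]   # correct rollouts are more confident
mis_ordered = [0.2, 0.9, 0.3, 0.8]    # incorrect rollouts are more confident
print(group_ranking_loss(rewards, well_ordered) < group_ranking_loss(rewards, mis_ordered))
print(corruption_pair_loss(0.9, 0.5), corruption_pair_loss(0.8, 0.8))
```

Both functions only compare confidences against signals that a group-based RL step already has on hand (per-rollout rewards and a clean/corrupted input pair), which matches the abstract's point that no extra labels are needed. How RAC actually weights or combines the two terms is not stated in the text above.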
