CARE What Fails: Contrastive Anchored-REflection for Verifiable Multimodal Reasoning
Yongxin Wang, Zhicheng Yang, Meng Cao, Mingfei Han, Haokun Lin, Yingying Zhu, Xiaojun Chang, Xiaodan Liang

TL;DR
CARE introduces a novel failure-centric framework for multimodal reasoning that leverages errors as supervision, significantly improving accuracy and training stability on visual-reasoning benchmarks.
Contribution
The paper proposes CARE, a new method combining contrastive objectives and self-repair to enhance learning from failures in multimodal reasoning tasks.
Findings
Improves accuracy by 4.6 points on visual-reasoning benchmarks with Qwen2.5-VL-7B.
Achieves state-of-the-art results on MathVista and MMMU-Pro.
Enhances training smoothness and learning from failures.
Abstract
Group-relative reinforcement learning with verifiable rewards (RLVR) often wastes the most informative data it already has: the failures. When all rollouts are wrong, gradients stall; when one happens to be correct, the update usually ignores why the others are close-but-wrong, and credit can be misassigned to spurious chains. We present CARE (Contrastive Anchored REflection), a failure-centric post-training framework for multimodal reasoning that turns errors into supervision. CARE combines: (i) an anchored-contrastive objective that forms a compact subgroup around the best rollout and a set of semantically proximate hard negatives, performs within-subgroup z-score normalization with negative-only scaling, and includes an all-negative rescue to prevent zero-signal batches; and (ii) Reflection-Guided Resampling (RGR), a one-shot structured self-repair that rewrites a representative…
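The stalling problem and the anchored subgroup idea described in the abstract can be illustrated with a small sketch. The first function is ordinary group-relative z-scoring of rewards; it shows why a batch of all-wrong rollouts carries no signal. The second function is only a rough, hypothetical reading of "a compact subgroup around the best rollout" with "negative-only scaling": the names `anchored_subgroup_advantages`, `similarity_to_best`, `k`, and `neg_scale` are assumptions made for illustration and are not taken from the paper. The all-negative rescue and RGR are not sketched, since the text here does not specify them.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-6):
    """Z-score each rollout's reward within its group."""
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


def anchored_subgroup_advantages(rewards, similarity_to_best, k=2,
                                 neg_scale=0.5, eps=1e-6):
    """Hypothetical sketch, not the paper's implementation.

    Keeps the best rollout (the anchor) plus the k failures most similar
    to it, z-scores rewards inside that subgroup only, and shrinks the
    negative advantages by neg_scale (an assumed reading of
    "negative-only scaling"). Rollouts outside the subgroup get 0.
    """
    n = len(rewards)
    best = max(range(n), key=lambda i: rewards[i])
    # Failures are rollouts scoring strictly below the anchor.
    negatives = [i for i in range(n) if rewards[i] < rewards[best]]
    # Hard negatives: the failures closest to the anchor.
    hard = sorted(negatives, key=lambda i: similarity_to_best[i],
                  reverse=True)[:k]
    subgroup = [best] + hard
    sub_rewards = [rewards[i] for i in subgroup]
    mu, sigma = mean(sub_rewards), pstdev(sub_rewards)
    advantages = [0.0] * n
    for i in subgroup:
        a = (rewards[i] - mu) / (sigma + eps)
        advantages[i] = a * neg_scale if a < 0 else a
    return advantages


# All rollouts wrong: every advantage is zero, so the update stalls.
all_wrong = group_relative_advantages([0.0, 0.0, 0.0, 0.0])

# One correct rollout; similarity values here are made-up inputs.
anchored = anchored_subgroup_advantages(
    rewards=[1.0, 0.0, 0.0, 0.0],
    similarity_to_best=[1.0, 0.9, 0.2, 0.7],
)
```

In this toy setup the rollout at index 2 is the least similar failure, so it falls outside the subgroup and contributes nothing, while the two near-miss failures receive a damped negative advantage.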
Taxonomy
Topics: Multimodal Machine Learning Applications · Domain Adaptation and Few-Shot Learning · Adversarial Robustness in Machine Learning
