R-C2: Cycle-Consistent Reinforcement Learning Improves Multimodal Reasoning

Zirui Zhang; Haoyu Dong; Kexin Pei; Chengzhi Mao

arXiv:2603.25720 · cs.AI · March 27, 2026

TL;DR

This paper introduces RC2, a reinforcement learning framework that enforces cycle consistency across modalities to improve multimodal reasoning accuracy without relying on labeled data.

Contribution

The paper proposes a cycle-consistent reinforcement learning approach that turns cross-modal inconsistency into a label-free reward. This encourages the model to align its internal representations across modalities, which improves reasoning performance.

Findings

01

Improves reasoning accuracy by up to 7.6 points.

02

Enables autonomous alignment of internal representations.

03

Reduces modality-specific errors.

Abstract

Robust perception and reasoning require consistency across sensory modalities. Yet current multimodal models often violate this principle, yielding contradictory predictions for visual and textual representations of the same concept. Rather than masking these failures with standard voting mechanisms, which can amplify systematic biases, we show that cross-modal inconsistency provides a rich and natural signal for learning. We introduce RC2, a reinforcement learning framework that resolves internal conflicts by enforcing cross-modal cycle consistency. By requiring a model to perform backward inference, switch modalities, and reliably reconstruct the answer through forward inference, we obtain a dense, label-free reward. This cyclic constraint encourages the model to align its internal representations autonomously. Optimizing for this structure mitigates modality-specific errors and…
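The cycle described in the abstract (backward inference, modality switch, forward inference) can be sketched as a reward function. This is a minimal sketch assuming a model that exposes three operations as plain callables. The names `backward`, `switch`, `forward`, `cycle_reward`, and `n_cycles`, the toy stubs, and the exact-match comparison are all hypothetical illustrations. The abstract does not specify the paper's interface, how content is converted between modalities, or how reconstructed answers are compared. The reward is the fraction of cycles that reconstruct the starting answer, so no ground-truth label is needed.

```python
from typing import Callable

# Hypothetical model interfaces (assumptions for illustration only):
#   backward(answer, modality) -> a query in `modality` whose answer should be `answer`
#   switch(query, src, dst)    -> the same query re-expressed in modality `dst`
#   forward(query, modality)   -> the model's answer to `query` in `modality`
Backward = Callable[[str, str], str]
Switch = Callable[[str, str, str], str]
Forward = Callable[[str, str], str]


def cycle_reward(answer: str, src: str, dst: str,
                 backward: Backward, switch: Switch, forward: Forward,
                 n_cycles: int = 4) -> float:
    """Fraction of answer -> query -> other modality -> answer cycles that
    reconstruct `answer`. No ground-truth label is used."""
    if n_cycles < 1:
        raise ValueError("n_cycles must be at least 1")
    hits = 0
    for _ in range(n_cycles):
        query = backward(answer, src)        # backward inference in the source modality
        query_dst = switch(query, src, dst)  # switch modalities
        recon = forward(query_dst, dst)      # forward inference in the target modality
        # Exact match after normalization (an assumed comparison rule).
        hits += int(recon.strip().lower() == answer.strip().lower())
    # With a stochastic model, averaging several sampled cycles yields a graded score.
    return hits / n_cycles


# Toy stubs standing in for a model (hypothetical).
def backward(answer: str, modality: str) -> str:
    return f"what number comes after {int(answer) - 1}"


def switch(query: str, src: str, dst: str) -> str:
    return f"[{dst}] {query}"


def forward_ok(query: str, modality: str) -> str:
    return str(int(query.split()[-1]) + 1)


def forward_biased(query: str, modality: str) -> str:
    n = int(query.split()[-1]) + 1
    return str(n if modality == "text" else n + 1)  # systematic error in "image"


print(cycle_reward("7", "text", "image", backward, switch, forward_ok))      # 1.0
print(cycle_reward("7", "text", "image", backward, switch, forward_biased))  # 0.0
print(cycle_reward("7", "image", "text", backward, switch, forward_biased))  # 1.0
```

In the toy run, the biased stub answers correctly in the text modality but is off by one in the image modality. Only cycles that end in the image modality are penalized, so a reward of this kind can flag a modality-specific error without knowing the true answer.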

Taxonomy

Topics: Multimodal Machine Learning Applications · Multisensory perception and integration · Action Observation and Synchronization