How RL Unlocks the Aha Moment in Geometric Interleaved Reasoning
Xiangxiang Zhang, Caijun Jia, Siyuan Li, Dingyu He, Xiya Xiong, Zheng Sun, Honghao He, Yuchen Wu, Bihui Yu, Linzhuang Sun, Cheng Tan, Jingxuan Wei

TL;DR
This paper introduces Faire, a reinforcement learning framework that improves geometric reasoning in multimodal models by training them to internalize the causal relationship between diagrams and reasoning steps. It outperforms supervised fine-tuning.
Contribution
The paper proposes Faire, a novel RL-based approach that enforces causal constraints so that multimodal models internalize the dependency between diagrams and reasoning steps.
Findings
Faire achieves better reasoning performance on geometric benchmarks than supervised fine-tuning.
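Supervised fine-tuning (SFT) on interleaved plot-solution data degrades reasoning relative to text-only baselines, because the model learns the surface format of interleaved plotting rather than how the plot should inform the reasoning.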
Faire induces a qualitative shift: the model internalizes the causal dependency between plotting and reasoning.
Abstract
Solving complex geometric problems inherently requires interleaved reasoning: a tight alternation between constructing diagrams and performing logical deductions. Although recent Multimodal Large Language Models (MLLMs) have demonstrated strong capabilities in visual generation and plotting, we identify a counter-intuitive and underexplored phenomenon. Naively applying Supervised Fine-Tuning (SFT) on interleaved plot-solution data leads to a substantial degradation in reasoning performance compared to text-only baselines. We argue that this failure stems from a fundamental limitation of SFT, which primarily induces distributional alignment: the model learns to reproduce the surface format of interleaved plotting but fails to internalize the causal dependency between the generated plot and reasoning steps. To overcome this limitation, we propose Faire (Functional alignment for…
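The abstract is cut off before it describes how Faire enforces its causal constraints, so the sketch below is not the paper's method. It is a hypothetical illustration of the distinction the authors draw between reproducing the interleaved format and actually depending on the generated plot: a plot-ablation check that measures how often a model's final answer changes when its own plot is withheld. The solver interface and the names `plot_dependency_rate`, `format_only`, and `plot_aware` are assumptions made for illustration.

```python
from typing import Callable, Optional, Sequence

# Hypothetical solver interface (an assumption, not from the paper):
# given a problem and an optional plot, return the model's final answer.
Solver = Callable[[str, Optional[str]], str]


def plot_dependency_rate(
    solve: Solver, problems: Sequence[str], plots: Sequence[str]
) -> float:
    """Fraction of problems whose final answer changes when the plot is withheld.

    A model that only imitates the interleaved format scores near 0, because
    its answers are the same with or without the plot it drew.
    """
    if len(problems) != len(plots):
        raise ValueError("problems and plots must have the same length")
    if not problems:
        return 0.0
    # Count problems where the answer with the plot differs from the answer without it.
    changed = sum(
        solve(problem, plot) != solve(problem, None)
        for problem, plot in zip(problems, plots)
    )
    return changed / len(problems)


# Toy solvers for illustration only.
def format_only(problem: str, plot: Optional[str]) -> str:
    return "42"  # ignores the plot entirely


def plot_aware(problem: str, plot: Optional[str]) -> str:
    return "42" if plot is not None else "unknown"  # answer hinges on the plot


probs = ["p1", "p2"]
plots = ["draw A", "draw B"]
print(plot_dependency_rate(format_only, probs, plots))  # 0.0
print(plot_dependency_rate(plot_aware, probs, plots))   # 1.0
```

A high rate alone does not show that the reasoning is correct; it only shows that the answer is sensitive to the plot. In practice such a check would be paired with answer accuracy.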
