Figure It Out: Improve the Frontier of Reasoning with Executable Visual States
Meiqi Chen, Fandong Meng, and Jie Zhou

TL;DR
The paper introduces FIGR, a method that integrates executable visual construction into multi-turn reasoning through reinforcement learning and outperforms text-only models on complex mathematical benchmarks.
Contribution
FIGR is the first approach to incorporate executable visual diagrams into multi-turn reasoning, externalizing hypotheses to improve understanding of structural constraints.
Findings
FIGR outperforms text-only baselines on eight mathematical benchmarks.
FIGR achieves a 13.12% improvement on AIME 2025.
FIGR achieves an 11.00% improvement on BeyondAIME.
Abstract
Complex reasoning problems often involve implicit spatial and geometric relationships that are not explicitly encoded in text. While recent reasoning models perform well across many domains, purely text-based reasoning struggles to capture structural constraints in complex settings. In this paper, we introduce FIGR, which integrates executable visual construction into multi-turn reasoning via end-to-end reinforcement learning. Rather than relying solely on textual chains of thought, FIGR externalizes intermediate hypotheses by generating executable code that constructs diagrams within the reasoning loop. An adaptive reward mechanism selectively regulates when visual construction is invoked, enabling more consistent reasoning over latent global properties that are difficult to infer from text alone. Experiments on eight challenging mathematical benchmarks demonstrate that FIGR…
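The idea of generating executable code that constructs diagrams within the reasoning loop can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not taken from the paper: the `Canvas` class, `run_visual_code`, `reasoning_loop`, and `toy_policy` are hypothetical names, the "figure" is a plain list of primitives rather than a rendered image, and the policy is a hard-coded stub rather than a trained model. The adaptive reward and the reinforcement learning training are not modeled here.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Turn:
    kind: str      # "text", "code", or "answer"
    content: str


@dataclass
class Canvas:
    """Hypothetical stand-in for a rendered figure: it only records primitives."""
    shapes: List[tuple] = field(default_factory=list)

    def point(self, name: str, x: float, y: float) -> None:
        self.shapes.append(("point", name, x, y))

    def segment(self, a: str, b: str) -> None:
        self.shapes.append(("segment", a, b))


def run_visual_code(code: str) -> Canvas:
    """Execute model-written drawing code against a fresh canvas.

    Only `canvas` is exposed and builtins are emptied. This is NOT a real
    sandbox; it merely keeps the toy example small.
    """
    canvas = Canvas()
    exec(code, {"__builtins__": {}}, {"canvas": canvas})
    return canvas


def reasoning_loop(policy: Callable[[List[Turn]], Turn],
                   problem: str, max_turns: int = 4) -> List[Turn]:
    """Alternate between policy turns and figure feedback until an answer."""
    history = [Turn("text", problem)]
    for _ in range(max_turns):
        turn = policy(history)
        history.append(turn)
        if turn.kind == "code":
            # Feed the constructed figure back as an observation.
            canvas = run_visual_code(turn.content)
            history.append(Turn("text", f"figure: {canvas.shapes}"))
        elif turn.kind == "answer":
            break
    return history


def toy_policy(history: List[Turn]) -> Turn:
    """Hard-coded stub: draw once, then answer (assumption, not a model)."""
    if not any(t.kind == "code" for t in history):
        code = ("canvas.point('A', 0, 0)\n"
                "canvas.point('B', 3, 4)\n"
                "canvas.segment('A', 'B')")
        return Turn("code", code)
    return Turn("answer", "5")


history = reasoning_loop(toy_policy, "Find the length of AB.")
for t in history:
    print(t.kind, "|", t.content)
```

In this sketch the policy decides on each turn whether to emit drawing code or a final answer, and the executed figure is appended to the history so that later turns can condition on it. A real system would render an actual image and pass it to a multimodal model.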
Taxonomy
Topics: Multimodal Machine Learning Applications · Constraint Satisfaction and Optimization · Topic Modeling
