To Code or not to Code? Adaptive Tool Integration for Math Language Models via Expectation-Maximization
Haozhe Wang, Long Li, Chao Qu, Fengming Zhu, Weidi Xu, Wei Chu, Fangzhen Lin

TL;DR
This paper introduces an EM-based framework enabling math language models to autonomously decide when and how to integrate code during reasoning, improving problem-solving performance without relying on fixed templates or external instructions.
Contribution
It proposes a novel EM approach that combines structured exploration with off-policy RL to enhance autonomous tool integration in math language models.
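As a rough illustration of the idea only, the toy loop below alternates between an exploration step and a policy update for a single query. It is a minimal sketch under stated assumptions, not the paper's algorithm: the two modes ("cot" and "code"), the fixed success probabilities, and the names `e_step`, `m_step`, and `run_em` are all hypothetical. Rollouts are forced in every mode instead of being sampled from the current policy, which is the loose sense in which the data here is off-policy.

```python
import random


def e_step(success_prob, n_rollouts, rng):
    """Explore every mode and turn estimated success rates into a target distribution.

    `success_prob` maps a mode name to a hidden success probability; it stands in
    for actually running the model and checking its answer (an assumption).
    """
    estimates = {}
    for mode, p in success_prob.items():
        wins = sum(rng.random() < p for _ in range(n_rollouts))
        estimates[mode] = wins / n_rollouts
    total = sum(estimates.values())
    if total == 0:
        # No rollout succeeded: fall back to a uniform target.
        return {mode: 1.0 / len(estimates) for mode in estimates}
    return {mode: value / total for mode, value in estimates.items()}


def m_step(policy, target, lr):
    """Move the policy's mode probabilities a fraction `lr` toward the target."""
    return {mode: (1 - lr) * policy[mode] + lr * target[mode] for mode in policy}


def run_em(success_prob, n_iters=20, n_rollouts=50, lr=0.5, seed=0):
    """Alternate the two steps, starting from a uniform policy over modes."""
    rng = random.Random(seed)
    policy = {mode: 1.0 / len(success_prob) for mode in success_prob}
    for _ in range(n_iters):
        target = e_step(success_prob, n_rollouts, rng)
        policy = m_step(policy, target, lr)
    return policy


if __name__ == "__main__":
    # A query where code execution succeeds far more often than pure CoT.
    print(run_em({"cot": 0.1, "code": 0.9}))
```

In this toy setting the policy drifts toward whichever mode succeeds more often for the query. A real system would condition the decision on the query and the partial reasoning trace, not keep one table per query.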
Findings
Achieves over 11% improvement on MATH500
Improves by 9.4% on AIME without CoT
Demonstrates effective exploration in tool-use decisions
Abstract
Recent advances in mathematical problem-solving with language models (LMs) integrate chain-of-thought (CoT) reasoning and code execution to harness their complementary strengths. However, existing hybrid frameworks exhibit a critical limitation: they depend on externally dictated instructions or rigid code-integration templates, lacking metacognitive awareness -- the capacity to dynamically evaluate intrinsic capabilities and autonomously determine when and how to integrate tools. This rigidity motivates our study of autonomous code integration, enabling models to adapt tool-usage strategies as their reasoning abilities evolve during training. While reinforcement learning (RL) shows promise for boosting LLM reasoning at scale (e.g., DeepSeek-R1), we demonstrate its inefficiency in learning autonomous code integration due to inadequate exploration of the vast combinatorial space of…
Taxonomy
Topics: Educational Technology and Assessment · Fuzzy Logic and Control Systems
