To Code or not to Code? Adaptive Tool Integration for Math Language Models via Expectation-Maximization

Haozhe Wang; Long Li; Chao Qu; Fengming Zhu; Weidi Xu; Wei Chu; Fangzhen Lin

arXiv:2502.00691 · cs.AI · July 21, 2025

TL;DR

This paper introduces an EM-based framework enabling math language models to autonomously decide when and how to integrate code during reasoning, improving problem-solving performance without relying on fixed templates or external instructions.

Contribution

It proposes a novel EM approach that combines structured exploration with off-policy RL to enhance autonomous tool integration in math language models.

Findings

01 Achieves an improvement of over 11% on MATH500
02 Achieves an improvement of 9.4% on AIME without CoT
03 Demonstrates effective exploration in tool-use decisions

Abstract

Recent advances in mathematical problem-solving with language models (LMs) integrate chain-of-thought (CoT) reasoning and code execution to harness their complementary strengths. However, existing hybrid frameworks exhibit a critical limitation: they depend on externally dictated instructions or rigid code-integration templates, lacking metacognitive awareness -- the capacity to dynamically evaluate intrinsic capabilities and autonomously determine when and how to integrate tools. This rigidity motivates our study of autonomous code integration, enabling models to adapt tool-usage strategies as their reasoning abilities evolve during training. While reinforcement learning (RL) shows promise for boosting LLM reasoning at scale (e.g., DeepSeek-R1), we demonstrate its inefficiency in learning autonomous code integration due to inadequate exploration of the vast combinatorial space of…

Taxonomy

Topics: Educational Technology and Assessment · Fuzzy Logic and Control Systems