TL;DR
M2A uses parameter-space model merging to make mathematical and agentic reasoning in large language models reinforce each other, improving reasoning depth and task performance.
Contribution
The paper proposes a parameter-space merging method that combines mathematical and agentic reasoning without additional training.
Findings
M2A improves reasoning depth in real-world coding tasks.
Applying M2A to Qwen3-8B increases the verified resolved rate from 44.0% to 51.2%.
The method requires no gradient updates and uses a simple merging coefficient.
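To illustrate what a gradient-free merge controlled by a single coefficient can look like, here is a minimal sketch of generic task-vector merging. It is not the paper's exact procedure, since the abstract on this page is truncated before the details. The function name `merge_task_vector`, the flat `StateDict` layout, and the choice to add the math model's offset from a shared base onto the agentic model are all assumptions made for illustration.

```python
from typing import Dict, List

# Hypothetical flat parameter store: parameter name -> list of weights.
StateDict = Dict[str, List[float]]


def merge_task_vector(
    base: StateDict,
    agent: StateDict,
    math_model: StateDict,
    alpha: float,
) -> StateDict:
    """Add a scaled math task vector to the agentic model's weights.

    Assumption (not from the paper): the task vector is the element-wise
    difference between the math model and a shared base model, and
    `alpha` is the single merging coefficient. No gradients are computed.
    """
    merged: StateDict = {}
    for name, agent_w in agent.items():
        base_w = base[name]
        math_w = math_model[name]
        merged[name] = [
            a + alpha * (m - b) for a, m, b in zip(agent_w, math_w, base_w)
        ]
    return merged


# Toy example with one two-element parameter.
base = {"w": [0.0, 1.0]}
agent = {"w": [0.5, 1.0]}
math_model = {"w": [0.0, 2.0]}

merged = merge_task_vector(base, agent, math_model, alpha=0.5)
print(merged)  # {'w': [0.5, 1.5]}
```

With `alpha=0.0` the merge returns the agentic model unchanged, and larger values move it further along the math direction, so the coefficient is the only quantity to tune in this sketch.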
Abstract
While reasoning has become a central capability of large language models (LLMs), the reasoning patterns required for different scenarios are often misaligned. Mathematical reasoning typically relies on intrinsic logic to solve closed-world problems in a single response, whereas agentic reasoning requires not only internal reasoning but also multi-turn interaction with external environments, interleaving thought and action. This misalignment prevents mathematical and agentic reasoning from effectively benefiting from each other, often yielding unstable reasoning behavior and only limited performance gains under multi-task learning. In this paper, we propose M2A, a novel paradigm that synergizes mathematical and agentic reasoning via model merging. To avoid overfitting to superficial reasoning patterns under joint training, M2A operates directly in parameter space: it identifies the…
