MathChat: Benchmarking Mathematical Reasoning and Instruction Following in Multi-Turn Interactions
Zhenwen Liang, Dian Yu, Wenhao Yu, Wenlin Yao, Zhihan Zhang, Xiangliang Zhang, Dong Yu

TL;DR
This paper introduces MathChat, a benchmark for evaluating LLMs on multi-turn mathematical reasoning and instruction following. It shows that current models struggle in these settings and proposes a synthetic dialogue dataset for finetuning that improves their interactive capabilities.
Contribution
The paper presents MathChat, a new benchmark for multi-turn mathematical tasks, and introduces MathChatsync, a synthetic dataset for finetuning LLMs to improve their conversational reasoning skills.
Findings
SOTA LLMs perform well on single-turn math tasks but struggle in multi-turn interactions.
Finetuning with MathChatsync improves models' conversational reasoning abilities.
The results highlight the need for more diverse instruction-tuning datasets to improve interactive mathematical reasoning.
Abstract
Large language models (LLMs) have demonstrated impressive capabilities in mathematical problem solving, particularly in single-turn question-answering formats. However, real-world scenarios often involve mathematical question answering that requires multi-turn or interactive information exchange, and the performance of LLMs on these tasks is still underexplored. This paper introduces MathChat, a comprehensive benchmark specifically designed to evaluate LLMs across a broader spectrum of mathematical tasks. These tasks are structured to assess the models' abilities in multi-turn interactions and open-ended generation. We evaluate the performance of various SOTA LLMs on the MathChat benchmark, and we observe that while these models excel in single-turn question answering, they significantly underperform in more complex scenarios that require sustained reasoning and dialogue understanding.…
Taxonomy
Topics: Mathematics Education and Teaching Techniques
