From Solver to Tutor: Evaluating the Pedagogical Intelligence of LLMs with KMP-Bench
Weikang Shi, Houxing Ren, Junting Pan, Aojun Zhou, Ke Wang, Zimu Lu, Yunqiao Yang, Yuxuan Hu, Linda Wei, Mingjie Zhan, Hongsheng Li

TL;DR
This paper introduces KMP-Bench, a benchmark for evaluating the pedagogical abilities of large language models in K-8 mathematics tutoring, and uses it to show where current models are strong and where they fall short.
Contribution
The paper presents KMP-Bench, a novel multi-faceted benchmark with datasets and evaluation modules to assess LLMs' pedagogical skills in math tutoring.
Findings
Leading LLMs excel at verifiable tasks but struggle to apply pedagogical principles.
Fine-tuning on KMP-Pile improves models' pedagogical performance.
KMP-Bench reveals gaps in current LLMs' teaching capabilities.
Abstract
Large Language Models (LLMs) show significant potential in AI mathematical tutoring, yet current evaluations often rely on simplistic metrics or narrow pedagogical scenarios, failing to assess comprehensive, multi-turn teaching effectiveness. In this paper, we introduce KMP-Bench, a comprehensive K-8 Mathematical Pedagogical Benchmark designed to assess LLMs from two complementary perspectives. The first module, KMP-Dialogue, evaluates holistic pedagogical capabilities against six core principles (e.g., Challenge, Explanation, Feedback), leveraging a novel multi-turn dialogue dataset constructed by weaving together diverse pedagogical components. The second module, KMP-Skills, provides a granular assessment of foundational tutoring abilities, including multi-turn problem-solving, error detection and correction, and problem generation. Our evaluations on KMP-Bench reveal a key disparity:…
Taxonomy
Topics: Intelligent Tutoring Systems and Adaptive Learning · Innovative Teaching and Learning Methods · Educational Assessment and Pedagogy
