TeachBench: A Syllabus-Grounded Framework for Evaluating Teaching Ability in Large Language Models
Zheng Li, Siyao Song, Jingyuan Ma, Rui Li, Ying Zeng, Minghao Li, Zhifang Sui

TL;DR
This paper introduces TeachBench, a syllabus-grounded framework for evaluating large language models' teaching abilities by measuring student performance improvements across multiple subjects, revealing significant variation among models.
Contribution
The paper proposes a novel syllabus-based evaluation framework for LLM teaching ability, enabling structured assessment and comparison across models and domains.
Findings
Models vary significantly in teaching effectiveness across subjects.
Incorporating example problems does not always enhance teaching performance.
Teaching ability is a distinct, measurable aspect of LLM behavior.
Abstract
Large language models (LLMs) show promise as teaching assistants, yet their teaching capability remains insufficiently evaluated. Existing benchmarks mainly focus on problem-solving or problem-level guidance, leaving knowledge-centered teaching underexplored. We propose a syllabus-grounded evaluation framework that measures LLM teaching capability via student performance improvement after multi-turn instruction. By restricting teacher agents to structured knowledge points and example problems, the framework avoids information leakage and enables reuse of existing benchmarks. We instantiate the framework on Gaokao data across multiple subjects. Experiments reveal substantial variation in teaching effectiveness across models and domains: some models perform well in mathematics, while teaching remains challenging in physics and chemistry. We also find that incorporating example problems…
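The abstract measures teaching capability as student performance improvement after instruction. A minimal sketch of one way such a gain could be computed follows. It assumes the improvement is a simple difference in accuracy before and after teaching; the abstract does not give the exact metric. The names `StudentResult`, `teaching_gain`, and `gain_by_subject` are hypothetical and do not come from the paper.

```python
from collections import defaultdict
from dataclasses import dataclass
from statistics import mean


@dataclass(frozen=True)
class StudentResult:
    """Hypothetical record of one student agent's scores on one test set."""
    subject: str
    pre_correct: int   # questions answered correctly before instruction
    post_correct: int  # questions answered correctly after instruction
    total: int         # number of questions in the test set


def teaching_gain(result: StudentResult) -> float:
    """Accuracy after teaching minus accuracy before teaching (assumed metric)."""
    if result.total <= 0:
        raise ValueError("total must be positive")
    return (result.post_correct - result.pre_correct) / result.total


def gain_by_subject(results: list[StudentResult]) -> dict[str, float]:
    """Average the per-student gain within each subject."""
    grouped: dict[str, list[float]] = defaultdict(list)
    for r in results:
        grouped[r.subject].append(teaching_gain(r))
    return {subject: mean(gains) for subject, gains in grouped.items()}
```

Grouping by subject mirrors the abstract's comparison across domains; a negative gain would indicate that the student performed worse after instruction.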
Taxonomy
Topics: Intelligent Tutoring Systems and Adaptive Learning · Topic Modeling · Computational and Text Analysis Methods
