MathFish: Evaluating Language Model Math Reasoning via Grounding in Educational Curricula
Li Lucy, Tal August, Rose E. Wang, Luca Soldaini, Courtney Allison, Kyle Lo

TL;DR
This paper introduces datasets and evaluation tasks to assess language models' ability to understand and align with K-12 math standards, revealing current limitations in their reasoning and problem generation capabilities.
Contribution
It presents two new datasets linking math problems to educational standards and develops tasks to evaluate LMs' ability to verify and tag standards, highlighting their current shortcomings.
Findings
LMs struggle to accurately tag and verify math standards.
Models often generate problems that do not fully align with prompts.
Categorizing problems by standards helps understand difficulty levels.
Abstract
To ensure that math curriculum is grade-appropriate and aligns with critical skills or concepts in accordance with educational standards, pedagogical experts can spend months carefully reviewing published math problems. Drawing inspiration from this process, our work presents a novel angle for evaluating language models' (LMs) mathematical abilities, by investigating whether they can discern skills and concepts enabled by math content. We contribute two datasets: one consisting of 385 fine-grained descriptions of K-12 math skills and concepts, or standards, from Achieve the Core (ATC), and another of 9.9K math problems labeled with these standards (MathFish). We develop two tasks for evaluating LMs' abilities to assess math problems: (1) verifying whether a problem aligns with a given standard, and (2) tagging a problem with all aligned standards. Working with experienced teachers, we…
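The two tasks named in the abstract can be illustrated with a minimal scoring sketch. This is not the paper's evaluation code: the function names, the choice of accuracy for verification, and the set-based precision, recall, and F1 for tagging are assumptions made for illustration, and the standard identifiers in the usage note are hypothetical labels.

```python
from typing import Dict, Sequence, Set


def verification_accuracy(predicted: Sequence[bool], gold: Sequence[bool]) -> float:
    """Task 1 (verification): each item is a yes/no judgment of whether
    a problem aligns with a given standard. Returns the fraction correct."""
    if len(predicted) != len(gold):
        raise ValueError("predicted and gold must have the same length")
    if not gold:
        return 0.0
    correct = sum(p == g for p, g in zip(predicted, gold))
    return correct / len(gold)


def tagging_scores(predicted: Set[str], gold: Set[str]) -> Dict[str, float]:
    """Task 2 (tagging): compare the set of standards a model assigns to a
    problem against the set of standards the problem is labeled with."""
    overlap = len(predicted & gold)
    precision = overlap / len(predicted) if predicted else 0.0
    recall = overlap / len(gold) if gold else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {
        "precision": precision,
        "recall": recall,
        "f1": f1,
        "exact_match": float(predicted == gold),
    }
```

For example, `tagging_scores({"STD-A", "STD-B"}, {"STD-B", "STD-C"})` counts one shared standard out of two predicted and two gold labels, so the prediction is only partially credited and is not an exact match. Set-based scoring is used here because a problem may align with more than one standard.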
Taxonomy
Topics: Intelligent Tutoring Systems and Adaptive Learning · Natural Language Processing Techniques · Text Readability and Simplification
Methods: ALIGN
