Baseline Performance of AI Tools in Classifying Cognitive Demand of Mathematical Tasks
Danielle S. Fox, Brenda L. Robles, Elizabeth DiPietro Brovey, Christian D. Schunn

TL;DR
This study assesses the accuracy of various AI tools in classifying the cognitive demand of mathematical tasks, revealing moderate performance and highlighting areas for improvement in educational AI applications.
Contribution
It provides a comprehensive evaluation of AI tools' ability to categorize cognitive demand in math tasks, comparing general-purpose and education-specific models.
Findings
AI tools achieved 63% average accuracy in classification.
No tool exceeded 83% accuracy, and tools were biased toward the middle categories.
Tools struggled with tasks at the extremes of cognitive demand.
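The findings above rest on three simple quantities: overall accuracy, accuracy at each level of cognitive demand, and how often a tool picks a middle category. The sketch below shows one way these could be computed. It is not the authors' analysis code. The integer coding of the four levels (1 = lowest, 4 = highest), the function names, and the toy labels are all hypothetical and are not data from the study.

```python
from collections import Counter

# Hypothetical toy data, NOT from the study.
# Levels of cognitive demand are coded 1 (lowest) to 4 (highest) as an assumption.
expert = [1, 1, 2, 2, 3, 3, 4, 4]  # reference classifications
tool = [2, 1, 2, 2, 3, 3, 3, 4]    # one AI tool's classifications


def accuracy(truth, pred):
    """Fraction of tasks where the tool matches the reference label."""
    return sum(t == p for t, p in zip(truth, pred)) / len(truth)


def per_level_accuracy(truth, pred):
    """Accuracy computed separately for each reference level."""
    totals = Counter(truth)
    hits = Counter(t for t, p in zip(truth, pred) if t == p)
    return {level: hits[level] / totals[level] for level in sorted(totals)}


def middle_share(labels, middle=(2, 3)):
    """Fraction of labels that fall in the middle categories."""
    return sum(label in middle for label in labels) / len(labels)


overall = accuracy(expert, tool)
by_level = per_level_accuracy(expert, tool)
expert_middle = middle_share(expert)
tool_middle = middle_share(tool)

print(f"overall accuracy: {overall:.2f}")
print(f"accuracy by level: {by_level}")
print(f"middle-category share: expert {expert_middle:.2f}, tool {tool_middle:.2f}")
```

In this toy example the tool assigns middle categories more often than the reference labels do, and its errors fall on the lowest and highest levels. That is the pattern a bias toward middle categories would produce.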
Abstract
Teachers face increasing demands on their time, particularly in adapting mathematics curricula to meet individual student needs while maintaining cognitive rigor. This study evaluates whether AI tools can accurately classify the cognitive demand of mathematical tasks, which is important for creating or adapting tasks that support student learning. We tested eleven AI tools, six general-purpose (ChatGPT, Claude, DeepSeek, Gemini, Grok, Perplexity) and five education-specific (Brisk, Coteach AI, Khanmigo, Magic School, SchoolAI), on their ability to categorize mathematics tasks across four levels of cognitive demand using a research-based framework. The goal was to approximate the performance teachers will achieve with straightforward prompts. On average, AI tools accurately classified cognitive demand in only 63% of cases. Education-specific tools were not more accurate than…
Taxonomy
Topics: Intelligent Tutoring Systems and Adaptive Learning · Cognitive and developmental aspects of mathematical skills · Teaching and Learning Programming
