Can AI Tools Transform Low-Demand Math Tasks? An Evaluation of Task Modification Capabilities
Danielle S. Fox, Brenda L. Robles, Elizabeth DiPietro Brovey, Christian D. Schunn

TL;DR
This study evaluates whether AI tools can upgrade low-cognitive-demand math tasks. It finds only moderate success and shows that reliably raising tasks to higher cognitive levels remains difficult.
Contribution
It provides an empirical assessment of AI tools' effectiveness in task modification, comparing general-purpose and specialized tools in educational contexts.
Findings
AI tools upgraded tasks accurately 64% of the time
Performance varied from 33% to 88% among tools
Specialized tools only slightly outperformed general-purpose tools
Abstract
While recent research has explored AI tools' ability to classify the quality of mathematical tasks (arXiv:2603.03512), little is known about their capacity to increase the quality of existing tasks. This study investigated whether AI tools could successfully upgrade low-cognitive-demand mathematics tasks. Eleven tools were tested, including six broadly available, general-purpose AI tools (e.g., ChatGPT and Claude) and five tools specialized for mathematics teachers (e.g., Khanmigo, coteach.ai). Using the Task Analysis Guide framework (Stein & Smith, 1998), we prompted AI tools to modify two different types of low-demand mathematical tasks. The prompting strategy aimed to represent likely approaches taken by knowledgeable teachers, rather than extensive optimization to find a more effective prompt (i.e., an optimistic typical outcome). On average, AI tools were only moderately…
