Rethinking AI Evaluation in Education: The TEACH-AI Framework and Benchmark for Generative AI Assistants
Shi Ding, Brian Magerko

TL;DR
This paper introduces TEACH-AI, a comprehensive framework and toolkit for evaluating generative AI in education, emphasizing human, ethical, and contextual factors beyond technical metrics.
Contribution
It presents a novel, stakeholder-aligned evaluation framework with measurable indicators, integrating pedagogical and sociotechnical considerations for educational AI systems.
Findings
TEACH-AI provides a ten-component assessment framework.
The toolkit supports scalable, value-aligned AI evaluation.
It encourages inclusive and impact-focused AI design in education.
Abstract
As generative artificial intelligence (AI) continues to transform education, most existing AI evaluations rely primarily on technical performance metrics such as accuracy or task efficiency while overlooking human identity, learner agency, contextual learning processes, and ethical considerations. In this paper, we present TEACH-AI (Trustworthy and Effective AI Classroom Heuristics), a domain-independent, pedagogically grounded, and stakeholder-aligned framework with measurable indicators and a practical toolkit for guiding the design, development, and evaluation of generative AI systems in educational contexts. Built on an extensive literature review and synthesis, the ten-component assessment framework and toolkit checklist provide a foundation for scalable, value-aligned AI evaluation in education. TEACH-AI rethinks "evaluation" through sociotechnical, educational, theoretical, and…
Taxonomy
Topics: Explainable Artificial Intelligence (XAI) · Artificial Intelligence in Healthcare and Education · Ethics and Social Impacts of AI
