FoundationalASSIST: An Educational Dataset for Foundational Knowledge Tracing and Pedagogical Grounding of LLMs

Eamon Worden; Cristina Heffernan; Neil Heffernan; Shashank Sonkar

arXiv:2602.00070 · cs.CY · February 3, 2026

TL;DR

FoundationalASSIST is an educational dataset designed to evaluate and improve large language models' understanding of student learning, misconceptions, and pedagogical effectiveness. It addresses a key limitation of previous datasets, which provide only question identifiers and binary correctness labels.

Contribution

This paper introduces FoundationalASSIST, the first English educational dataset to combine full question text, actual student responses, records of which wrong answers students chose, and alignment to Common Core K-12 standards, enabling new research on LLMs in education.

Findings

01. LLMs perform poorly on knowledge tracing tasks.

02. LLMs do not understand question diagnosticity.

03. LLMs achieve partial success in judging question difficulty.
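Knowledge tracing, the task in the first finding, means estimating from a student's response history whether they will answer the next question correctly. As a point of reference, here is a minimal sketch of one classical formulation, Bayesian Knowledge Tracing (BKT). It is not the paper's method or evaluation protocol, and the parameter values are made up for illustration.

```python
from dataclasses import dataclass


@dataclass
class BKTParams:
    # Illustrative parameters for a single skill (not fitted to any real data).
    p_init: float   # P(L0): probability the skill is already known
    p_learn: float  # P(T): probability of learning after each practice opportunity
    p_slip: float   # P(S): probability of answering wrong despite knowing the skill
    p_guess: float  # P(G): probability of answering right without knowing the skill


def predict_correct(p_known: float, params: BKTParams) -> float:
    """Probability that the next response is correct, given the current mastery estimate."""
    return p_known * (1 - params.p_slip) + (1 - p_known) * params.p_guess


def update(p_known: float, correct: bool, params: BKTParams) -> float:
    """Bayesian posterior on mastery after one observed response, then the learning transition."""
    if correct:
        num = p_known * (1 - params.p_slip)
        den = num + (1 - p_known) * params.p_guess
    else:
        num = p_known * params.p_slip
        den = num + (1 - p_known) * (1 - params.p_guess)
    posterior = num / den
    # The student may also learn the skill from this practice opportunity.
    return posterior + (1 - posterior) * params.p_learn


def trace(responses: list[bool], params: BKTParams) -> list[float]:
    """Return the predicted P(correct) made *before* each response in the sequence."""
    p_known = params.p_init
    predictions = []
    for correct in responses:
        predictions.append(predict_correct(p_known, params))
        p_known = update(p_known, correct, params)
    return predictions


params = BKTParams(p_init=0.2, p_learn=0.15, p_slip=0.1, p_guess=0.25)
preds = trace([False, True, True, True], params)
print([round(p, 3) for p in preds])
```

A wrong first answer lowers the next prediction, and each subsequent correct answer raises it. Note that BKT sees only correctness labels per skill, which is exactly the kind of input the abstract argues is too thin for LLMs.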

Abstract

Can Large Language Models understand how students learn? As LLMs are deployed for adaptive testing and personalized tutoring, this question becomes urgent, yet existing resources cannot answer it. Current educational datasets provide only question identifiers and binary correctness labels, rendering them opaque to LLMs that reason in natural language. We address this gap with FoundationalASSIST, the first English educational dataset providing the complete information needed for research on LLMs in education: full question text, actual student responses (not just right/wrong), records of which wrong answers students chose, and alignment to Common Core K-12 standards. These 1.7 million interactions from 5,000 students enable research directions that were previously impossible to pursue, from fine-tuning student models to analyzing misconception patterns. To demonstrate the…
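The abstract's central point is that logs of question IDs and 0/1 labels give an LLM nothing to reason over in natural language, whereas question text and the student's actual answer do. The sketch below contrasts the two views of the same history. The `Interaction` schema, its field names, and the example questions are hypothetical and are not taken from the released dataset.

```python
from dataclasses import dataclass


@dataclass
class Interaction:
    # Hypothetical schema for illustration; the released dataset's columns may differ.
    question_id: int
    question_text: str
    student_response: str  # the answer the student actually gave
    correct: bool
    standard: str          # Common Core standard code


def id_only_view(history: list[Interaction]) -> str:
    """What an ID-plus-correctness dataset exposes: opaque IDs and 0/1 labels."""
    return "\n".join(f"q{it.question_id}: {int(it.correct)}" for it in history)


def text_view(history: list[Interaction], next_question: str) -> str:
    """A natural-language rendering of the same history that an LLM could reason over."""
    lines = ["A student answered these questions in order:"]
    for n, it in enumerate(history, start=1):
        verdict = "correct" if it.correct else "incorrect"
        lines.append(
            f"{n}. [{it.standard}] {it.question_text} "
            f'Student answered "{it.student_response}" ({verdict}).'
        )
    lines.append(f"Next question: {next_question}")
    lines.append("Will the student answer it correctly?")
    return "\n".join(lines)


# Hypothetical history: the wrong answer "-8" hints at a sign misconception,
# which is visible in the text view but lost in the ID-only view.
history = [
    Interaction(101, "What is -3 + 5?", "-8", False, "7.NS.A.1"),
    Interaction(102, "What is -4 + 9?", "5", True, "7.NS.A.1"),
]
prompt = text_view(history, "What is -6 + 2?")
print(id_only_view(history))
print(prompt)
```

The ID-only view reduces this history to `q101: 0` and `q102: 1`; only the text view preserves which wrong answer was given, the kind of signal the dataset is described as providing for misconception analysis.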

Code & Models

Datasets

ASSISTments/FoundationalASSIST

Taxonomy

Topics: Intelligent Tutoring Systems and Adaptive Learning · Topic Modeling · Text Readability and Simplification