The NCTE Transcripts: A Dataset of Elementary Math Classroom Transcripts
Dorottya Demszky, Heather Hill

TL;DR
This paper introduces the NCTE Transcripts dataset, the largest collection of elementary math classroom transcripts available to researchers, which supports NLP-based research on classroom discourse and instructional improvement.
Contribution
It provides a comprehensive, annotated dataset of elementary math classroom transcripts with rich metadata, facilitating analysis of discourse and learning outcomes.
Findings
NLP models can identify dialogic discourse moves.
Dialogic moves correlate with better classroom observation scores.
Dataset supports research in education and policy.
Abstract
Classroom discourse is a core medium of instruction: analyzing it can provide a window into teaching and learning and can drive the development of new tools for improving instruction. We introduce the largest dataset of mathematics classroom transcripts available to researchers, and demonstrate how this data can help improve instruction. The dataset consists of 1,660 4th and 5th grade elementary mathematics observations, each 45-60 minutes long, collected by the National Center for Teacher Effectiveness (NCTE) between 2010 and 2013. The anonymized transcripts represent data from 317 teachers across 4 school districts that serve largely historically marginalized students. The transcripts come with rich metadata, including turn-level annotations for dialogic discourse moves, classroom observation scores, demographic information, survey responses, and student test scores. We demonstrate that our…
Taxonomy
Topics: Educational Assessment and Improvement · Reading and Literacy Development
Methods: Test
