ConvoLearn: A Learning Sciences Grounded Dataset for Fine-Tuning Dialogic AI Tutors

Mayank Sharma; Roy Pea; Hari Subramonyam

arXiv:2601.08950 · cs.AI · April 13, 2026


TL;DR

ConvoLearn is a new dataset of 2,134 semi-synthetic tutor-student dialogues grounded in knowledge-building theory, designed to improve dialogic AI tutors in education.

Contribution

The paper introduces ConvoLearn, a dataset for fine-tuning dialogic AI tutors, and shows as a proof of concept that fine-tuning on it can steer a 7B open-weight model toward dialogic tutoring behavior.

Findings

01

Scores from a classifier trained on ConvoLearn correlate significantly with expert-coded instructional quality in authentic classrooms, across multiple subscales.

02

Fine-tuning Mistral-7B on ConvoLearn yields dialogic tutoring behavior that credentialed teachers rate as competitive.

03

Dimension-labeled dialogic training data captures meaningful pedagogical signal that generalizes beyond the dataset's semi-synthetic domain.

Abstract

Despite their growing adoption in education, LLMs remain misaligned with the core principle of effective tutoring: the dialogic construction of knowledge. We introduce ConvoLearn, a dataset of 2,134 semi-synthetic tutor-student dialogues operationalizing six dimensions of dialogic tutoring grounded in knowledge-building theory, situated in a middle school Earth Science curriculum. We show that dimension-labeled dialogic training data captures meaningful pedagogical signal that generalizes beyond its semi-synthetic domain: scores from a classifier trained on ConvoLearn correlate significantly with expert-coded instructional quality in authentic classrooms across multiple subscales. As a proof of concept, we fine-tune Mistral-7B on ConvoLearn and show that dimension-level fine-tuning can steer a 7B open-weight model toward dialogic tutoring behavior that credentialed teachers rate as…
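The validation claim in the abstract, that classifier scores correlate with expert-coded instructional quality, can be illustrated with a minimal sketch. The abstract does not say which correlation statistic the authors use, so Pearson's r here is an assumption, and the function name `pearson_r` and both input lists are hypothetical placeholders rather than data or code from the paper.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation between two equal-length numeric sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of co-deviations and sums of squared deviations from the means.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / sqrt(var_x * var_y)


# Hypothetical placeholder values, NOT results from the paper:
# one classifier score and one expert rating per classroom transcript.
classifier_scores = [0.2, 0.5, 0.4, 0.9, 0.7]
expert_ratings = [1, 3, 2, 5, 4]

r = pearson_r(classifier_scores, expert_ratings)
print(f"Pearson r = {r:.3f}")
```

In practice a significance test and a per-subscale breakdown would accompany the coefficient; this sketch only shows the core computation.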
