CoDAE: Adapting Large Language Models for Education via Chain-of-Thought Data Augmentation

Shuzhou Yuan; William LaCroix; Hardik Ghoshal; Ercong Nie; Michael Färber

arXiv:2508.08386·cs.CL·August 13, 2025

Open Access

TL;DR

This paper introduces CoDAE, a data augmentation framework using Chain-of-Thought prompting to adapt large language models for educational purposes, improving their reasoning, adaptivity, and resistance to manipulation.

Contribution

The paper presents a CoT-based data augmentation method for fine-tuning LLMs as tutors, targeting three limitations of off-the-shelf models: over-compliance, low response adaptivity, and vulnerability to emotionally manipulative prompts.

Findings

01 Enhanced pedagogical guidance in fine-tuned models
02 Improved reasoning and response adaptivity
03 Increased resistance to manipulative prompts

Abstract

Large Language Models (LLMs) are increasingly employed as AI tutors due to their scalability and potential for personalized instruction. However, off-the-shelf LLMs often underperform in educational settings: they frequently reveal answers too readily, fail to adapt their responses to student uncertainty, and remain vulnerable to emotionally manipulative prompts. To address these challenges, we introduce CoDAE, a framework that adapts LLMs for educational use through Chain-of-Thought (CoT) data augmentation. We collect real-world dialogues between students and a ChatGPT-based tutor and enrich them using CoT prompting to promote step-by-step reasoning and pedagogically aligned guidance. Furthermore, we design targeted dialogue cases to explicitly mitigate three key limitations: over-compliance, low response adaptivity, and threat vulnerability. We fine-tune four open-source LLMs on…
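
The enrichment step described above (taking collected student-tutor dialogues and wrapping them in a CoT prompt) can be illustrated with a minimal sketch. Everything named here is an assumption made for illustration: the `Turn` type, the `build_cot_prompts` helper, and the wording of `COT_TEMPLATE` are hypothetical and are not taken from the paper, whose actual prompts and data format are not given in this summary.

```python
from dataclasses import dataclass

# Hypothetical prompt wording; not the prompt used by the CoDAE authors.
COT_TEMPLATE = (
    "You are a tutor. Before replying, reason step by step about what the "
    "student understands, then give a hint instead of the final answer.\n\n"
    "Dialogue so far:\n{history}\n\n"
    "Student: {student}\n"
    "Reasoning:"
)


@dataclass
class Turn:
    role: str  # "student" or "tutor" (assumed labels)
    text: str


def build_cot_prompts(dialogue):
    """Return one CoT prompt per student turn, using earlier turns as context."""
    prompts = []
    history = []
    for turn in dialogue:
        if turn.role == "student":
            prompts.append(
                COT_TEMPLATE.format(
                    history="\n".join(history) or "(none)",
                    student=turn.text,
                )
            )
        # Record every turn so later prompts see the full conversation so far.
        history.append(f"{turn.role.capitalize()}: {turn.text}")
    return prompts


if __name__ == "__main__":
    dialogue = [
        Turn("student", "What is the derivative of x^2?"),
        Turn("tutor", "What rule applies to powers of x?"),
        Turn("student", "I don't know, just tell me the answer."),
    ]
    for prompt in build_cot_prompts(dialogue):
        print(prompt)
        print("-" * 40)
```

In a pipeline of this shape, each prompt would be sent to a generator model and the returned reasoning plus tutor reply stored as a fine-tuning example; that step is omitted here because the summary does not specify which model or API was used.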

Taxonomy

Topics: Intelligent Tutoring Systems and Adaptive Learning · Artificial Intelligence in Healthcare and Education · Topic Modeling