Cultivating Helpful, Personalized, and Creative AI Tutors: A Framework for Pedagogical Alignment using Reinforcement Learning

Siyu Song; Wentao Liu; Ye Lu; Ruohua Zhang; Tao Liu; Jinze Lv; Xinyun Wang; Aimin Zhou; Fei Tan; Bo Jiang; Hao Hao

arXiv:2507.20335 · cs.LG · July 29, 2025

TL;DR

This paper introduces EduAlign, a framework that uses a multi-dimensional reward model to fine-tune large language models, making them more helpful, personalized, and creative as educational tutors.

Contribution

The paper presents a novel multi-dimensional reward model and a fine-tuning process to align LLMs with pedagogical principles, enhancing educational effectiveness.

Findings

1. Fine-tuned models show improved alignment with educational principles.

2. The reward model reliably scores LLM outputs on helpfulness, personalization, and creativity.

3. Experimental results demonstrate significant improvements over baseline models.

Abstract

The integration of large language models (LLMs) into education presents unprecedented opportunities for scalable personalized learning. However, standard LLMs often function as generic information providers, lacking alignment with fundamental pedagogical principles such as helpfulness, student-centered personalization, and creativity cultivation. To bridge this gap, we propose EduAlign, a novel framework designed to guide LLMs toward becoming more effective and responsible educational assistants. EduAlign consists of two main stages. In the first stage, we curate a dataset of 8k educational interactions and annotate them, both manually and automatically, along three key educational dimensions: Helpfulness, Personalization, and Creativity (HPC). These annotations are used to train HPC-RM, a multi-dimensional reward model capable of accurately scoring LLM outputs according to these…
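To make the idea of a multi-dimensional reward concrete, here is a minimal Python sketch of how three per-dimension scores might be collapsed into one scalar for reinforcement learning. It is an illustration only, not the paper's method: the `HPCScores` type, the `combine_hpc` function, the assumed 0 to 1 score range, and the equal default weights are all hypothetical, since the abstract does not state how HPC-RM's scores are scaled or aggregated.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class HPCScores:
    """Hypothetical container for one response's per-dimension scores.

    Assumption: each score lies in [0, 1]; the source does not give a range.
    """
    helpfulness: float
    personalization: float
    creativity: float


def combine_hpc(scores: HPCScores, weights=(1.0, 1.0, 1.0)) -> float:
    """Collapse the three scores into a single scalar via a weighted mean.

    Assumption: a weighted mean is used here only for illustration; the
    source does not say how the dimensions are combined.
    """
    if len(weights) != 3:
        raise ValueError("expected exactly three weights (H, P, C)")
    if any(w < 0 for w in weights):
        raise ValueError("weights must be non-negative")
    total = sum(weights)
    if total <= 0:
        raise ValueError("at least one weight must be positive")
    values = (scores.helpfulness, scores.personalization, scores.creativity)
    # Normalize by the weight sum so the result stays in the score range.
    return sum(w * v for w, v in zip(weights, values)) / total


if __name__ == "__main__":
    example = HPCScores(helpfulness=1.0, personalization=0.5, creativity=0.0)
    print(combine_hpc(example))
    print(combine_hpc(example, weights=(2.0, 1.0, 1.0)))
```

Keeping the per-dimension scores separate until the last step means the weighting can be changed, for example to emphasize personalization, without retraining whatever produces the scores.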
