IB-GRPO: Aligning LLM-based Learning Path Recommendation with Educational Objectives via Indicator-Based Group Relative Policy Optimization

Shuai Wang; Yaoming Yang; Bingdong Li; Hao Hao; Aimin Zhou

arXiv:2601.14686 · cs.AI · January 22, 2026



TL;DR

This paper introduces IB-GRPO, a novel method that aligns large language model-based learning path recommendations with educational goals by using indicator-guided optimization and hybrid expert demonstrations.

Contribution

The paper proposes IB-GRPO, which effectively aligns LLM recommendations with pedagogical objectives using indicator-based optimization and hybrid demonstrations, addressing data scarcity and multi-objective trade-offs.

Findings

01. IB-GRPO outperforms RL and LLM baselines on the ASSIST09 and Junyi datasets.
02. The method improves long-term learning effect and pedagogical alignment.
03. It demonstrates effective multi-objective optimization without manual scalarization.
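The third finding refers to optimizing several objectives without collapsing them into a hand-weighted sum. This page does not say which indicator IB-GRPO uses, so the following is only an illustrative sketch. It assumes the hypervolume indicator, a common choice in indicator-based evolutionary multi-objective optimization, and a hypothetical two-objective setting (for example, learning effect and trajectory diversity, both maximized). Each candidate path in a group is scored by the dominated area that it alone contributes. The function names `hypervolume_2d` and `hv_contributions` and the reference point are hypothetical.

```python
from typing import List, Sequence, Tuple

# Hypothetical objective vector: (objective_1, objective_2), both maximized.
Point = Tuple[float, float]


def hypervolume_2d(points: Sequence[Point], ref: Point) -> float:
    """Area dominated by `points` and bounded below by the reference point `ref`."""
    # Only points that strictly improve on the reference point add any area.
    valid = [p for p in points if p[0] > ref[0] and p[1] > ref[1]]
    # Sweep from the largest first objective downwards; ties on the first
    # objective are processed with the larger second objective first.
    valid.sort(key=lambda p: (p[0], p[1]), reverse=True)
    area, best_f2 = 0.0, ref[1]
    for f1, f2 in valid:
        # A point whose second objective does not exceed best_f2 is dominated
        # by a point already swept, so it adds no new area.
        if f2 > best_f2:
            area += (f1 - ref[0]) * (f2 - best_f2)
            best_f2 = f2
    return area


def hv_contributions(points: Sequence[Point], ref: Point) -> List[float]:
    """Exclusive contribution of each point: HV(all points) - HV(all points except i)."""
    total = hypervolume_2d(points, ref)
    contributions = []
    for i in range(len(points)):
        rest = [p for j, p in enumerate(points) if j != i]
        contributions.append(total - hypervolume_2d(rest, ref))
    return contributions


if __name__ == "__main__":
    # Hypothetical (learning effect, diversity) scores for four sampled paths.
    group = [(0.8, 0.2), (0.5, 0.5), (0.2, 0.8), (0.4, 0.4)]
    print(hv_contributions(group, ref=(0.0, 0.0)))  # approximately [0.06, 0.05, 0.06, 0.0]
```

A path dominated by another member of the group receives zero contribution. As a result, no trade-off weights between objectives have to be chosen by hand; only the reference point must be fixed.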

Abstract

Learning Path Recommendation (LPR) aims to generate personalized sequences of learning items that maximize long-term learning effect while respecting pedagogical principles and operational constraints. Although large language models (LLMs) offer rich semantic understanding for free-form recommendation, applying them to long-horizon LPR is challenging due to (i) misalignment with pedagogical objectives such as the Zone of Proximal Development (ZPD) under sparse, delayed feedback, (ii) scarce and costly expert demonstrations, and (iii) multi-objective interactions among learning effect, difficulty scheduling, length controllability, and trajectory diversity. To address these issues, we propose IB-GRPO (Indicator-Based Group Relative Policy Optimization), an indicator-guided alignment approach for LLM-based LPR. To mitigate data scarcity, we construct hybrid expert demonstrations via…
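The abstract names Group Relative Policy Optimization as the base algorithm. As GRPO is commonly described, the policy samples several outputs for the same prompt. Each output's reward is standardized against the mean and standard deviation of its own group, which yields an advantage without a learned value function, and that advantage then enters a PPO-style clipped objective. The minimal sketch below shows only these two pieces and omits the KL penalty. The rewards are hypothetical scalars standing in for whatever indicator-based reward IB-GRPO actually computes, and the function names are illustrative.

```python
import statistics
from typing import List, Sequence


def group_relative_advantages(rewards: Sequence[float], eps: float = 1e-8) -> List[float]:
    """Standardize each reward against its own sampling group (GRPO-style advantage)."""
    mean = statistics.fmean(rewards)
    # Population standard deviation; some implementations use the sample version instead.
    std = statistics.pstdev(rewards)
    # eps keeps the division finite when every reward in the group is identical.
    return [(r - mean) / (std + eps) for r in rewards]


def clipped_surrogate(ratio: float, advantage: float, clip_eps: float = 0.2) -> float:
    """PPO-style clipped objective for one output (to be maximized).

    `ratio` is pi_new(output | prompt) / pi_old(output | prompt). GRPO typically
    averages this term over the tokens of each output; here it is shown per
    output for brevity.
    """
    clipped_ratio = min(max(ratio, 1.0 - clip_eps), 1.0 + clip_eps)
    return min(ratio * advantage, clipped_ratio * advantage)


if __name__ == "__main__":
    # Hypothetical rewards for four learning paths sampled for the same learner state.
    rewards = [1.0, 0.0, 0.5, 0.5]
    advantages = group_relative_advantages(rewards)
    print(advantages)  # paths above the group mean receive positive advantages
    print(clipped_surrogate(1.5, advantages[0]))
```

Because advantages are relative to the group, only the ranking and spread of rewards within one batch of sampled paths matters. This is what allows a non-scalar quality signal, such as an indicator computed over the group, to drive the update.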


Taxonomy

Topics: Intelligent Tutoring Systems and Adaptive Learning · Recommender Systems and Techniques · Topic Modeling