Hierarchical Apprenticeship Learning from Imperfect Demonstrations with Evolving Rewards

Md Mirajul Islam; Rajesh Debnath; Adittya Soukarjya Saha; Min Chi

arXiv:2604.00258·cs.LG·April 2, 2026

TL;DR

HALIDE is a hierarchical apprenticeship learning method that ranks imperfect student demonstrations, collected while students' goals and rewards are still evolving, and uses that ranking to infer pedagogical strategies and reward functions.

Contribution

The paper introduces HALIDE, a novel hierarchical framework that models and ranks imperfect student demonstrations with evolving rewards, improving pedagogical decision prediction.

Findings

01

HALIDE outperforms existing methods in predicting student pedagogical decisions.

02

Incorporating demonstration ranking improves reward inference accuracy.

03

Hierarchical modeling captures higher-level student intent from suboptimal actions.

Abstract

While apprenticeship learning has shown promise for inducing effective pedagogical policies directly from student interactions in e-learning environments, most existing approaches rely on optimal or near-optimal expert demonstrations under a fixed reward. Real-world student interactions, however, are often inherently imperfect and evolving: students explore, make errors, revise strategies, and refine their goals as understanding develops. In this work, we argue that imperfect student demonstrations are not noise to be discarded, but structured signals, provided their relative quality is ranked. We introduce HALIDE, Hierarchical Apprenticeship Learning from Imperfect Demonstrations with Evolving Rewards, which not only leverages sub-optimal student demonstrations, but also ranks them within a hierarchical learning framework. HALIDE models student behavior at multiple levels of abstraction,…
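To make the idea of learning a reward from ranked, imperfect demonstrations concrete, here is a minimal sketch. It is not HALIDE's algorithm, which the truncated abstract does not specify. It assumes a linear reward over hand-picked trajectory features and a Bradley-Terry pairwise preference model, both chosen purely for illustration. The feature names (correct steps, errors), the toy data, and the function names are hypothetical.

```python
import math


def trajectory_return(weights, features):
    """Linear reward: dot product of weights with a trajectory's summed features."""
    return sum(w * f for w, f in zip(weights, features))


def fit_reward_from_ranking(ranked_features, lr=0.1, epochs=500):
    """Fit linear reward weights from demonstrations ordered worst to best.

    Assumption (not taken from the paper): a Bradley-Terry model,
    P(demo j is better than demo i) = sigmoid(R_j - R_i),
    trained by gradient ascent on the log-likelihood of the given ranking.
    """
    dim = len(ranked_features[0])
    weights = [0.0] * dim
    # Every pair (i, j) where demo i is ranked below demo j.
    pairs = [(i, j)
             for i in range(len(ranked_features))
             for j in range(i + 1, len(ranked_features))]
    for _ in range(epochs):
        grad = [0.0] * dim
        for i, j in pairs:
            diff = (trajectory_return(weights, ranked_features[j])
                    - trajectory_return(weights, ranked_features[i]))
            p = 1.0 / (1.0 + math.exp(-diff))
            # d/dw log sigmoid(diff) = (1 - p) * (f_j - f_i)
            for k in range(dim):
                grad[k] += (1.0 - p) * (ranked_features[j][k] - ranked_features[i][k])
        weights = [w + lr * g / len(pairs) for w, g in zip(weights, grad)]
    return weights


# Hypothetical toy demonstrations, ordered worst to best.
# Each entry is [number of correct steps, number of errors].
demos = [[1.0, 4.0], [2.0, 3.0], [3.0, 1.0], [4.0, 0.0]]
weights = fit_reward_from_ranking(demos)
returns = [trajectory_return(weights, d) for d in demos]
```

On this toy data the learned weight on correct steps comes out positive and the weight on errors negative, so the inferred returns respect the supplied ranking even though no demonstration is labeled optimal. A hierarchical method would additionally need to segment trajectories into higher-level sub-goals, which this sketch leaves out.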
