Behavior-Constrained Reinforcement Learning with Receding-Horizon Credit Assignment for High-Performance Control
Siwei Ju, Jan Tauberschmidt, Oleg Arenz, Peter van Vliet, Jan Peters

TL;DR
This paper introduces a behavior-constrained reinforcement learning method with receding-horizon prediction, enabling high-performance control policies that outperform baselines while closely adhering to expert human behavior in complex dynamic tasks.
Contribution
The authors propose a novel reinforcement learning framework that explicitly models and constrains deviation from expert behavior using trajectory-level look-ahead rewards and reference trajectories.
Findings
Policies achieve competitive lap times in high-fidelity race car simulation.
Learned policies closely match expert driving behavior and outperform baselines.
Human evaluation confirms policies reproduce expert-like driving characteristics.
Abstract
Learning high-performance control policies that remain consistent with expert behavior is a fundamental challenge in robotics. Reinforcement learning can discover high-performing strategies but often departs from desirable human behavior, whereas imitation learning is limited by demonstration quality and struggles to improve beyond expert data. We propose a behavior-constrained reinforcement learning framework that improves beyond demonstrations while explicitly controlling deviation from expert behavior. Because expert-consistent behavior in dynamic control is inherently trajectory-level, we introduce a receding-horizon predictive mechanism that models short-term future trajectories and provides look-ahead rewards during training. To account for the natural variability of human behavior under disturbances and changing conditions, we further condition the policy on reference…
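One way to picture the look-ahead reward described in the abstract is the sketch below. It is not the authors' implementation: the abstract does not give the reward's form, so the discounted squared-deviation penalty, the function names `rollout` and `lookahead_reward`, the `discount` parameter, and the toy one-dimensional dynamics are all assumptions made for illustration. The idea shown is only that a short predicted trajectory is compared step by step against an expert reference trajectory, and that the comparison is repeated from each new state as the horizon recedes.

```python
from typing import Callable, List, Sequence

State = Sequence[float]


def rollout(
    state: State,
    policy: Callable[[State], float],
    dynamics: Callable[[State, float], State],
    horizon: int,
) -> List[State]:
    """Predict a short future trajectory by stepping a (hypothetical) dynamics model."""
    trajectory = []
    for _ in range(horizon):
        action = policy(state)
        state = dynamics(state, action)
        trajectory.append(state)
    return trajectory


def lookahead_reward(
    predicted: Sequence[State],
    reference: Sequence[State],
    discount: float = 0.9,
) -> float:
    """Assumed reward form: negative discounted squared deviation from the reference."""
    total = 0.0
    for k, (p, r) in enumerate(zip(predicted, reference)):
        squared_error = sum((pi - ri) ** 2 for pi, ri in zip(p, r))
        total += (discount ** k) * squared_error
    return -total


# Toy example: a point moving along a line, one unit per step.
policy = lambda s: 1.0                   # hypothetical constant-action policy
dynamics = lambda s, a: [s[0] + a]       # hypothetical 1-D integrator model

predicted = rollout([0.0], policy, dynamics, horizon=3)
reference = [[1.0], [2.0], [4.0]]        # made-up expert reference, diverging at step 3
reward = lookahead_reward(predicted, reference, discount=0.5)
```

In a receding-horizon setup, this rollout and comparison would be recomputed at every control step from the current state, so the penalty always reflects the next few steps rather than the whole episode.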
