Behavior-Constrained Reinforcement Learning with Receding-Horizon Credit Assignment for High-Performance Control

Siwei Ju; Jan Tauberschmidt; Oleg Arenz; Peter van Vliet; Jan Peters

arXiv:2604.03023 · cs.RO · April 6, 2026

TL;DR

This paper introduces a behavior-constrained reinforcement learning method with receding-horizon prediction, enabling high-performance control policies that outperform baselines while closely adhering to expert human behavior in complex dynamic tasks.

Contribution

The authors propose a novel reinforcement learning framework that explicitly models and constrains deviation from expert behavior using trajectory-level look-ahead rewards and reference trajectories.
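To make the idea of a trajectory-level look-ahead reward concrete, here is a minimal sketch in Python. It is not the paper's formulation: the summary does not state the reward's functional form, so the squared-error deviation, the discount `gamma`, the weight `weight`, and the function names `lookahead_reward` and `shaped_reward` are all assumptions chosen for illustration. The sketch only shows the general pattern of penalizing a predicted short-horizon trajectory for drifting away from a reference trajectory.

```python
import numpy as np


def lookahead_reward(predicted, reference, gamma=0.9, weight=1.0):
    """Hypothetical look-ahead penalty (assumed form, not from the paper).

    predicted, reference: arrays of shape (H, state_dim) holding the
    predicted short-horizon trajectory and the reference trajectory.
    Returns the negative, discounted sum of squared per-step deviations.
    """
    predicted = np.asarray(predicted, dtype=float)
    reference = np.asarray(reference, dtype=float)
    if predicted.shape != reference.shape:
        raise ValueError("predicted and reference must share shape (H, state_dim)")

    horizon = predicted.shape[0]
    # Later steps in the horizon count less (assumed geometric discounting).
    discounts = gamma ** np.arange(horizon)
    # Squared Euclidean deviation at each step of the horizon.
    step_deviation = np.sum((predicted - reference) ** 2, axis=1)
    return -weight * float(np.dot(discounts, step_deviation))


def shaped_reward(task_reward, predicted, reference, gamma=0.9, weight=1.0):
    """Task reward plus the assumed look-ahead deviation penalty."""
    return task_reward + lookahead_reward(predicted, reference, gamma, weight)
```

In this sketch a prediction that exactly matches the reference incurs no penalty, and `weight` sets how strongly deviation from the reference is traded off against task performance. How the actual method predicts the horizon and constrains deviation is described in the paper itself.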

Findings

1. Policies achieve competitive lap times in high-fidelity race car simulation.

2. Learned policies closely match expert driving behavior and outperform baselines.

3. Human evaluation confirms policies reproduce expert-like driving characteristics.

Abstract

Learning high-performance control policies that remain consistent with expert behavior is a fundamental challenge in robotics. Reinforcement learning can discover high-performing strategies but often departs from desirable human behavior, whereas imitation learning is limited by demonstration quality and struggles to improve beyond expert data. We propose a behavior-constrained reinforcement learning framework that improves beyond demonstrations while explicitly controlling deviation from expert behavior. Because expert-consistent behavior in dynamic control is inherently trajectory-level, we introduce a receding-horizon predictive mechanism that models short-term future trajectories and provides look-ahead rewards during training. To account for the natural variability of human behavior under disturbances and changing conditions, we further condition the policy on reference…
