Efficient Preference-based Reinforcement Learning via Aligned Experience Estimation
Fengshuo Bai, Rui Zhao, Hongming Zhang, Sijia Cui, Ying Wen, Yaodong Yang, Bo Xu, Lei Han

TL;DR
SEER is an efficient preference-based reinforcement learning method that reduces human feedback needs by integrating label smoothing and conservative Q-estimation, leading to improved performance and sample efficiency.
Contribution
The paper introduces SEER, a novel PbRL approach that enhances feedback efficiency through label smoothing and conservative Q-estimation, outperforming existing methods.
Findings
SEER outperforms state-of-the-art PbRL methods in complex tasks.
SEER achieves more accurate Q-function estimates.
SEER requires less human feedback for effective learning.
Abstract
Preference-based reinforcement learning (PbRL) has shown impressive capabilities in training agents without reward engineering. However, a notable limitation of PbRL is its dependency on substantial human feedback. This dependency stems from the learning loop, which entails accurate reward learning compounded with value/policy learning, necessitating a considerable number of samples. To boost the learning loop, we propose SEER, an efficient PbRL method that integrates label smoothing and policy regularization techniques. Label smoothing reduces overfitting of the reward model by smoothing human preference labels. Additionally, we bootstrap a conservative estimate using well-supported state-action pairs from the current replay memory to mitigate overestimation bias and utilize it for policy learning regularization. Our experimental results across a variety of complex tasks,…
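To make the label-smoothing idea concrete, here is a minimal sketch of a smoothed preference loss for a reward model. It is an illustration under stated assumptions, not the paper's implementation. The abstract does not give the formula, so the sketch assumes the Bradley-Terry preference model commonly used in PbRL and standard two-class label smoothing. The function names, the smoothing strength eps, and the convention that y = 1 means segment A is preferred are all hypothetical.

```python
import math


def sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def smooth_label(y, eps=0.1):
    """Turn a hard preference label y in {0, 1} into a soft target.

    Assumption: standard two-class label smoothing,
    y * (1 - eps) + eps / 2. The paper may use a different scheme.
    """
    return y * (1.0 - eps) + 0.5 * eps


def preference_prob(rewards_a, rewards_b):
    """Bradley-Terry probability that segment A is preferred over B.

    Each argument is the list of per-step rewards that the reward
    model predicts for one trajectory segment.
    """
    return sigmoid(sum(rewards_a) - sum(rewards_b))


def smoothed_preference_loss(rewards_a, rewards_b, y, eps=0.1):
    """Cross-entropy between the smoothed label and the predicted preference."""
    p = preference_prob(rewards_a, rewards_b)
    p = min(max(p, 1e-12), 1.0 - 1e-12)  # avoid log(0)
    target = smooth_label(y, eps)
    return -(target * math.log(p) + (1.0 - target) * math.log(1.0 - p))


# Usage: a pair where the model already strongly agrees with the label.
seg_a = [1.0, 2.0, 3.0]   # hypothetical predicted rewards for segment A
seg_b = [0.0, 0.5, 0.5]   # hypothetical predicted rewards for segment B
hard = smoothed_preference_loss(seg_a, seg_b, y=1, eps=0.0)
soft = smoothed_preference_loss(seg_a, seg_b, y=1, eps=0.1)
```

With eps > 0, a confident prediction that agrees with the label still incurs a larger loss than it would under the hard label, so the reward model is discouraged from pushing the reward gap between the two segments arbitrarily far. That is the overfitting-reduction effect the abstract attributes to label smoothing.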
Taxonomy
Topics: Smart Grid Energy Management · Data Stream Mining Techniques · Anomaly Detection Techniques and Applications
Methods: Sigmoid Activation · LARS · Squeeze-and-Excitation Block · 1x1 Convolution · Dense Connections · Average Pooling · Global Average Pooling · Grouped Convolution · Batch Normalization
