Tell me why: Training preferences-based RL with human preferences and step-level explanations
Jakob Karalus

TL;DR
This paper introduces a preference-based reinforcement learning method that incorporates human-provided explanations for preferences, enhancing learning efficiency by allowing more expressive feedback over individual trajectory steps.
Contribution
The paper presents an approach that lets humans give preference feedback together with step-level explanations, improving the learning process in preference-based RL.
Findings
Explanations improve learning speed.
Step-level feedback enhances preference accuracy.
Method outperforms traditional preference-based RL in simulations.
Abstract
Human-in-the-loop reinforcement learning allows agents to be trained through various interfaces, even by non-expert humans. Recently, preference-based methods (PbRL), where the human gives their preference over two trajectories, have grown in popularity because they allow training in domains where more direct feedback is hard to formulate. However, current PbRL methods have limitations and do not provide humans with an expressive interface for giving feedback. With this work, we propose a new preference-based learning method that gives humans a more expressive interface for stating their preference over trajectories along with a factual explanation (an annotation of why they hold this preference). These explanations allow the human to indicate which parts of the trajectory are most relevant to the preference. We allow the expression of the explanations over individual trajectory…
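In standard PbRL, a comparison between two trajectory segments is commonly turned into a reward-learning objective with a Bradley-Terry model over the summed predicted rewards of each segment. The sketch below shows that standard loss in plain Python. The optional `mask` argument is a hypothetical illustration of one way a step-level explanation could restrict which steps count toward the comparison; it is an assumption, not the method described in the paper, and all function names are illustrative.

```python
import math
from typing import Optional, Sequence


def _sigmoid(x: float) -> float:
    # Numerically stable logistic function (avoids overflow in math.exp).
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def segment_return(rewards: Sequence[float], mask: Optional[Sequence[int]] = None) -> float:
    # Sum of predicted per-step rewards for one segment.
    # HYPOTHETICAL: if an explanation mask is given, only steps the human
    # marked as relevant (mask value 1) contribute to the segment's return.
    if mask is None:
        return float(sum(rewards))
    if len(mask) != len(rewards):
        raise ValueError("mask and rewards must have the same length")
    return float(sum(r for r, m in zip(rewards, mask) if m))


def preference_prob(r_a, r_b, mask_a=None, mask_b=None) -> float:
    # Bradley-Terry: P(A preferred over B) = exp(R_A) / (exp(R_A) + exp(R_B)),
    # which equals sigmoid(R_A - R_B).
    return _sigmoid(segment_return(r_a, mask_a) - segment_return(r_b, mask_b))


def preference_loss(r_a, r_b, label, mask_a=None, mask_b=None) -> float:
    # Binary cross-entropy against the human label:
    # 1.0 = A preferred, 0.0 = B preferred, 0.5 = no preference.
    p = preference_prob(r_a, r_b, mask_a, mask_b)
    eps = 1e-12  # guards against log(0)
    return -(label * math.log(p + eps) + (1.0 - label) * math.log(1.0 - p + eps))


# Usage: segment A has one good step and one bad step; suppose the human's
# (hypothetical) explanation marks only the first step as the reason.
rewards_a = [5.0, -10.0]
rewards_b = [0.0, 0.0]
print(preference_prob(rewards_a, rewards_b))                 # R_A = -5, about 0.0067
print(preference_prob(rewards_a, rewards_b, mask_a=[1, 0]))  # R_A = 5, about 0.9933
```

Treating the explanation as a hard mask is only one design choice; a soft per-step weighting would fit the same Bradley-Terry structure.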
Taxonomy
Topics: Intelligent Tutoring Systems and Adaptive Learning · Human-Automation Interaction and Safety
