Interpretable Preference-based Reinforcement Learning with Tree-Structured Reward Functions

Tom Bewley; Freddy Lecue

arXiv:2112.11230 · cs.LG · December 22, 2021 · 5 cites

TL;DR

This paper introduces an online, active preference-based reinforcement learning method that learns reward functions structured as trees from sparse feedback. The tree structure makes the learned reward interpretable, so its robustness and alignment can be inspected and debugged.

Contribution

The paper presents an online, active preference learning algorithm that constructs reward functions with an intrinsically interpretable, compositional tree structure, making rewards learned in preference-based RL (PbRL) easier to inspect and debug.

Findings

01 Tree-structured reward functions are learned sample-efficiently from both synthetic and human-provided feedback.

02 The enhanced interpretability of the learned rewards is used to explore and debug them for alignment.

03 The method is demonstrated in several environments.

Abstract

The potential of reinforcement learning (RL) to deliver aligned and performant agents is partially bottlenecked by the reward engineering problem. One alternative to heuristic trial-and-error is preference-based RL (PbRL), where a reward function is inferred from sparse human feedback. However, prior PbRL methods lack interpretability of the learned reward structure, which hampers the ability to assess robustness and alignment. We propose an online, active preference learning algorithm that constructs reward functions with the intrinsically interpretable, compositional structure of a tree. Using both synthetic and human-provided feedback, we demonstrate sample-efficient learning of tree-structured reward functions in several environments, then harness the enhanced interpretability to explore and debug for alignment.
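
To make the idea of a tree-structured reward function concrete, the following is a minimal Python sketch and not the authors' implementation. It assumes axis-aligned threshold splits over a feature vector, a constant reward at each leaf, and a logistic (Bradley-Terry style) preference model over summed trajectory rewards; the preference model in particular is a common PbRL choice used here for illustration and may differ from the one in the paper. The names `Node`, `tree_reward`, `trajectory_return`, and `preference_prob` are hypothetical.

```python
import math
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class Node:
    """One node of a hypothetical reward tree.

    Internal nodes test x[feature] < threshold; leaves (feature is None)
    hold a constant reward.
    """
    reward: float = 0.0
    feature: Optional[int] = None
    threshold: float = 0.0
    left: Optional["Node"] = None    # branch taken when x[feature] < threshold
    right: Optional["Node"] = None   # branch taken otherwise


def tree_reward(node: Node, x: Sequence[float]) -> float:
    """Follow the splits down to a leaf and return that leaf's reward."""
    while node.feature is not None:
        node = node.left if x[node.feature] < node.threshold else node.right
    return node.reward


def trajectory_return(tree: Node, traj: Sequence[Sequence[float]]) -> float:
    """Sum the per-step tree rewards along a trajectory of feature vectors."""
    return sum(tree_reward(tree, x) for x in traj)


def preference_prob(tree: Node, traj_a, traj_b) -> float:
    """Assumed logistic model: probability that traj_a is preferred to traj_b."""
    diff = trajectory_return(tree, traj_a) - trajectory_return(tree, traj_b)
    return 1.0 / (1.0 + math.exp(-diff))


# Illustrative one-feature tree: reward -1 below 0.5, reward +1 at or above it.
tree = Node(feature=0, threshold=0.5,
            left=Node(reward=-1.0), right=Node(reward=1.0))
traj_a = [[0.9], [0.8], [0.7]]   # every step lands in the +1 leaf
traj_b = [[0.1], [0.6], [0.2]]   # two steps in the -1 leaf, one in the +1 leaf

if __name__ == "__main__":
    print(trajectory_return(tree, traj_a))
    print(trajectory_return(tree, traj_b))
    print(preference_prob(tree, traj_a, traj_b))
```

Because each leaf corresponds to a readable conjunction of threshold tests, a person can trace why a given state receives a given reward, which is the kind of inspection the abstract refers to when it mentions exploring and debugging for alignment.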

Taxonomy

Topics: Reinforcement Learning in Robotics