LICORICE: Label-Efficient Concept-Based Interpretable Reinforcement Learning

Zhuorui Ye; Stephanie Milani; Geoffrey J. Gordon; Fei Fang

arXiv:2407.15786 · cs.LG · March 21, 2025

TL;DR

LICORICE is a method for training interpretable reinforcement learning agents with minimal human labeling effort. It actively selects which data points to send for concept annotation, maintaining performance while reducing annotation costs.

Contribution

The paper presents LICORICE, a training scheme that enables concept-based RL with few annotations by using active data selection and concept decorrelation to reduce labeling effort.
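As a rough illustration of what active data selection under a label budget can look like, the sketch below ranks candidate states by how much an ensemble of concept predictors disagrees on them and sends only the top few for annotation. Ensemble disagreement is a common active-learning heuristic used here as a stand-in; this page does not specify LICORICE's actual selection criterion. The names `disagreement`, `select_for_labeling`, and the toy threshold predictors are hypothetical.

```python
from collections import Counter
from typing import Callable, List, Sequence

# Hypothetical types for illustration: a state is a feature vector and a
# concept predictor maps a state to a discrete concept value.
State = Sequence[float]
ConceptPredictor = Callable[[State], int]


def disagreement(state: State, ensemble: List[ConceptPredictor]) -> float:
    """Fraction of ensemble members that differ from the majority vote."""
    votes = Counter(member(state) for member in ensemble)
    majority_count = votes.most_common(1)[0][1]
    return 1.0 - majority_count / len(ensemble)


def select_for_labeling(
    states: List[State], ensemble: List[ConceptPredictor], budget: int
) -> List[int]:
    """Return indices of the `budget` states with the highest disagreement."""
    ranked = sorted(
        range(len(states)),
        key=lambda i: disagreement(states[i], ensemble),
        reverse=True,
    )
    return ranked[:budget]


# Toy ensemble (assumption): three threshold classifiers on one feature.
ensemble = [
    lambda s: int(s[0] > 0.0),
    lambda s: int(s[0] > 0.5),
    lambda s: int(s[0] > 1.0),
]
states = [[-1.0], [0.25], [0.75], [2.0]]

# Only the two ambiguous states are chosen for annotation.
chosen = select_for_labeling(states, ensemble, budget=2)
```

In this toy setup the predictors agree on the extreme states and split on the two middle ones, so the labeling budget is spent where the current concept model is least certain.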

Findings

1. Reduces human labeling to 500 or fewer concept labels in simple environments.
2. Achieves similar performance with 5000 or fewer labels in complex environments.
3. Effectively uses VLMs as automated concept annotators in some cases.

Abstract

Recent advances in reinforcement learning (RL) have predominantly leveraged neural network policies for decision-making, yet these models often lack interpretability, posing challenges for stakeholder comprehension and trust. Concept bottleneck models offer an interpretable alternative by integrating human-understandable concepts into policies. However, prior work assumes that concept annotations are readily available during training. For RL, this requirement poses a significant limitation: it necessitates continuous real-time concept annotation, which either places an impractical burden on human annotators or incurs substantial costs in API queries and inference time when employing automated labeling methods. To overcome this limitation, we introduce a novel training scheme that enables RL agents to efficiently learn a concept-based policy by only querying annotators to label a small…

Taxonomy

Topics: Explainable Artificial Intelligence (XAI) · Statistical and Computational Modeling · Data Stream Mining Techniques

Methods: Sparse Evolutionary Training