LICORICE: Label-Efficient Concept-Based Interpretable Reinforcement Learning
Zhuorui Ye, Stephanie Milani, Geoffrey J. Gordon, Fei Fang

TL;DR
LICORICE is a method for training interpretable reinforcement learning agents with little human labeling effort: it actively selects which data points to send for concept annotation, maintaining performance while reducing annotation costs.
Contribution
The paper presents LICORICE, a novel training scheme for concept-based RL that uses active data selection and concept decorrelation to keep the number of required concept annotations small.
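To make "active data selection" concrete, here is a minimal sketch of choosing which samples to annotate under a fixed label budget. This summary does not state the acquisition rule LICORICE uses, so the sketch substitutes a generic query-by-committee heuristic: samples on which a committee of concept predictors disagrees most are labeled first. The names `disagreement`, `select_for_labeling`, and `pool_votes` are hypothetical and not taken from the paper.

```python
from collections import Counter
from typing import List, Sequence


def disagreement(votes: Sequence[int]) -> float:
    """Fraction of committee members that differ from the majority vote."""
    majority_count = Counter(votes).most_common(1)[0][1]
    return 1.0 - majority_count / len(votes)


def select_for_labeling(committee_votes: List[Sequence[int]], budget: int) -> List[int]:
    """Return indices of the `budget` samples with the highest disagreement.

    Assumption: this is a stand-in acquisition rule, not the paper's.
    Ties are broken by original order because Python's sort is stable.
    """
    scores = [disagreement(v) for v in committee_votes]
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return ranked[:budget]


# Hypothetical unlabeled pool: each row holds four committee predictions
# for one binary concept on one observation.
pool_votes = [
    [1, 1, 1, 1],  # unanimous, little to gain from a label
    [1, 0, 1, 0],  # evenly split, most informative
    [0, 0, 0, 1],  # mild disagreement
    [0, 0, 0, 0],  # unanimous
]
chosen = select_for_labeling(pool_votes, budget=2)
print(chosen)  # [1, 2]
```

Only the selected indices would be sent to the annotator (human or automated); the remaining samples stay unlabeled, which is how a budget such as "500 labels" would be enforced in this kind of scheme.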
Findings
Reduces human labeling to 500 or fewer concept labels in simple environments.
Maintains performance with 5000 or fewer concept labels in complex environments.
Shows that vision-language models (VLMs) can serve effectively as automated concept annotators in some cases.
Abstract
Recent advances in reinforcement learning (RL) have predominantly leveraged neural network policies for decision-making, yet these models often lack interpretability, posing challenges for stakeholder comprehension and trust. Concept bottleneck models offer an interpretable alternative by integrating human-understandable concepts into policies. However, prior work assumes that concept annotations are readily available during training. For RL, this requirement poses a significant limitation: it necessitates continuous real-time concept annotation, which either places an impractical burden on human annotators or incurs substantial costs in API queries and inference time when employing automated labeling methods. To overcome this limitation, we introduce a novel training scheme that enables RL agents to efficiently learn a concept-based policy by only querying annotators to label a small…
Taxonomy
Topics: Explainable Artificial Intelligence (XAI) · Statistical and Computational Modeling · Data Stream Mining Techniques
Methods: Sparse Evolutionary Training
