Selecting Decision-Relevant Concepts in Reinforcement Learning

Naveen Raman; Stephanie Milani; Fei Fang

arXiv:2604.04808·cs.LG·April 7, 2026

TL;DR

This paper introduces the first algorithms for automatic selection of decision-relevant concepts in reinforcement learning, improving interpretability and performance without manual intervention.

Contribution

The paper proposes a decision-relevant concept selection algorithm grounded in state abstraction, supported by theoretical performance bounds and empirical validation.

Findings

01. DRS recovers manually curated concept sets

02. DRS matches or exceeds manual sets in performance

03. DRS enhances test-time concept interventions

Abstract

Training interpretable concept-based policies requires practitioners to manually select which human-understandable concepts an agent should reason with when making sequential decisions. This selection demands domain expertise, is time-consuming and costly, scales poorly with the number of candidates, and provides no performance guarantees. To overcome this limitation, we propose the first algorithms for principled automatic concept selection in sequential decision-making. Our key insight is that concept selection can be viewed through the lens of state abstraction: intuitively, a concept is decision-relevant if removing it would cause the agent to confuse states that require different actions. As a result, agents should rely on decision-relevant concepts; states with the same concept representation should share the same optimal action, which preserves the optimal decision structure of…
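The state-abstraction view in the abstract can be illustrated with a small, hypothetical example. The sketch below is not the paper's algorithm: it assumes a toy setting in which every state is a tuple of three binary concepts and the optimal action for each state is already known, and it uses brute-force subset search purely for illustration. The names `count_conflicts`, `is_decision_relevant`, and `smallest_consistent_subset` are invented here. The code groups states by their representation under a chosen concept subset and flags a concept as decision-relevant when removing it merges states that require different actions.

```python
from collections import defaultdict
from itertools import combinations, product


def count_conflicts(states, optimal_actions, selected):
    """Count abstract states whose member states disagree on the optimal action."""
    groups = defaultdict(set)
    for concepts, action in zip(states, optimal_actions):
        # Project each state onto the selected concept indices.
        key = tuple(concepts[i] for i in selected)
        groups[key].add(action)
    return sum(1 for actions in groups.values() if len(actions) > 1)


def is_decision_relevant(states, optimal_actions, selected, concept):
    """True if dropping `concept` from `selected` confuses states needing different actions."""
    without = [i for i in selected if i != concept]
    before = count_conflicts(states, optimal_actions, selected)
    after = count_conflicts(states, optimal_actions, without)
    return after > before


def smallest_consistent_subset(states, optimal_actions, n_concepts):
    """Brute-force search (illustrative only) for a smallest conflict-free concept subset."""
    for k in range(n_concepts + 1):
        for subset in combinations(range(n_concepts), k):
            if count_conflicts(states, optimal_actions, subset) == 0:
                return subset
    return None


# Hypothetical toy problem: three binary concepts; the optimal action depends
# only on concepts 0 and 2, so concept 1 is irrelevant by construction.
states = list(product([0, 1], repeat=3))
optimal_actions = ["left" if s[0] == s[2] else "right" for s in states]

all_concepts = (0, 1, 2)
relevance = {c: is_decision_relevant(states, optimal_actions, all_concepts, c)
             for c in all_concepts}
chosen = smallest_consistent_subset(states, optimal_actions, n_concepts=3)
print(relevance)
print(chosen)
```

In this toy problem, states that share the same values of concepts 0 and 2 always share the same optimal action, so the abstraction induced by those two concepts preserves the optimal decision structure. Exhaustive search over subsets is exponential in the number of candidate concepts and is used here only to keep the sketch short.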
