"What are my options?": Explaining RL Agents with Diverse Near-Optimal Alternatives (Extended)
Noel Brindise, Vijeth Hebbar, Riya Shah, Cedric Langbort

TL;DR
This paper introduces Diverse Near-Optimal Alternatives (DNA), a method for generating diverse, near-optimal policies in reinforcement learning to improve explainability and offer multiple trajectory options for human understanding.
Contribution
The paper presents DNA, a novel approach that produces diverse, near-optimal policies using reward shaping and local Q-learning, enhancing explainability in RL.
Findings
DNA successfully generates qualitatively different policies.
DNA guarantees epsilon-optimality of the policies.
The paper compares DNA with Quality Diversity methods and finds the two approaches related.
Abstract
In this work, we provide an extended discussion of a new approach to explainable Reinforcement Learning called Diverse Near-Optimal Alternatives (DNA), first proposed at L4DC 2025. DNA seeks a set of reasonable "options" for trajectory-planning agents, optimizing policies to produce qualitatively diverse trajectories in Euclidean space. In the spirit of explainability, these distinct policies are used to "explain" an agent's options in terms of available trajectory shapes from which a human user may choose. In particular, DNA applies to value function-based policies on Markov decision processes where agents are limited to continuous trajectories. Here, we describe DNA, which uses reward shaping in local, modified Q-learning problems to solve for distinct policies with guaranteed epsilon-optimality. We show that it successfully returns qualitatively different policies that constitute…
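To make the idea in the abstract concrete, the following is a minimal sketch of shaping a reward to obtain a second, differently shaped trajectory and then checking how far it falls from optimal. It is not the authors' algorithm. Everything in it is an assumption made for illustration: the 4x4 grid, the step cost of -1, the penalty-based shaping on cells the baseline trajectory visits, the helper names (`solve_q`, `greedy_rollout`, `is_epsilon_optimal`), and the reading of epsilon-optimality as "the return under the original reward is within epsilon of the best return". Because the toy environment is deterministic, the Q-table is filled with swept Q-learning-style backups at learning rate 1 rather than sampled episodes.

```python
import itertools

N = 4                                          # side length of a hypothetical toy grid
START, GOAL = (0, 0), (0, 3)                   # states are (row, col)
ACTIONS = [(0, 1), (1, 0), (0, -1), (-1, 0)]   # right, down, left, up


def step(state, action):
    """Deterministic move; bumping into the boundary leaves the agent in place."""
    r, c = state[0] + action[0], state[1] + action[1]
    return (r, c) if 0 <= r < N and 0 <= c < N else state


def solve_q(penalized=frozenset(), penalty=0.0, sweeps=50):
    """Fill a tabular Q-function with repeated Bellman backups (undiscounted).

    Every step costs 1. Entering a cell in `penalized` costs an extra
    `penalty`; this is the illustrative shaping term, not the paper's.
    """
    states = list(itertools.product(range(N), repeat=2))
    q = {(s, a): 0.0 for s in states for a in range(len(ACTIONS))}
    for _ in range(sweeps):
        for s in states:
            if s == GOAL:                      # the goal is absorbing, value 0
                continue
            for a, move in enumerate(ACTIONS):
                nxt = step(s, move)
                reward = -1.0 - (penalty if nxt in penalized else 0.0)
                future = 0.0 if nxt == GOAL else max(
                    q[(nxt, b)] for b in range(len(ACTIONS)))
                q[(s, a)] = reward + future
    return q


def greedy_rollout(q, max_steps=50):
    """Follow the greedy policy from START and return the visited cells."""
    state, path = START, [START]
    while state != GOAL and len(path) <= max_steps:
        best = max(range(len(ACTIONS)), key=lambda a: q[(state, a)])
        state = step(state, ACTIONS[best])
        path.append(state)
    return path


def true_return(path):
    """Return under the ORIGINAL reward: -1 per step, no shaping term."""
    return -float(len(path) - 1)


def is_epsilon_optimal(path, optimal_return, epsilon):
    """Assumed criterion: the return gap to the optimum is at most epsilon."""
    return optimal_return - true_return(path) <= epsilon


# Baseline: optimal policy under the unshaped reward.
baseline = greedy_rollout(solve_q())

# Alternative: penalize the interior cells of the baseline trajectory so the
# shaped problem prefers a different route, then score it on the original reward.
alternative = greedy_rollout(
    solve_q(penalized=frozenset(baseline[1:-1]), penalty=5.0))

gap = true_return(baseline) - true_return(alternative)
```

In this toy setting the baseline runs straight along the top row, while the shaped problem produces a detour through the second row. The detour costs two extra steps, so under the assumed criterion it would be kept as an option for any epsilon of at least 2 and rejected for a smaller one.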
Taxonomy
Topics: Reinforcement Learning in Robotics · Explainable Artificial Intelligence (XAI) · Advanced Bandit Algorithms Research
Methods: Q-Learning · Sparse Evolutionary Training
