TREX: Trajectory Explanations for Multi-Objective Reinforcement Learning
Dilina Rajapakse, Juan C. Rosero, Ivana Dusparic

TL;DR
TREX is a framework that explains multi-objective reinforcement learning policies by analyzing trajectory segments and their influence on trade-offs, enhancing interpretability of complex decision-making models.
Contribution
The paper introduces TREX, a novel trajectory attribution method for explaining multi-objective RL policies with user-preference considerations.
Findings
Successfully isolates behavioral patterns in multi-objective environments
Quantifies influence of behavioral segments on Pareto trade-offs
Demonstrates effectiveness on MuJoCo benchmarks
Abstract
Reinforcement Learning (RL) has demonstrated its ability to solve complex decision-making problems in a variety of domains, by optimizing reward signals obtained through interaction with an environment. However, many real-world scenarios involve multiple, potentially conflicting objectives that cannot be easily represented by a single scalar reward. Multi-Objective Reinforcement Learning (MORL) addresses this limitation by enabling agents to optimize several objectives simultaneously, explicitly reasoning about trade-offs between them. However, the ``black box" nature of the RL models makes the decision process behind chosen objective trade-offs unclear. Current Explainable Reinforcement Learning (XRL) methods are typically designed for single scalar rewards and do not account for explanations with respect to distinct objectives or user preferences. To address this gap, in this paper we…
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
Taxonomy
TopicsExplainable Artificial Intelligence (XAI) · Reinforcement Learning in Robotics · Multimodal Machine Learning Applications
