TL;DR
This paper introduces Explanation-Aware Experience Replay (XAER), a method that organizes experience buffers based on rule-based explanations to improve reinforcement learning in rule-dense environments like autonomous driving.
Contribution
It proposes an experience replay technique that partitions the replay buffer into clusters labelled by rule-based explanations, so that experiences can be sampled according to the rarity, importance, and meaning of events in complex, rule-rich environments.
Findings
XAER outperforms traditional prioritized experience replay methods.
Explanation engineering can substitute reward engineering in environments with explainable features.
The method is validated on discrete and continuous navigation environments across 9 learning tasks.
Abstract
Human environments are often regulated by explicit and complex rulesets. Integrating Reinforcement Learning (RL) agents into such environments motivates the development of learning mechanisms that perform well in rule-dense and exception-ridden environments such as autonomous driving on regulated roads. In this paper, we propose a method for organising experience by means of partitioning the experience buffer into clusters labelled on a per-explanation basis. We present discrete and continuous navigation environments compatible with modular rulesets and 9 learning tasks. For environments with explainable rulesets, we convert rule-based explanations into case-based explanations by allocating state-transitions into clusters labelled with explanations. This allows us to sample experiences in a curricular and task-oriented manner, focusing on the rarity, importance, and meaning of events.…
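The abstract describes two steps: allocating state-transitions to clusters labelled with explanations, then sampling from those clusters with a bias towards rare or important events. The Python sketch below illustrates that general idea under stated assumptions. The class name `ExplanationClusteredBuffer`, the use of plain strings as explanation labels, and the inverse-cluster-size rarity weighting are hypothetical choices made for illustration. They are not the paper's published implementation or its exact cluster-prioritisation scheme.

```python
import random
from collections import defaultdict, deque
from typing import Any, Dict, Hashable, List, Tuple

# (state, action, reward, next_state, done)
Transition = Tuple[Any, Any, float, Any, bool]


class ExplanationClusteredBuffer:
    """Replay buffer partitioned into clusters keyed by an explanation label.

    Hypothetical sketch: the explanation label is assumed to come from an
    external rule-based explainer (e.g. the set of rules that fired on a step).
    """

    def __init__(self, capacity_per_cluster: int = 10_000, seed: int = 0):
        self.capacity = capacity_per_cluster
        # One bounded FIFO queue per explanation label; oldest items are evicted.
        self.clusters: Dict[Hashable, deque] = defaultdict(
            lambda: deque(maxlen=self.capacity)
        )
        self.rng = random.Random(seed)

    def add(self, transition: Transition, explanation: Hashable) -> None:
        # Allocate the transition to the cluster labelled with its explanation.
        self.clusters[explanation].append(transition)

    def _cluster_weights(self) -> Tuple[List[Hashable], List[float]]:
        # Rarity heuristic (an assumption): weight each non-empty cluster by
        # 1/size, so rarely explained events are replayed more often.
        labels = [k for k, q in self.clusters.items() if q]
        weights = [1.0 / len(self.clusters[k]) for k in labels]
        return labels, weights

    def sample(self, batch_size: int) -> List[Tuple[Hashable, Transition]]:
        labels, weights = self._cluster_weights()
        if not labels:
            raise ValueError("buffer is empty")
        batch = []
        for _ in range(batch_size):
            # Stage 1: pick a cluster by weight.
            label = self.rng.choices(labels, weights=weights, k=1)[0]
            # Stage 2: pick a transition uniformly within that cluster.
            batch.append((label, self.rng.choice(self.clusters[label])))
        return batch


# Usage: many uneventful steps and one hypothetical rule violation.
buf = ExplanationClusteredBuffer(capacity_per_cluster=100)
for t in range(98):
    buf.add((t, 0, 0.0, t + 1, False), explanation="no_rule_violated")
buf.add((98, 1, -1.0, 99, True), explanation="ran_red_light")
batch = buf.sample(32)
```

Sampling in two stages (cluster first, then transition) keeps a single rare event, such as one rule violation among many uneventful steps, from being drowned out by the majority cluster. A prioritized-replay variant could replace the uniform draw inside each cluster.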
Taxonomy
Methods: Q-Learning · Adam · Experience Replay · Dense Connections · 1x1 Convolution · Clipped Double Q-learning · Deep Q-Network · Dilated Convolution · Target Policy Smoothing
