ExTra: Transfer-guided Exploration
Anirban Santara, Rishabh Madan, Balaraman Ravindran, Pabitra Mitra

TL;DR
ExTra introduces a transfer-guided exploration method in reinforcement learning that leverages optimal policies from related tasks to improve convergence rates and robustness across different environments.
Contribution
The paper proposes a novel transfer-guided exploration method using bisimulation distances to guide action sampling, enhancing convergence and robustness in RL.
Findings
Given an optimal policy from a related task-environment, ExTra can outperform popular exploration strategies such as epsilon-greedy in gridworld environments.
ExTra is robust to source task dissimilarity.
Combining ExTra with traditional methods improves convergence.
Abstract
In this work, we present a novel approach for transfer-guided exploration in reinforcement learning, inspired by the human tendency to leverage experiences from similar past encounters while navigating a new task. Given an optimal policy in a related task-environment, we show that its bisimulation distance from the current task-environment gives a lower bound on the optimal advantage of state-action pairs in the current task-environment. Transfer-guided Exploration (ExTra) samples actions from a Softmax distribution over these lower bounds, so that actions with potentially higher optimal advantage are sampled more frequently. In our experiments on gridworld environments, we demonstrate that, given access to an optimal policy in a related task-environment, ExTra can outperform popular domain-specific exploration strategies, viz. epsilon-greedy, Model-Based Interval…
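The sampling step described in the abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names, the `temperature` parameter, and the example lower-bound values are all assumptions, and the sketch takes the per-action lower bounds as given rather than deriving them from bisimulation distances.

```python
import math
import random


def extra_action_probs(advantage_lower_bounds, temperature=1.0):
    """Softmax over per-action lower bounds on the optimal advantage.

    `temperature` is a hypothetical knob, not taken from the paper.
    """
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    # Subtract the maximum before exponentiating for numerical stability.
    top = max(advantage_lower_bounds)
    weights = [math.exp((b - top) / temperature) for b in advantage_lower_bounds]
    total = sum(weights)
    return [w / total for w in weights]


def sample_action(advantage_lower_bounds, temperature=1.0, rng=random):
    """Draw one action index; higher lower bounds are sampled more often."""
    probs = extra_action_probs(advantage_lower_bounds, temperature)
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]


# Hypothetical lower bounds for four actions in a single state.
lower_bounds = [-2.0, -0.5, -1.0, -3.0]
probs = extra_action_probs(lower_bounds)
action = sample_action(lower_bounds, rng=random.Random(0))
```

In this sketch, the action with the largest lower bound (index 1) receives the highest sampling probability, while every other action keeps a nonzero chance of being explored.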
Taxonomy
Topics: Reinforcement Learning in Robotics · Domain Adaptation and Few-Shot Learning · Multimodal Machine Learning Applications
Methods: Softmax
