Learning When to Switch: Adaptive Policy Selection via Reinforcement Learning
Chris Tava

TL;DR
This paper presents a reinforcement learning method for autonomous agents to adaptively switch between navigation strategies, significantly improving efficiency and robustness in maze navigation tasks without prior domain-specific heuristics.
Contribution
It introduces a Q-learning-based approach that lets agents learn, at runtime, when to switch from an exploration policy to a goal-directed policy.
Findings
Adaptive switching outperforms fixed thresholds and single-strategy agents.
Performance improvements scale with maze complexity, up to 55% in larger mazes.
The learned policy generalizes to unseen maze configurations within each size class.
Abstract
Autonomous agents often require multiple strategies to solve complex tasks, but determining when to switch between strategies remains challenging. This research introduces a reinforcement learning technique to learn switching thresholds between two orthogonal navigation policies. Using maze navigation as a case study, this work demonstrates how an agent can dynamically transition between systematic exploration (coverage) and goal-directed pathfinding (convergence) to improve task performance. Unlike fixed-threshold approaches, the agent uses Q-learning to adapt switching behavior based on coverage percentage and distance to goal, requiring only minimal domain knowledge: maze dimensions and target location. The agent does not require prior knowledge of wall positions, optimal threshold values, or hand-crafted heuristics; instead, it discovers effective switching strategies dynamically…
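The switching idea described in the abstract can be illustrated with a small tabular Q-learning sketch. This is not the paper's implementation. The bin count, the hyperparameters, and every name here (`discretize`, `SwitchingAgent`, the action labels) are assumptions made for illustration. The state is a coarse bucket of coverage fraction and distance to goal, and the two actions are "keep exploring" and "switch to goal-directed pathfinding".

```python
import random
from collections import defaultdict

# Hypothetical action labels: index 0 keeps exploring, index 1 seeks the goal.
ACTIONS = ("explore", "seek_goal")


def discretize(coverage, distance, max_distance, bins=5):
    """Bucket (coverage fraction in [0, 1], distance to goal) into a discrete state.

    The bin count is an assumption; max_distance must be positive.
    """
    c = min(int(coverage * bins), bins - 1)
    d = min(int(distance / max_distance * bins), bins - 1)
    return (c, d)


class SwitchingAgent:
    """Tabular Q-learning over switching decisions (illustrative only)."""

    def __init__(self, alpha=0.1, gamma=0.95, epsilon=0.1, seed=0):
        # One Q-value per action for every state seen so far.
        self.q = defaultdict(lambda: [0.0] * len(ACTIONS))
        self.alpha, self.gamma, self.epsilon = alpha, gamma, epsilon
        self.rng = random.Random(seed)

    def choose(self, state):
        """Epsilon-greedy choice between the two policies."""
        if self.rng.random() < self.epsilon:
            return self.rng.randrange(len(ACTIONS))
        values = self.q[state]
        return max(range(len(ACTIONS)), key=values.__getitem__)

    def update(self, state, action, reward, next_state, done):
        """Standard one-step Q-learning update."""
        if done:
            target = reward
        else:
            target = reward + self.gamma * max(self.q[next_state])
        self.q[state][action] += self.alpha * (target - self.q[state][action])
```

In a training loop, the agent would call `choose` on the discretized state at each step, run the selected policy for one move, and call `update` with whatever reward signal the environment defines. The reward design is not covered by this sketch.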
Taxonomy
Topics: Reinforcement Learning in Robotics · Multimodal Machine Learning Applications · Domain Adaptation and Few-Shot Learning
