Learning to Bridge the Gap: Efficient Novelty Recovery with Planning and Reinforcement Learning

Alicia Li; Nishanth Kumar; Tomás Lozano-Pérez; Leslie Kaelbling

arXiv:2409.19226·cs.RO·October 1, 2024

TL;DR

This paper introduces a reinforcement learning-based bridge policy that enables autonomous agents to adapt to environmental novelties by effectively combining learned policies with planning, improving long-horizon decision-making in unpredictable settings.

Contribution

The work proposes a novel RL formulation with a CallPlanner action that allows agents to learn when to switch control back to a planner, enhancing adaptability to environmental changes.
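
The control handoff described above can be illustrated with a minimal sketch. Only the `CallPlanner` action name comes from the paper summary; the toy corridor domain, the `State` fields, the failure check, and the hand-written `bridge_policy` (a stand-in for a policy that would actually be learned with RL) are all assumptions made for illustration, not the authors' implementation.

```python
from dataclasses import dataclass

# Special action named in the source; every other name here is hypothetical.
CALL_PLANNER = "CallPlanner"

GOAL = 4  # goal cell in a toy 1-D corridor (assumption)
DOOR = 2  # cell with a closed door the planner's model does not know about


@dataclass
class State:
    pos: int
    door_open: bool


def step(state, action):
    """Toy dynamics: the agent cannot move right past a closed door."""
    if action == "right":
        blocked = state.pos == DOOR and not state.door_open
        return state if blocked else State(state.pos + 1, state.door_open)
    if action == "open_door" and state.pos == DOOR:
        return State(state.pos, True)
    return state


def planner(state):
    """Planner with an incomplete model: it ignores the door entirely."""
    return ["right"] * (GOAL - state.pos)


def bridge_policy(state):
    """Hand-written stand-in for a learned bridge policy.

    It handles the novelty (the closed door), then emits CallPlanner to
    terminate itself and return control to the planner.
    """
    if state.pos == DOOR and not state.door_open:
        return "open_door"
    return CALL_PLANNER


def run_episode(max_steps=20):
    state = State(0, False)
    plan = planner(state)
    controller = "planner"
    trace = []  # (controller, action) pairs for actions that took effect
    for _ in range(max_steps):
        if state.pos == GOAL:
            break
        if controller == "planner":
            if not plan:
                plan = planner(state)
            action = plan.pop(0)
            next_state = step(state, action)
            if next_state == state:
                # Assumed failure check: the planned action had no effect,
                # so hand control to the bridge policy.
                controller = "bridge"
                continue
            trace.append(("planner", action))
            state = next_state
        else:
            action = bridge_policy(state)
            trace.append(("bridge", action))
            if action == CALL_PLANNER:
                controller = "planner"
                plan = planner(state)  # replan from the recovered state
            else:
                state = step(state, action)
    return state, trace


if __name__ == "__main__":
    final_state, trace = run_episode()
    print(final_state)
    for who, action in trace:
        print(who, action)
```

In this sketch the bridge policy only needs to solve the short recovery problem (opening the door); the remaining long-horizon behavior is delegated back to the planner as soon as `CallPlanner` is chosen.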

Findings

1. The approach outperforms baselines in adapting to novelties across multiple simulated domains.

2. The learned bridge policy generalizes to complex tasks with multiple novelties.

3. Rapid learning is achieved by leveraging the planner to avoid long-horizon exploration challenges.

Abstract

The real world is unpredictable. Therefore, to solve long-horizon decision-making problems with autonomous robots, we must construct agents that are capable of adapting to changes in the environment during deployment. Model-based planning approaches can enable robots to solve complex, long-horizon tasks in a variety of environments. However, such approaches tend to be brittle when deployed into an environment featuring a novel situation that their underlying model does not account for. In this work, we propose to learn a "bridge policy" via Reinforcement Learning (RL) to adapt to such novelties. We introduce a simple formulation for such learning, where the RL problem is constructed with a special "CallPlanner" action that terminates the bridge policy and hands control of the agent back to the planner. This allows the RL policy to learn the set of states in which querying the…

Taxonomy

Topics: Reinforcement Learning in Robotics · Data Stream Mining Techniques · Advanced Bandit Algorithms Research

Methods: Sparse Evolutionary Training