Adapt On-the-Go: Behavior Modulation for Single-Life Robot Deployment
Annie S. Chen, Govind Chada, Laura Smith, Archit Sharma, Zipeng Fu, Sergey Levine, Chelsea Finn

TL;DR
This paper presents ROAM, a method that lets robots adapt quickly to new, unforeseen scenarios during deployment by selecting and adapting pre-trained behaviors without human intervention. It is demonstrated on quadruped robots, including a case with roller skates on the feet.
Contribution
ROAM introduces a novel on-the-fly behavior adaptation mechanism based on perceived behavior value, allowing rapid, unsupervised adaptation during deployment in diverse scenarios.
Findings
ROAM adapts over 2x as efficiently as existing methods in out-of-distribution situations.
It successfully adapts to changes in dynamics both in simulation and in the real world.
The method enables a quadruped robot to move with roller skates on its feet.
Abstract
To succeed in the real world, robots must cope with situations that differ from those seen during training. We study the problem of adapting on-the-fly to such novel scenarios during deployment, by drawing upon a diverse repertoire of previously learned behaviors. Our approach, RObust Autonomous Modulation (ROAM), introduces a mechanism based on the perceived value of pre-trained behaviors to select and adapt pre-trained behaviors to the situation at hand. Crucially, this adaptation process happens entirely within a single episode at test time, without any human supervision. We demonstrate that ROAM enables a robot to adapt rapidly to changes in dynamics both in simulation and on a real Go1 quadruped, even successfully moving forward with roller skates on its feet. Our approach adapts over 2x as efficiently as existing methods when facing a variety of out-of-distribution situations…
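As an illustration of selecting among pre-trained behaviors by their perceived value, here is a minimal sketch. The abstract does not state the exact selection rule, so the softmax sampling, the `temperature` parameter, and the toy value functions below are assumptions made for illustration, not the authors' implementation.

```python
import math
import random


def softmax(values, temperature=1.0):
    """Numerically stable softmax over a list of floats."""
    scaled = [v / temperature for v in values]
    peak = max(scaled)
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def select_behavior(value_fns, state, temperature=1.0, rng=random):
    """Sample a behavior index; higher value at `state` means higher probability.

    Assumption: softmax sampling over per-behavior values is used here only
    as one plausible way to act on "perceived value".
    """
    values = [value_fn(state) for value_fn in value_fns]
    probs = softmax(values, temperature)
    idx = rng.choices(range(len(value_fns)), weights=probs, k=1)[0]
    return idx, probs


# Hypothetical value functions: each behavior is most "at home" near a
# different 1-D state. Real value functions would be learned critics.
value_fns = [
    lambda s: -abs(s - 0.0),
    lambda s: -abs(s - 1.0),
    lambda s: -abs(s - 2.0),
]

rng = random.Random(0)
idx, probs = select_behavior(value_fns, state=1.9, temperature=0.1, rng=rng)
print(idx, [round(p, 4) for p in probs])
```

Calling `select_behavior` at every step (or every few steps) of a single episode would let the chosen behavior change as the state drifts, which is the spirit of adapting without resets or supervision. A lower `temperature` makes the choice closer to a hard argmax over values.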
Peer Reviews
Decision · Submitted to ICLR 2024
+ The problem of adaptation on the fly is important for robotics.
+ The proposed approach seems novel and is well-justified to address on-the-fly robot adaptation.
+ Experiments using real robots are a strength and demonstrate the proposed method well.
+ Comparison with existing methods is clear in the related work section.
- Figure 1 motivates the problem using examples of facing various terrain and robot failure (e.g., damaged leg), but no experiments were performed on real robots in these scenarios.
- Showing on-the-fly adaptation across different scenarios (beyond dynamic payloads) could make the experiments more convincing, for example, a scenario in which a robot with a heavy payload suddenly steps on icy terrain.
- State-of-the-art performance compared to recent baseline methods.
- Theoretical analysis included.
- Simulation and real-world experiments conducted.
- Ablation study included.
- The work is well written and clear.
- The approach introduces an additional hyperparameter $\beta$ that must be tuned. It is also unclear how sensitive the approach is to this hyperparameter (whether most selected values will work well and beat baselines or whether only a small handful are appropriate).
- The approach somewhat changes the definition of the Bellman operator such that it also contains a notion of the policy's propensity to have encountered a given state instead of being based solely on the expected cumulative rewa
1. The problem this paper is trying to solve is important: skill selection in previously unseen scenarios is challenging. Using values for selection is not novel (see, for example, Chen et al., Sequential Dexterity: Chaining Dexterous Policies for Long-Horizon Manipulation), but the overall selection method is novel to me.
2. While the case study is focused on adaptation to distribution shifts, the approach could be generalized to other skill selection problems, e.g., long-horizon task e
The major methodological weakness in the problem formulation is the bias induced by the proposed cross-entropy term. As proven by Theorem 4.2, the increase in the value function is proportional to the state visitation frequency. This is a problem because the high-level policy will select the low-level policy that visited a state most often, not necessarily the best available policy. For example, a policy that resets almost immediately will visit a small neighborhood of the initial state and, theref
Taxonomy
Topics: Reinforcement Learning in Robotics · Robotic Locomotion and Control · Social Robot Interaction and HRI
