Sim-Anchored Learning for On-the-Fly Adaptation
Bassel El Mabsout, Shahin Roozkhosh, Siddharth Mysore, Kate Saenko, Renato Mancuso

TL;DR
This paper introduces a multi-objective optimization approach using anchor critics to enable real-time adaptation of RL agents, preserving important behaviors learned in simulation while adapting to real-world data, demonstrated on robotics tasks.
Contribution
It proposes a novel multi-objective framework with anchor critics for on-the-fly RL adaptation, addressing catastrophic forgetting and behavior preservation.
Findings
Robust behavior retention in sim-to-sim benchmarks.
Successful sim-to-real adaptation with power savings on a quadrotor.
Open-source SwaNNFlight firmware for live adaptation.
Abstract
Fine-tuning simulation-trained RL agents with real-world data often degrades crucial behaviors due to limited or skewed data distributions. We argue that designer priorities exist not just in reward functions, but also in simulation design choices like task selection and state initialization. When adapting to real-world data, agents can experience catastrophic forgetting in important but underrepresented scenarios. We propose framing live adaptation as a multi-objective optimization problem, where policy objectives must be satisfied both in simulation and reality. Our approach leverages critics from simulation as "anchors for design intent" (anchor critics). By jointly optimizing policies against both anchor critics and critics trained on real-world experience, our method enables adaptation while preserving prioritized behaviors from simulation. Evaluations demonstrate robust behavior…
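The abstract describes jointly optimizing a policy against an anchor critic trained in simulation and a critic trained on real-world experience. The following is a toy sketch of that idea, assuming a simple weighted-sum scalarization of the two critics and a deterministic policy whose action is a single parameter `theta`. The names `scalarized_objective`, `adapt_policy`, and `anchor_weight` are hypothetical, the critics are hand-written quadratics rather than learned networks, and the paper's actual multi-objective composition and training loop may differ.

```python
from typing import Callable

# Hypothetical critic signature: Q(state, action) -> estimated return.
Critic = Callable[[float, float], float]


def scalarized_objective(state: float, action: float, q_real: Critic,
                         q_anchor: Critic, anchor_weight: float) -> float:
    """Weighted-sum scalarization of real-world and anchor critics (an assumption)."""
    return ((1.0 - anchor_weight) * q_real(state, action)
            + anchor_weight * q_anchor(state, action))


def adapt_policy(theta: float, state: float, q_real: Critic, q_anchor: Critic,
                 anchor_weight: float = 0.5, lr: float = 0.1,
                 steps: int = 200, eps: float = 1e-4) -> float:
    """Gradient ascent on a one-parameter deterministic policy (action = theta)."""
    for _ in range(steps):
        # Central finite difference stands in for backprop through the critics.
        up = scalarized_objective(state, theta + eps, q_real, q_anchor, anchor_weight)
        down = scalarized_objective(state, theta - eps, q_real, q_anchor, anchor_weight)
        grad = (up - down) / (2.0 * eps)
        theta += lr * grad
    return theta


if __name__ == "__main__":
    # Toy critics: real-world data favors action 1.0; the simulation anchor favors 0.0.
    q_real = lambda s, a: -(a - 1.0) ** 2
    q_anchor = lambda s, a: -(a - 0.0) ** 2
    for w in (0.0, 0.5, 1.0):
        print(w, round(adapt_policy(0.0, 0.0, q_real, q_anchor, anchor_weight=w), 3))
```

With `anchor_weight=0.0` the toy policy moves fully to the real-world optimum (1.0), with `1.0` it stays at the simulation behavior (0.0), and intermediate weights trade the two off. A weighted sum is only one possible scalarization; more conservative choices, such as optimizing the worse of the two critic values, are also possible.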
Taxonomy
Topics: Reinforcement Learning in Robotics · Age of Information Optimization · Advanced Bandit Algorithms Research
