Sim-Anchored Learning for On-the-Fly Adaptation

Bassel El Mabsout; Shahin Roozkhosh; Siddharth Mysore; Kate Saenko; Renato Mancuso

arXiv:2301.06987 · cs.RO · May 2, 2025

TL;DR

This paper introduces a multi-objective optimization approach that uses anchor critics to adapt RL agents in real time, preserving important behaviors learned in simulation while the agent adapts to real-world data. The approach is demonstrated on robotics tasks.

Contribution

The paper proposes a multi-objective framework with anchor critics for on-the-fly RL adaptation, addressing catastrophic forgetting by preserving behaviors learned in simulation.

Findings

01. Robust behavior retention in sim-to-sim benchmarks.

02. Successful sim-to-real adaptation with power savings on a quadrotor.

03. Open-source SwaNNFlight firmware for live adaptation.

Abstract

Fine-tuning simulation-trained RL agents with real-world data often degrades crucial behaviors due to limited or skewed data distributions. We argue that designer priorities exist not just in reward functions, but also in simulation design choices like task selection and state initialization. When adapting to real-world data, agents can experience catastrophic forgetting in important but underrepresented scenarios. We propose framing live-adaptation as a multi-objective optimization problem, where policy objectives must be satisfied both in simulation and reality. Our approach leverages critics from simulation as "anchors for design intent" (anchor critics). By jointly optimizing policies against both anchor critics and critics trained on real-world experience, our method enables adaptation while preserving prioritized behaviors from simulation. Evaluations demonstrate robust behavior…
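The joint optimization described in the abstract can be illustrated with a toy sketch. Everything here is an assumption made for illustration: the two quadratic "critics", the scalar action standing in for a policy, and especially the weighted-sum combination rule, since the abstract does not say how the anchor and real-world objectives are combined. It is not the authors' implementation.

```python
# Toy sketch: adapt a "policy" (here just one scalar action) against two critics.
# All functions and constants are hypothetical stand-ins, not from the paper.

def q_anchor(a):
    """Hypothetical anchor critic from simulation: prefers actions near 1.0."""
    return -(a - 1.0) ** 2

def q_real(a):
    """Hypothetical critic fit to real-world data: prefers actions near 3.0."""
    return -(a - 3.0) ** 2

def grad(f, a, eps=1e-5):
    """Central-difference estimate of df/da."""
    return (f(a + eps) - f(a - eps)) / (2 * eps)

def adapt(a0, anchor_weight, steps=500, lr=0.05):
    """Gradient ascent on a weighted sum of both critics (assumed combination)."""
    a = a0
    for _ in range(steps):
        g = (anchor_weight * grad(q_anchor, a)
             + (1.0 - anchor_weight) * grad(q_real, a))
        a += lr * g
    return a

# Start from the simulation-optimal action in both cases.
naive = adapt(1.0, anchor_weight=0.0)     # fine-tune on the real critic only
anchored = adapt(1.0, anchor_weight=0.5)  # optimize against both critics

print("naive:   ", naive, "anchor value:", q_anchor(naive))
print("anchored:", anchored, "anchor value:", q_anchor(anchored))
```

In this toy setting, fine-tuning against the real-world critic alone moves the action to that critic's optimum and lowers the anchor critic's value, while the jointly optimized action settles between the two optima and keeps more of the anchor value. The weight of 0.5 is arbitrary; a real system would also need to keep training the real-world critic as data arrives.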

Code & Models

Repositories

bu-cyber-physical-systems-lab/swannfs (tf, official)

Taxonomy

Topics: Reinforcement Learning in Robotics · Age of Information Optimization · Advanced Bandit Algorithms Research