BOSS: Benchmark for Observation Space Shift in Long-Horizon Task

Yue Yang; Linfeng Zhao; Mingyu Ding; Gedas Bertasius; Daniel Szafir

arXiv:2502.15679 · cs.RO · February 24, 2025

TL;DR

BOSS is a benchmark for evaluating how Observation Space Shift (OSS) affects long-horizon robotic tasks. It reveals significant performance drops and tests potential mitigation strategies.

Contribution

The paper presents BOSS, a benchmark for assessing Observation Space Shift in hierarchical robotic tasks, and evaluates recent imitation learning algorithms on this challenge.

Findings

01. OSS causes significant performance drops across the tested algorithms.

02. Scaling training data alone does not fully mitigate OSS.

03. BOSS provides a structured way to evaluate OSS effects in long-horizon tasks.

Abstract

Robotics has long sought to develop visual-servoing robots capable of completing previously unseen long-horizon tasks. Hierarchical approaches offer a pathway for achieving this goal by executing skill combinations arranged by a task planner, with each visuomotor skill pre-trained using a specific imitation learning (IL) algorithm. However, even in simple long-horizon tasks like skill chaining, hierarchical approaches often struggle due to a problem we identify as Observation Space Shift (OSS), where the sequential execution of preceding skills causes shifts in the observation space, disrupting the performance of subsequent individually trained skill policies. To validate OSS and evaluate its impact on long-horizon tasks, we introduce BOSS (a Benchmark for Observation Space Shift). BOSS comprises three distinct challenges: "Single Predicate Shift", "Accumulated Predicate Shift", and…
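As a rough illustration of the OSS idea described in the abstract, the toy sketch below chains three skills and reports, for each one, which scene predicates it encounters at execution time that were absent from its training observations. This is not the BOSS implementation: the `Skill` class, the predicate names, and the set-based scene model are all hypothetical simplifications chosen for illustration only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Skill:
    """Hypothetical stand-in for an individually trained visuomotor skill."""
    name: str
    trained_on: frozenset  # predicates that were true in the skill's training scenes
    effects: frozenset     # predicates the skill makes true when it runs


def shifted_predicates(skill, scene):
    """Predicates true in the current scene but never seen during training."""
    return scene - skill.trained_on


def run_chain(skills, initial_scene):
    """Execute skills in order and record the unseen predicates each one faces."""
    scene = set(initial_scene)
    report = []
    for skill in skills:
        report.append((skill.name, sorted(shifted_predicates(skill, scene))))
        scene |= skill.effects  # the skill's effects persist for later skills
    return report


# Each skill was trained in a clean scene, so its training set is empty.
chain = [
    Skill("open_drawer", frozenset(), frozenset({"drawer_open"})),
    Skill("place_bowl_on_plate", frozenset(), frozenset({"bowl_on_plate"})),
    Skill("turn_on_stove", frozenset(), frozenset({"stove_on"})),
]

for name, unseen in run_chain(chain, initial_scene=set()):
    print(f"{name}: unseen predicates = {unseen}")
```

In this toy model the first skill sees no shift, the second sees one changed predicate, and the third sees two, which loosely mirrors the distinction the abstract draws between a single shift and an accumulated one. A real policy observes images rather than symbolic predicates, so the sketch only conveys the bookkeeping, not the visual effect.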

Taxonomy

Methods: Sparse Evolutionary Training