SPLASH! Sample-efficient Preference-based inverse reinforcement learning for Long-horizon Adversarial tasks from Suboptimal Hierarchical demonstrations

Peter Crowley; Zachary Serlin; Tyler Paine; Makai Mann; Michael Benjamin; Calin Belta

arXiv:2507.08707 · cs.LG · July 14, 2025

TL;DR

SPLASH introduces a sample-efficient preference-based IRL method capable of learning long-horizon, adversarial tasks from suboptimal hierarchical demonstrations, outperforming existing approaches in simulation and real-world maritime scenarios.

Contribution

The paper presents SPLASH, a novel IRL framework that effectively learns from suboptimal, hierarchical demonstrations for complex long-horizon and adversarial tasks.

Findings

01

SPLASH outperforms state-of-the-art reward learning methods.

02

SPLASH is effective in simulation on a maritime capture-the-flag task.

03

SPLASH demonstrates real-world applicability on autonomous surface vehicles.

Abstract

Inverse Reinforcement Learning (IRL) presents a powerful paradigm for learning complex robotic tasks from human demonstrations. However, most approaches assume that expert demonstrations are available, which is often not the case. Those that allow for suboptimality in the demonstrations are not designed for long-horizon goals or adversarial tasks. Many desirable robot capabilities fall into one or both of these categories, highlighting a critical shortcoming in the ability of IRL to produce field-ready robotic agents. We introduce Sample-efficient Preference-based inverse reinforcement learning for Long-horizon Adversarial tasks from Suboptimal Hierarchical demonstrations (SPLASH), which advances the state of the art in learning from suboptimal demonstrations to long-horizon and adversarial settings. We empirically validate SPLASH on a maritime capture-the-flag task in…
