STRAP: Robot Sub-Trajectory Retrieval for Augmented Policy Learning

Marius Memmel; Jacob Berg; Bingqing Chen; Abhishek Gupta; Jonathan Francis

arXiv:2412.15182 · cs.RO · August 19, 2025

TL;DR

STRAP introduces a sub-trajectory retrieval method using pre-trained vision models and dynamic time warping to enhance robot policy adaptation and learning from large datasets, outperforming prior methods.

Contribution

This work presents a novel sub-trajectory retrieval approach that improves data utilization and policy robustness in robot learning by leveraging pre-trained models and dynamic time warping.
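The matching step can be illustrated with a minimal sketch of subsequence dynamic time warping. This is not the authors' implementation: the function name `subsequence_dtw`, the use of Euclidean distance, and the plain tuples standing in for per-frame embeddings from a pre-trained vision model are all assumptions made for illustration. Given a short query sequence, the sketch finds the contiguous segment of a longer reference trajectory that aligns with it at the lowest cost, even when the two run at different speeds.

```python
import math


def subsequence_dtw(query, reference):
    """Align `query` to the best-matching contiguous segment of `reference`.

    Both arguments are sequences of equal-dimension feature vectors
    (hypothetical stand-ins for per-frame embeddings).
    Returns (cost, start, end); the matched segment is reference[start:end + 1].
    """
    n, m = len(query), len(reference)
    # Pairwise Euclidean distance between every query and reference frame.
    dist = [[math.dist(q, r) for r in reference] for q in query]

    acc = [[math.inf] * m for _ in range(n)]   # accumulated alignment cost
    start = [[0] * m for _ in range(n)]        # reference index where each path began

    # Free start: the first query frame may match any reference frame.
    for j in range(m):
        acc[0][j] = dist[0][j]
        start[0][j] = j

    for i in range(1, n):
        for j in range(m):
            # Predecessor (i-1, j): the reference frame is reused.
            best_cost, best_start = acc[i - 1][j], start[i - 1][j]
            if j > 0:
                # Predecessors (i, j-1) and (i-1, j-1).
                for cost, s in ((acc[i][j - 1], start[i][j - 1]),
                                (acc[i - 1][j - 1], start[i - 1][j - 1])):
                    if cost < best_cost:
                        best_cost, best_start = cost, s
            acc[i][j] = dist[i][j] + best_cost
            start[i][j] = best_start

    # Free end: pick the cheapest cell in the last query row.
    end = min(range(m), key=lambda j: acc[n - 1][j])
    return acc[n - 1][end], start[n - 1][end], end


# Toy usage: a query that replays reference frames 5..8 at a slower pace.
reference = [(float(t), float(t * t)) for t in range(12)]
query = [reference[k] for k in (5, 5, 6, 7, 8, 8)]
cost, seg_start, seg_end = subsequence_dtw(query, reference)
```

In this toy case the repeated frames in the query are absorbed by the warping path, so the recovered segment spans reference indices 5 through 8 at zero cost. A retrieval system built on this idea would rank candidate segments across many reference trajectories by their alignment cost.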

Findings

01. STRAP outperforms prior retrieval and multi-task learning methods.

02. It scales effectively to large offline datasets.

03. It enables robust control policies with few real-world demonstrations.

Abstract

Robot learning is witnessing a significant increase in the size, diversity, and complexity of pre-collected datasets, mirroring trends in domains such as natural language processing and computer vision. Many robot learning methods treat such datasets as multi-task expert data and learn a multi-task, generalist policy by training broadly across them. Notably, while these generalist policies can improve the average performance across many tasks, the performance of generalist policies on any one task is often suboptimal due to negative transfer between partitions of the data, compared to task-specific specialist policies. In this work, we argue for the paradigm of training policies during deployment given the scenarios they encounter: rather than deploying pre-trained policies to unseen problems in a zero-shot manner, we non-parametrically retrieve and train models directly on relevant…

Peer Reviews

Decision: ICLR 2025 Poster

Reviewer 01 · Rating 6 · Confidence 4

Strengths

- The proposed method is well motivated and achieves strong results across multiple experiments, both in simulation and real-world settings
- The use of Dynamic Time Warping for sub-trajectory matching is novel and well-suited for the problem domain
- Comprehensive evaluation against recent retrieval baselines demonstrates the method's effectiveness
- Thorough ablation studies on different pretrained encoders provide valuable insights into architecture choices
- The paper is well written and

Weaknesses

- The baseline comparison against multi-task policy appears weak, as it only uses pretrained weights without fine-tuning. This seems like an artificially weak baseline since fine-tuning is standard practice for all MT-policies.
- The paper's argument that retrieval is more efficient than expensive pretraining needs stronger empirical support, especially given that the robotics community regularly fine-tunes general policies for downstream tasks
- The computational cost of STRAP's retrieval pr

Reviewer 02 · Rating 8 · Confidence 4

Strengths

- Sub-trajectory retrieval for the few-shot demo behavioral cloning setting is a well-motivated and novel idea.
- The method is clear and straightforward to implement.
- Results show that matching with DTW on vision foundation model features is robust to variations and captures task semantics.
- Real and simulated environments show that STRAP outperforms other retrieval methods and pure behavioral cloning.

Weaknesses

- STRAP requires few-shot demos and model training at test time for a new task.
- It would be good to see more sim and real environments for evaluations.
- It would be more convincing to see a behavioral cloning baseline that uses all available data.

Reviewer 03 · Rating 6 · Confidence 4

Strengths

- To handle potentially variable lengths during retrieval, STRAP uses dynamic time warping (DTW) to match sub-sequences
- STRAP shows improved performance compared to the prior framework (Behavior Retrieval, Du et al., 2023), which retrieves single state-action pairs using a VAE.

Weaknesses

- The idea of using only data relevant to the target task, rather than learning a generalist policy through multi-task data, is interesting. However, retrieving new data from the prior dataset and training a policy each time a new scene is encountered is highly computationally costly.
- Comparing the entire prior data with the target data one-to-one to measure similarity is not scalable with the dataset size. Moreover, since this retrieval process requires computationally intensive neural network op
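For context on the scalability concern, a rough back-of-the-envelope sketch may help. Standard DTW fills one cost-matrix cell per (query frame, reference frame) pair, so matching one query against every prior trajectory costs a number of cells proportional to the total number of prior frames. The helper `dtw_cells` and all sizes below are hypothetical and say nothing about STRAP's actual runtime, which may also depend on embedding caching, batching, or parallelism.

```python
def dtw_cells(query_len, prior_lens):
    """Count DTW cost-matrix cells filled when one query of `query_len`
    frames is matched against each prior trajectory in `prior_lens`."""
    return sum(query_len * m for m in prior_lens)


# Hypothetical sizes: a 50-frame query against 1,000 prior trajectories
# of 200 frames each, then against a dataset twice as large.
small = dtw_cells(50, [200] * 1_000)
large = dtw_cells(50, [200] * 2_000)
```

Under these assumptions the matching work doubles when the dataset doubles: linear in dataset size for a fixed query, but repeated for every query segment and every new target task.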

Code & Models

Repositories

weirdlabuw/strap
pytorch

Videos

STRAP: Robot Sub-Trajectory Retrieval for Augmented Policy Learning · slideslive

Taxonomy

Topics: Transportation and Mobility Innovations · Reinforcement Learning in Robotics