On the Effectiveness of Retrieval, Alignment, and Replay in Manipulation
Norman Di Palo, Edward Johns

TL;DR
This paper decomposes imitation learning from visual observations into three phases (retrieval, alignment, and replay) and reports improved learning efficiency and generalization on real-world robotic manipulation tasks.
Contribution
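The paper proposes a three-phase framework in which retrieval informs the robot what it can do with an object, alignment informs it where to interact with the object, and replay informs it how to interact with the object. Real-world experiments on grasping, pouring, and inserting objects show gains in learning efficiency and generalization.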
Findings
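The decomposition yields what the authors describe as unprecedented learning efficiency.
The method generalizes both within and across object classes (intra- and inter-class).
Real-world experiments cover everyday tasks such as grasping, pouring, and inserting objects.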
Abstract
Imitation learning with visual observations is notoriously inefficient when addressed with end-to-end behavioural cloning methods. In this paper, we explore an alternative paradigm which decomposes reasoning into three phases. First, a retrieval phase, which informs the robot what it can do with an object. Second, an alignment phase, which informs the robot where to interact with the object. And third, a replay phase, which informs the robot how to interact with the object. Through a series of real-world experiments on everyday tasks, such as grasping, pouring, and inserting objects, we show that this decomposition brings unprecedented learning efficiency, and effective inter- and intra-class generalisation. Videos are available at https://www.robot-learning.uk/retrieval-alignment-replay.
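The three phases can be illustrated with a minimal planar sketch. This is not the authors' implementation: the abstract does not say how each phase is realised, so everything below is an assumption made for illustration. The `Demo` record, the embedding vectors, the cosine-similarity retrieval, the (x, y, theta) pose format, and the function names are all hypothetical. In particular, the object pose is taken as given here, whereas estimating where to interact from images is the hard part of alignment and is not shown.

```python
import math
from dataclasses import dataclass
from typing import List, Tuple

Pose = Tuple[float, float, float]  # planar pose (x, y, theta); an assumption of this sketch


@dataclass
class Demo:
    """One stored demonstration (hypothetical record format)."""
    task: str                # e.g. "grasp" or "pour"
    embedding: List[float]   # assumed visual embedding of the demonstrated object
    waypoints: List[Pose]    # end-effector poses recorded in the object frame


def compose(a: Pose, b: Pose) -> Pose:
    """Express pose b, given in frame a, in the parent frame of a."""
    ax, ay, ath = a
    bx, by, bth = b
    c, s = math.cos(ath), math.sin(ath)
    return (ax + c * bx - s * by, ay + s * bx + c * by, ath + bth)


def cosine(u: List[float], v: List[float]) -> float:
    """Cosine similarity between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(y * y for y in v))
    return dot / (norm_u * norm_v)


def retrieve(query: List[float], demos: List[Demo]) -> Demo:
    """Retrieval (what): pick the stored demo most similar to the observed object."""
    return max(demos, key=lambda d: cosine(query, d.embedding))


def align(demo: Demo, object_pose: Pose) -> Pose:
    """Alignment (where): world pose at which the interaction should start.

    Simplification: object_pose is assumed known rather than estimated from images.
    """
    return compose(object_pose, demo.waypoints[0])


def replay(demo: Demo, object_pose: Pose) -> List[Pose]:
    """Replay (how): map the recorded waypoints into the world frame."""
    return [compose(object_pose, w) for w in demo.waypoints]


if __name__ == "__main__":
    library = [
        Demo("grasp", [1.0, 0.0], [(0.1, 0.0, 0.0), (0.0, 0.0, 0.0)]),
        Demo("pour", [0.0, 1.0], [(0.0, 0.2, 0.0)]),
    ]
    demo = retrieve([0.9, 0.1], library)
    print(demo.task, align(demo, (1.0, 2.0, math.pi / 2)))
```

Storing waypoints in the object frame is the design choice that lets one demonstration be reused when the object appears at a new pose: only the object pose changes, and the recorded motion is transformed rather than relearned.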
Taxonomy
Topics: Robot Manipulation and Learning · Multimodal Machine Learning Applications · Reinforcement Learning in Robotics
