RT-Sketch: Goal-Conditioned Imitation Learning from Hand-Drawn Sketches

Priya Sundaresan; Quan Vuong; Jiayuan Gu; Peng Xu; Ted Xiao; Sean Kirmani; Tianhe Yu; Michael Stark; Ajinkya Jain; Karol Hausman; Dorsa Sadigh; Jeannette Bohg; Stefan Schaal

arXiv:2403.02709·cs.RO·March 6, 2024·1 cites

Open Access · 3 Reviews

TL;DR

RT-Sketch is a goal-conditioned imitation learning policy that takes a hand-drawn sketch as its goal. It stays robust when language goals are ambiguous or visual distractors are present, and it can interpret sketches of varying specificity.

Contribution

The paper proposes using hand-drawn sketches as goal representations in imitation learning, providing a flexible and spatially-aware alternative to language and images.

Findings

01

RT-Sketch performs comparably to image- or language-conditioned agents on simple tasks.

02

RT-Sketch is more robust than those agents when language goals are ambiguous or visual distractors are present.

03

It can interpret sketches with different levels of detail.

Abstract

Natural language and images are commonly used as goal representations in goal-conditioned imitation learning (IL). However, natural language can be ambiguous and images can be over-specified. In this work, we propose hand-drawn sketches as a modality for goal specification in visual imitation learning. Sketches are easy for users to provide on the fly like language, but similar to images they can also help a downstream policy to be spatially-aware and even go beyond images to disambiguate task-relevant from task-irrelevant objects. We present RT-Sketch, a goal-conditioned policy for manipulation that takes a hand-drawn sketch of the desired scene as input, and outputs actions. We train RT-Sketch on a dataset of paired trajectories and corresponding synthetically generated goal sketches. We evaluate this approach on six manipulation skills involving tabletop object rearrangements on an…
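The training setup described in the abstract, trajectories paired with synthetically generated goal sketches, can be illustrated with a minimal sketch. Everything named here is hypothetical: `toy_sketch` is a crude edge-threshold stand-in for whatever image-to-sketch generator the authors use, and taking the goal from the final frame of each trajectory is an assumption the abstract does not spell out.

```python
from typing import Callable, List, Tuple

Image = List[List[float]]      # grayscale image as rows of pixel intensities
Action = Tuple[float, ...]     # hypothetical low-level action vector
Step = Tuple[Image, Action]    # (observation, action) at one timestep


def toy_sketch(image: Image, threshold: float = 0.5) -> Image:
    """Hypothetical stand-in for an image-to-sketch generator.

    Marks a pixel as a stroke (1.0) when the intensity jump to its right or
    lower neighbor exceeds `threshold`; all other pixels stay blank (0.0).
    """
    height, width = len(image), len(image[0])
    sketch = [[0.0] * width for _ in range(height)]
    for r in range(height):
        for c in range(width):
            right = abs(image[r][c] - image[r][c + 1]) if c + 1 < width else 0.0
            down = abs(image[r][c] - image[r + 1][c]) if r + 1 < height else 0.0
            if max(right, down) > threshold:
                sketch[r][c] = 1.0
    return sketch


def build_examples(
    trajectory: List[Step],
    sketcher: Callable[[Image], Image] = toy_sketch,
) -> List[Tuple[Image, Image, Action]]:
    """Pair every step with a sketch of the trajectory's final observation.

    Assumption: the goal is the last frame of the demonstration, converted
    to a sketch. Returns (observation, goal_sketch, action) tuples that a
    goal-conditioned policy could be trained on with behavior cloning.
    """
    if not trajectory:
        return []
    goal_sketch = sketcher(trajectory[-1][0])
    return [(obs, goal_sketch, action) for obs, action in trajectory]
```

Each resulting tuple gives a policy the current observation and the goal sketch as inputs and the demonstrated action as the supervised target. Swapping `sketcher` for a learned generator, or for real hand-drawn sketches at test time, leaves the pairing logic unchanged.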

Peer Reviews

Decision·CoRL 2024

Reviewer 01 · Rating 3 · Confidence 4

Reviewer 02 · Rating 3 · Confidence 4

Reviewer 03 · Rating 4 · Confidence 4

Taxonomy

Topics: Human Pose and Action Recognition · Human Motion and Animation · Video Analysis and Summarization