PlaTe: Visually-Grounded Planning with Transformers in Procedural Tasks

Jiankai Sun; De-An Huang; Bo Lu; Yun-Hui Liu; Bolei Zhou; Animesh Garg

arXiv:2109.04869 · cs.RO · March 11, 2022

TL;DR

PlaTe introduces a transformer-based model that learns structured, goal-directed planning from instructional videos, effectively handling appearance gaps and reducing decision errors in procedural tasks.

Contribution

The paper presents PlaTe, a novel transformer-based approach that learns latent state-action representations from videos, improving long-term planning in procedural tasks.

Findings

01  Outperforms previous algorithms in goal-reaching tasks

02  Successfully applies to real-world instructional videos and interactive environments

03  Demonstrates feasibility on a UR-5 robotic platform

Abstract

In this work, we study the problem of how to leverage instructional videos to facilitate the understanding of human decision-making processes, focusing on training a model with the ability to plan a goal-directed procedure from real-world videos. Learning structured and plannable state and action spaces directly from unstructured videos is the key technical challenge of our task. There are two problems: first, the appearance gap between the training and validation datasets could be large for unstructured videos; second, these gaps lead to decision errors that compound over the steps. We address these limitations with Planning Transformer (PlaTe), which has the advantage of circumventing the compounding prediction errors that occur with single-step models during long model-based rollouts. Our method simultaneously learns the latent state and action information of assigned tasks and the…
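The compounding-error problem mentioned in the abstract can be illustrated with a toy example. The sketch below is not the PlaTe method and is not taken from the paper: it assumes a hypothetical scalar linear system and a slightly mis-estimated single-step model, and only shows how a small per-step error grows when a model is rolled out on its own predictions. The function name and all parameter values are illustrative assumptions.

```python
def rollout_errors(a_true, a_model, x0, horizon):
    """Absolute gap between true and model rollouts at each step.

    Toy setting (assumed for illustration): scalar dynamics
    x_{t+1} = a * x_t, with the model using a slightly wrong coefficient.
    """
    x_true, x_model = x0, x0
    errors = []
    for _ in range(horizon):
        x_true = a_true * x_true      # ground-truth transition
        x_model = a_model * x_model   # model is fed its own previous prediction
        errors.append(abs(x_model - x_true))
    return errors


# A 5% error in the single-step coefficient, rolled out for 10 steps.
errors = rollout_errors(a_true=1.0, a_model=1.05, x0=1.0, horizon=10)
for step, err in enumerate(errors, start=1):
    print(f"step {step:2d}: error = {err:.4f}")
```

In this toy setting the gap widens at every step because each prediction is built on the previous, already inaccurate one; this is the failure mode of single-step models in long rollouts that the abstract says PlaTe is designed to circumvent.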

Code & Models

Repositories

jiankai-sun/plate-pytorch (PyTorch)

Taxonomy

Topics: Multimodal Machine Learning Applications · Human Pose and Action Recognition · Robot Manipulation and Learning