Future Success Prediction in Open-Vocabulary Object Manipulation Tasks Based on End-Effector Trajectories

Motonari Kambara; Komei Sugiura

arXiv:2412.19112·cs.RO·January 9, 2025

TL;DR

This paper presents a method for predicting the success or failure of open-vocabulary object manipulation tasks from end-effector trajectories, natural language instructions, and egocentric images, so that the outcome can be predicted before the manipulation is executed.

Contribution

Introduces Trajectory Encoder, which applies learnable weighting to the input trajectories to improve success prediction in open-vocabulary manipulation tasks, and evaluates it on a new dataset derived from RT-1.

Findings

01 Achieves higher prediction accuracy than baseline methods

02 Effectively models temporal dynamics and object interactions

03 Enables success prediction prior to task execution

Abstract

This study addresses a task designed to predict the future success or failure of open-vocabulary object manipulation. In this task, the model is required to make predictions based on natural language instructions, egocentric view images before manipulation, and the given end-effector trajectories. Conventional methods typically perform success prediction only after the manipulation is executed, limiting their efficiency in executing the entire task sequence. We propose a novel approach that enables the prediction of success or failure by aligning the given trajectories and images with natural language instructions. We introduce Trajectory Encoder to apply learnable weighting to the input trajectories, allowing the model to consider temporal dynamics and interactions between objects and the end effector, improving the model's ability to predict manipulation outcomes accurately. We…
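
The idea of applying learnable weights to trajectory waypoints can be illustrated with a small sketch. This is not the authors' Trajectory Encoder. It assumes one scalar logit per timestep, a softmax to normalise the logits into weights, and a weighted sum of end-effector positions as the pooled feature. The names `softmax`, `weighted_trajectory_summary`, and `logits` are hypothetical, and the waypoint values are made up. In a trained model the logits would be parameters learned jointly with the rest of the network.

```python
import math
from typing import List, Sequence


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax over a list of scalar logits."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def weighted_trajectory_summary(
    trajectory: Sequence[Sequence[float]],
    logits: Sequence[float],
) -> List[float]:
    """Pool a trajectory into one vector using per-timestep weights.

    trajectory: T waypoints, each a list of D coordinates (assumed layout).
    logits: T scalars; in a trained model these would be learned parameters.
    """
    if len(trajectory) != len(logits):
        raise ValueError("need exactly one logit per waypoint")
    weights = softmax(logits)
    dim = len(trajectory[0])
    return [
        sum(w * point[d] for w, point in zip(weights, trajectory))
        for d in range(dim)
    ]


# Toy end-effector path (x, y, z); the values are made up for illustration.
trajectory = [[0.0, 0.0, 0.5], [0.1, 0.0, 0.4], [0.2, 0.1, 0.2]]

# Equal logits reduce to a plain average of the waypoints.
uniform = weighted_trajectory_summary(trajectory, [0.0, 0.0, 0.0])

# A large logit on the last step pulls the summary toward the final waypoint.
late_focus = weighted_trajectory_summary(trajectory, [0.0, 0.0, 5.0])
```

With equal logits the summary is the mean waypoint. Raising the logit of a timestep shifts the summary toward that waypoint, which is how such a weighting could emphasise the parts of a trajectory where the end effector interacts with an object.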

Taxonomy

Topics: Natural Language Processing Techniques · Topic Modeling