Representing Positional Information in Generative World Models for Object Manipulation

Stefano Ferraro; Pietro Mazzaglia; Tim Verbelen; Bart Dhoedt; Sai Rajeswar

arXiv:2409.12005 · cs.RO · September 20, 2024

TL;DR

This paper enhances generative world models with explicit positional representations to improve object manipulation, enabling more accurate goal achievement in robotics tasks.

Contribution

It introduces position-conditioned (PCP) and latent-conditioned (LCP) policy approaches that explicitly encode object positions, improving manipulation success over existing models.

Findings

1. LCP captures object positions effectively for manipulation tasks.
2. The proposed methods outperform current model-based control approaches.
3. The approach enables multimodal goal specification through spatial or visual cues.

Abstract

Object manipulation capabilities are essential skills that set apart embodied agents engaging with the world, especially in the realm of robotics. The ability to predict outcomes of interactions with objects is paramount in this setting. While model-based control methods have started to be employed for tackling manipulation tasks, they have faced challenges in accurately manipulating objects. Analyzing the causes of this limitation, we trace the underperformance to the way current world models represent crucial positional information, especially the target's goal specification for object-positioning tasks. We introduce a general approach that empowers world-model-based agents to effectively solve object-positioning tasks. We propose two variants of this approach for generative world models: position-conditioned (PCP) and latent-conditioned (LCP) policy…
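The abstract distinguishes a position-conditioned (PCP) and a latent-conditioned (LCP) policy. As a minimal sketch of that interface difference only, and not the authors' architecture or training procedure, the Python below assumes a goal-conditioned linear policy that reads the concatenation of a world-model state vector and a goal vector. `GoalConditionedPolicy`, all dimensions, and the random stand-in goal latent are hypothetical. In the PCP-style case the goal is an explicit target position; in the LCP-style case it is a latent vector assumed to encode the object's target position.

```python
import math
import random
from typing import List, Sequence


class GoalConditionedPolicy:
    """Hypothetical policy: action = tanh(W @ [state ; goal] + b).

    The two variants sketched below differ only in the goal vector:
    an explicit target position (PCP-style) or a goal latent (LCP-style).
    """

    def __init__(self, state_dim: int, goal_dim: int, action_dim: int, seed: int = 0):
        rng = random.Random(seed)
        self.state_dim = state_dim
        self.goal_dim = goal_dim
        in_dim = state_dim + goal_dim
        # Small random weights; a real agent would learn these.
        self.weights = [[rng.uniform(-0.1, 0.1) for _ in range(in_dim)]
                        for _ in range(action_dim)]
        self.bias = [0.0] * action_dim

    def act(self, state: Sequence[float], goal: Sequence[float]) -> List[float]:
        if len(state) != self.state_dim or len(goal) != self.goal_dim:
            raise ValueError("unexpected state or goal dimension")
        x = list(state) + list(goal)  # concatenate state and goal
        return [math.tanh(sum(w * xi for w, xi in zip(row, x)) + b)
                for row, b in zip(self.weights, self.bias)]


STATE_DIM, ACTION_DIM, LATENT_DIM = 8, 4, 6  # hypothetical sizes
state = [0.0] * STATE_DIM  # stand-in for a world-model latent state

# PCP-style: the goal is an explicit target position (x, y, z).
pcp = GoalConditionedPolicy(STATE_DIM, goal_dim=3, action_dim=ACTION_DIM)
target_xyz = [0.10, -0.05, 0.02]
print([round(a, 3) for a in pcp.act(state, target_xyz)])

# LCP-style: the goal is a latent vector; here a random stand-in for a
# latent that is assumed to encode the object's target position.
lcp = GoalConditionedPolicy(STATE_DIM, goal_dim=LATENT_DIM, action_dim=ACTION_DIM, seed=1)
goal_rng = random.Random(2)
goal_latent = [goal_rng.gauss(0.0, 1.0) for _ in range(LATENT_DIM)]
print([round(a, 3) for a in lcp.act(state, goal_latent)])
```

Under these assumptions both variants share one policy interface and differ only in how the goal is represented, which is the distinction the PCP/LCP naming points to.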

Taxonomy

Topics: Image Processing and 3D Reconstruction · Robotics and Automated Systems · Human Motion and Animation

Methods: Sparse Evolutionary Training