SlotGNN: Unsupervised Discovery of Multi-Object Representations and Visual Dynamics

Alireza Rezazadeh; Athreyi Badithela; Karthik Desingh; Changhyun Choi

arXiv:2310.04617·cs.RO·October 10, 2023

Open Access

TL;DR

This paper introduces an unsupervised framework that combines SlotTransport for object discovery with SlotGNN for predicting multi-object dynamics, and reports robust, accurate results in both simulated and real-world robotic tasks.

Contribution

The paper presents two novel architectures, SlotTransport and SlotGNN, enabling unsupervised discovery of object representations and their dynamics from visual data in robotic environments.

Findings

01

SlotTransport accurately encodes visual and positional information of objects.

02

SlotGNN effectively predicts future states and dynamics of multi-object scenes.

03

Framework performs well in real-world robotic control tasks.

Abstract

Learning multi-object dynamics from visual data using unsupervised techniques is challenging due to the need for robust object representations that can be learned through robot interactions. This paper presents a novel framework with two new architectures: SlotTransport for discovering object representations from RGB images and SlotGNN for predicting their collective dynamics from RGB images and robot interactions. Our SlotTransport architecture is based on slot attention for unsupervised object discovery and uses a feature transport mechanism to maintain temporal alignment in object-centric representations. This enables the discovery of slots that consistently reflect the composition of multi-object scenes. These slots robustly bind to distinct objects, even under heavy occlusion or absence. Our SlotGNN, a novel unsupervised graph-based dynamics model, predicts the future state of…
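
For readers unfamiliar with slot attention, which the abstract names as the basis of SlotTransport, the following is a minimal sketch of one simplified update step. It is not the authors' SlotTransport implementation: it assumes no learned projections, no layer normalization, and no recurrent slot update, and it leaves out the feature transport mechanism entirely. The function name `slot_attention_step` and the toy `features` and `slots` values are hypothetical, chosen only to illustrate how slots compete for input features.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def slot_attention_step(slots, features, eps=1e-8):
    """One simplified slot-attention update (hypothetical helper).

    Assumption: raw dot products stand in for the learned query/key/value
    projections of the full method, and the recurrent update is omitted.
    """
    dim = len(slots[0])
    scale = 1.0 / math.sqrt(dim)
    # attn[i][k]: for each feature i, a softmax over slots k,
    # so the slots compete to explain that feature.
    attn = [softmax([scale * dot(f, s) for s in slots]) for f in features]
    new_slots = []
    for k in range(len(slots)):
        weights = [attn[i][k] for i in range(len(features))]
        total = sum(weights) + eps
        # Each slot becomes the attention-weighted mean of the features.
        new_slots.append([
            sum(w * f[d] for w, f in zip(weights, features)) / total
            for d in range(dim)
        ])
    return new_slots, attn


# Toy input: two groups of 2-D features standing in for two objects.
features = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
slots = [[0.5, 0.0], [0.0, 0.5]]
for _ in range(3):
    slots, attn = slot_attention_step(slots, features)
```

In this toy setting the first slot attends more strongly to the first two features and the second slot to the last two. The softmax taken across slots, rather than across inputs, is what makes slots bind to distinct parts of the input.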

Taxonomy

Topics: Multimodal Machine Learning Applications · Human Pose and Action Recognition · Advanced Neural Network Applications