GATSBI: Generative Agent-centric Spatio-temporal Object Interaction

Cheol-Hui Min; Jinseok Bae; Junho Lee; Young Min Kim

arXiv:2104.04275·cs.CV·April 12, 2021

TL;DR

GATSBI is a generative model that creates structured, agent-centric spatio-temporal representations from raw observations, enabling better scene understanding and prediction in complex, dynamic environments.

Contribution

It introduces an unsupervised object-centric approach that models interactions among entities, improving scene decomposition and future state prediction in vision-based scenarios.

Findings

01. Outperforms state-of-the-art in scene decomposition

02. Achieves superior video prediction accuracy

03. Generalizes across diverse environments

Abstract

We present GATSBI, a generative model that can transform a sequence of raw observations into a structured latent representation that fully captures the spatio-temporal context of the agent's actions. In vision-based decision-making scenarios, an agent faces complex high-dimensional observations where multiple entities interact with each other. The agent requires a good scene representation of the visual observation that discerns essential components and consistently propagates along the time horizon. Our method, GATSBI, utilizes unsupervised object-centric scene representation learning to separate an active agent, static background, and passive objects. GATSBI then models the interactions reflecting the causal relationships among decomposed entities and predicts physically plausible future states. Our model generalizes to a variety of environments where different types of robots and…
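The decomposition the abstract describes (static background, active agent, passive objects) can be pictured as a layered image model. The following is a minimal sketch of generic mask-based compositing, not GATSBI's actual architecture; the `Layer` class, the `composite` function, and the toy 2x3 grayscale images are hypothetical names and data introduced only for illustration.

```python
from dataclasses import dataclass
from typing import List

Image = List[List[float]]  # grayscale image, pixel values in [0, 1]


@dataclass
class Layer:
    """One scene entity (hypothetical): its appearance and a soft mask in [0, 1]."""
    name: str
    appearance: Image
    mask: Image


def composite(background: Image, layers: List[Layer]) -> Image:
    """Alpha-blend each layer over the background, in list order."""
    out = [row[:] for row in background]  # copy so the background is not mutated
    for layer in layers:
        for y, row in enumerate(out):
            for x, value in enumerate(row):
                m = layer.mask[y][x]
                # Where the mask is 1 the layer replaces the pixel,
                # where it is 0 the pixel underneath is kept.
                row[x] = m * layer.appearance[y][x] + (1.0 - m) * value
    return out


# Toy scene: a dim static background, an agent at the top-left pixel,
# and a passive object at the bottom-right pixel.
background = [[0.1, 0.1, 0.1], [0.1, 0.1, 0.1]]
agent = Layer("agent", [[1.0] * 3, [1.0] * 3], [[1.0, 0.0, 0.0], [0.0, 0.0, 0.0]])
obj = Layer("object", [[0.5] * 3, [0.5] * 3], [[0.0, 0.0, 0.0], [0.0, 0.0, 1.0]])

frame = composite(background, [agent, obj])
```

In a learned model the appearances and masks would be inferred from the observation sequence rather than written by hand; the sketch only shows how separate entity layers can recombine into a single frame.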

Code & Models

Repositories

mch5048/gatsbi (Official)

Taxonomy

Topics: Multimodal Machine Learning Applications · Generative Adversarial Networks and Image Synthesis · Human Pose and Action Recognition