On the Benefits of Instance Decomposition in Video Prediction Models

Eliyas Suleyman; Paul Henderson; Nicolas Pugeault

arXiv:2501.10562·cs.CV·January 22, 2025

Open Access

TL;DR

This paper shows that explicitly decomposing a scene into objects improves prediction quality in latent-transformer video prediction models, based on experiments on synthetic and real-world datasets.

Contribution

It investigates explicit object decomposition within latent-transformer video prediction models and shows its benefits over non-decomposed models.

Findings

01 Decomposition improves prediction quality.

02 Object-specific modeling captures independent motion patterns.

03 Results are consistent across synthetic and real datasets.

Abstract

Video prediction is a crucial task for intelligent agents such as robots and autonomous vehicles, since it enables them to anticipate and act early on time-critical incidents. State-of-the-art video prediction methods typically model the dynamics of a scene jointly and implicitly, without any explicit decomposition into separate objects. This is challenging and potentially sub-optimal, as every object in a dynamic scene has its own pattern of movement, typically somewhat independent of the others. In this paper, we investigate the benefit of explicitly modeling the objects in a dynamic scene separately within the context of latent-transformer video prediction models. We conduct detailed and carefully controlled experiments on both synthetic and real-world datasets; our results show that decomposing a dynamic scene leads to higher-quality predictions compared with models of a similar…

Taxonomy

Topics: Generative Adversarial Networks and Image Synthesis · Advanced Vision and Imaging