Boosting Object Representation Learning via Motion and Object Continuity

Quentin Delfosse; Wolfgang Stammer; Thomas Rothenbacher; Dwarak Vittal; Kristian Kersting

arXiv:2211.09771·cs.CV·February 22, 2024

TL;DR

This paper introduces a Motion and Object Continuity (MOC) scheme that enhances unsupervised multi-object detection by leveraging object motion and continuity, leading to better object representations and improved performance in downstream tasks like Atari game playing.

Contribution

The paper proposes a flexible MOC scheme that integrates optical flow and a contrastive loss to improve object representations without requiring new architectures.
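To illustrate the first ingredient, the sketch below shows one way an optical flow field could be turned into a coarse prior on object locations. It is only an illustration under stated assumptions: thresholding the flow magnitude and pooling it over grid cells is a simplification chosen here, not the procedure confirmed by the paper, and the names `motion_mask`, `cell_prior`, `threshold`, and `cell` are hypothetical.

```python
import math

def motion_mask(flow, threshold=0.5):
    """Mark pixels whose flow magnitude exceeds a threshold.

    flow: H x W nested list of (dx, dy) displacement tuples (assumed given,
    e.g. from any off-the-shelf optical flow estimator).
    Returns an H x W nested list of 0/1 values.
    """
    return [
        [1 if math.hypot(dx, dy) > threshold else 0 for dx, dy in row]
        for row in flow
    ]

def cell_prior(mask, cell):
    """Pool a binary motion mask into a coarse grid.

    Each output value is the fraction of moving pixels inside one
    cell x cell block, which can serve as a soft "an object is likely
    here" signal for a grid-based detector (an assumption of this sketch).
    """
    height, width = len(mask), len(mask[0])
    prior = []
    for top in range(0, height, cell):
        row = []
        for left in range(0, width, cell):
            block = [
                mask[y][x]
                for y in range(top, min(top + cell, height))
                for x in range(left, min(left + cell, width))
            ]
            row.append(sum(block) / len(block))
        prior.append(row)
    return prior

# Toy 4 x 4 flow field: only the top-left 2 x 2 block is moving.
flow = [
    [(1.0, 0.0) if y < 2 and x < 2 else (0.0, 0.0) for x in range(4)]
    for y in range(4)
]
prior = cell_prior(motion_mask(flow), cell=2)
```

In this toy case only the top-left grid cell receives a nonzero prior, so a detector guided by it would be nudged toward the moving region.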

Findings

01. Significant improvements in object discovery and convergence speed.

02. Enhanced latent object representations for downstream tasks.

03. Better performance in Atari game playing scenarios.

Abstract

Recent unsupervised multi-object detection models have shown impressive performance improvements, largely attributed to novel architectural inductive biases. Unfortunately, they may produce suboptimal object encodings for downstream tasks. To overcome this, we propose to exploit object motion and continuity, i.e., objects do not pop in and out of existence. This is accomplished through two mechanisms: (i) providing priors on the location of objects through integration of optical flow, and (ii) a contrastive object continuity loss across consecutive image frames. Rather than developing an explicit deep architecture, the resulting Motion and Object Continuity (MOC) scheme can be instantiated using any baseline object detection model. Our results show large improvements in the performances of a SOTA model in terms of object discovery, convergence speed and overall latent object…
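Mechanism (ii) can be made concrete with a small sketch. The code below uses an InfoNCE-style loss as a stand-in for a contrastive object continuity objective: the encoding of an object in frame t should be closer to the encoding of the same object in frame t+1 than to the encodings of other objects. The exact loss used in the paper is not given in this excerpt, so the formulation, the index-based matching of objects across frames, and the names `cosine`, `continuity_loss`, and `temperature` are all assumptions made for illustration.

```python
import math

def cosine(a, b):
    """Cosine similarity between two nonzero vectors given as lists."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def continuity_loss(z_t, z_next, temperature=0.1):
    """InfoNCE-style loss over object encodings of two consecutive frames.

    z_t[i] and z_next[i] are assumed to encode the same object (the
    matching step is outside this sketch). All other entries of z_next
    act as negatives for object i.
    """
    total = 0.0
    for i, anchor in enumerate(z_t):
        logits = [cosine(anchor, other) / temperature for other in z_next]
        # Numerically stable log-sum-exp over all candidates.
        peak = max(logits)
        log_denominator = peak + math.log(
            sum(math.exp(value - peak) for value in logits)
        )
        # Negative log-probability of picking the matching object.
        total += log_denominator - logits[i]
    return total / len(z_t)

# Two objects whose encodings drift slightly between frames.
z_t = [[1.0, 0.0], [0.0, 1.0]]
z_consistent = [[0.9, 0.1], [0.1, 0.9]]
z_swapped = [[0.1, 0.9], [0.9, 0.1]]

loss_consistent = continuity_loss(z_t, z_consistent)
loss_swapped = continuity_loss(z_t, z_swapped)
```

The loss is low when each object keeps a similar encoding from one frame to the next and high when encodings are swapped between objects, which is the behaviour a continuity objective is meant to reward.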

Code & Models

Repositories

k4ntz/moc (PyTorch, official)

Taxonomy

Topics: Advanced Neural Network Applications · Domain Adaptation and Few-Shot Learning · Multimodal Machine Learning Applications

Methods: SPEED: Separable Pyramidal Pooling EncodEr-Decoder for Real-Time Monocular Depth Estimation on Low-Resource Settings