Tracking Emerges by Looking Around Static Scenes, with Neural 3D Mapping

Adam W. Harley; Shrinidhi K. Lakshmikanth; Paul Schydlo; Katerina Fragkiadaki

arXiv:2008.01295 · cs.CV · August 5, 2020

TL;DR

This paper introduces a neural 3D mapper that turns RGB-D input into a 3D voxel grid of deep features. The features are trained with a contrastive loss on multiview data of static points, and the correspondence across time needed for 3D object tracking emerges without tracking supervision.

Contribution

The authors propose a novel neural 3D mapping method that learns to track objects without supervision by using multiview static scene data and contrastive learning.

Findings

1. Outperforms prior unsupervised 2D and 2.5D trackers
2. Approaches the accuracy of supervised trackers
3. Demonstrates robustness to occlusions and camera motion

Abstract

We hypothesize that an agent that can look around in static scenes can learn rich visual representations applicable to 3D object tracking in complex dynamic scenes. We are motivated in this pursuit by the fact that the physical world itself is mostly static, and multiview correspondence labels are relatively cheap to collect in static scenes, e.g., by triangulation. We propose to leverage multiview data of static points in arbitrary scenes (static or dynamic), to learn a neural 3D mapping module which produces features that are correspondable across time. The neural 3D mapper consumes RGB-D data as input, and produces a 3D voxel grid of deep features as output. We train the voxel features to be correspondable across viewpoints, using a contrastive loss, and correspondability across time emerges automatically. At test time, given an RGB-D video with approximate camera poses, and…
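
The abstract's idea of training voxel features to be correspondable across viewpoints with a contrastive loss can be illustrated with a generic InfoNCE-style objective. This is a minimal sketch, not the authors' implementation: the excerpt does not give the loss formulation, so the function name `contrastive_loss`, the `temperature` value, the cosine-similarity choice, and the random stand-in features are all assumptions made for illustration. Row i of each array is assumed to hold the feature of the same static 3D point as seen from two different viewpoints.

```python
import numpy as np

def contrastive_loss(feats_a, feats_b, temperature=0.07):
    """Generic InfoNCE-style loss (an assumed stand-in for the paper's loss).

    feats_a, feats_b: (N, C) arrays; row i in both is assumed to describe the
    same static 3D point, observed from two viewpoints.
    """
    # L2-normalize so the dot product is cosine similarity.
    a = feats_a / np.linalg.norm(feats_a, axis=1, keepdims=True)
    b = feats_b / np.linalg.norm(feats_b, axis=1, keepdims=True)
    logits = a @ b.T / temperature                       # (N, N) similarities
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    # Positives sit on the diagonal; every other column acts as a negative.
    return float(-np.mean(np.diag(log_probs)))

# Hypothetical stand-in data: 8 voxel features with 16 channels each.
rng = np.random.default_rng(0)
view_a = rng.normal(size=(8, 16))
aligned = view_a + 0.01 * rng.normal(size=(8, 16))  # correct correspondences
shuffled = aligned[::-1]                            # deliberately mismatched

loss_aligned = contrastive_loss(view_a, aligned)
loss_shuffled = contrastive_loss(view_a, shuffled)
print(loss_aligned, loss_shuffled)
```

The loss is low when corresponding rows agree and high when the pairing is scrambled, which is the pressure that pushes features of the same point toward each other across views.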
