Track, Check, Repeat: An EM Approach to Unsupervised Tracking
Adam W. Harley, Yiming Zuo, Jing Wen, Ayush Mangal, Shubhankar Potdar, Ritwick Chaudhry, Katerina Fragkiadaki

TL;DR
This paper introduces an unsupervised method for detecting and tracking moving objects in 3D. It uses an EM algorithm that iteratively retrains an ensemble of detectors on pseudo-labels, and it outperforms prior unsupervised methods on challenging RGB-D videos.
Contribution
It presents an EM-based framework that combines motion cues with appearance-based detectors for unsupervised 3D tracking in RGB-D videos, refining the pseudo-labels iteratively.
Findings
Outperforms existing unsupervised methods on the CATER and KITTI datasets.
Uses ensemble agreement to reduce pseudo-label contamination.
Employs data augmentation to improve detector generalization.
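The ensemble-agreement idea can be illustrated with a small Python sketch. It assumes each ensemble member outputs axis-aligned 2D boxes `(x1, y1, x2, y2)` for a frame, and it accepts a candidate as a pseudo-label only when at least `min_votes` members fire an overlapping box (IoU at or above `iou_thresh`). The function names, the voting rule, and the 0.5 threshold are illustrative assumptions rather than the authors' exact criterion, which the abstract describes as agreement among all modules, including 3D detectors and the motion-based segmenter.

```python
from typing import List, Sequence, Tuple

# Hypothetical box format (an assumption): axis-aligned (x1, y1, x2, y2) in pixels.
Box = Tuple[float, float, float, float]


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def agreed_pseudo_labels(
    detections: Sequence[Sequence[Box]], min_votes: int, iou_thresh: float = 0.5
) -> List[Box]:
    """Keep candidate boxes that at least `min_votes` ensemble members agree on.

    `detections[i]` holds the boxes fired by ensemble member i on one frame.
    The voting rule here is an illustrative assumption, not the paper's exact one.
    """
    kept: List[Box] = []
    for member_boxes in detections:
        for cand in member_boxes:
            # A member votes for the candidate if any of its boxes overlaps it enough.
            votes = sum(
                any(iou(cand, other) >= iou_thresh for other in boxes)
                for boxes in detections
            )
            if votes < min_votes:
                continue  # too little agreement: treat as likely contamination
            # Drop near-duplicates of a pseudo-label that was already accepted.
            if any(iou(cand, k) >= iou_thresh for k in kept):
                continue
            kept.append(cand)
    return kept


ensemble = [
    [(0, 0, 10, 10), (50, 50, 60, 60)],  # member A: one stray box at (50, 50)
    [(1, 1, 11, 11)],                    # member B
    [(0, 0, 10, 10)],                    # member C
]
print(agreed_pseudo_labels(ensemble, min_votes=2))  # [(0, 0, 10, 10)]
```

Raising `min_votes` trades recall for precision: fewer objects become pseudo-labels, but fewer false positives leak into the next round of training.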
Abstract
We propose an unsupervised method for detecting and tracking moving objects in 3D, in unlabelled RGB-D videos. The method begins with classic handcrafted techniques for segmenting objects using motion cues: we estimate optical flow and camera motion, and conservatively segment regions that appear to be moving independently of the background. Treating these initial segments as pseudo-labels, we learn an ensemble of appearance-based 2D and 3D detectors, under heavy data augmentation. We use this ensemble to detect new instances of the "moving" type, even if they are not moving, and add these as new pseudo-labels. Our method is an expectation-maximization algorithm, where in the expectation step we fire all modules and look for agreement among them, and in the maximization step we re-train the modules to improve this agreement. The constraint of ensemble agreement helps combat…
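The motion-cue step (segmenting regions that move independently of the background, given optical flow and camera motion) can be sketched in Python as follows. Under a pinhole camera model, each pixel is back-projected using its depth, moved by the estimated camera pose, and reprojected; the observed flow differs from this camera-induced flow mainly on independently moving objects. The sketch assumes known intrinsics `K`, a relative pose `(R, t)` that maps first-frame camera coordinates to second-frame ones, flow stored as a `(2, H, W)` array of `(du, dv)`, and a hypothetical pixel threshold `thresh`. A higher threshold gives a more conservative segmentation. This is an illustration of the general technique, not the authors' implementation.

```python
import numpy as np


def rigid_flow(depth: np.ndarray, K: np.ndarray, R: np.ndarray, t: np.ndarray) -> np.ndarray:
    """Flow each pixel would have if the scene were static and only the camera moved.

    Returns an array of shape (2, H, W) holding (du, dv) per pixel.
    """
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))  # u: column index, v: row index
    pix = np.stack([u, v, np.ones_like(u)]).reshape(3, -1).astype(np.float64)
    # Back-project every pixel to a 3D point in the first camera's frame.
    pts = (np.linalg.inv(K) @ pix) * depth.reshape(1, -1)
    # Apply the (assumed known) relative pose to express points in the second camera's frame.
    pts2 = R @ pts + np.asarray(t, dtype=np.float64).reshape(3, 1)
    # Reproject with the pinhole model (divide by depth in the new frame).
    proj = K @ pts2
    proj = proj[:2] / proj[2:3]
    return (proj - pix[:2]).reshape(2, h, w)


def moving_mask(flow, depth, K, R, t, thresh=2.0):
    """Boolean (H, W) mask of pixels whose observed flow camera motion cannot explain.

    `thresh` (in pixels) is a hypothetical value; raise it to segment more conservatively.
    """
    residual = flow - rigid_flow(depth, K, R, t)
    return np.linalg.norm(residual, axis=0) > thresh


# Toy check: constant depth, pure sideways camera shift, one independently moving 4x4 patch.
K = np.array([[100.0, 0.0, 16.0], [0.0, 100.0, 16.0], [0.0, 0.0, 1.0]])
depth = np.full((32, 32), 5.0)
R, t = np.eye(3), np.array([0.1, 0.0, 0.0])
flow = rigid_flow(depth, K, R, t)  # every pixel shifts by f * tx / Z = 2 px
flow[0, 10:14, 10:14] += 5.0       # an object moving on its own
mask = moving_mask(flow, depth, K, R, t)
print(mask.sum())  # 16
```

In the paper's pipeline, such conservative motion segments only seed the pseudo-labels; the appearance-based detectors then extend them to objects that are not currently moving.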
