Learning to Segment Moving Objects
Pavel Tokmakov, Cordelia Schmid, Karteek Alahari

TL;DR
This paper introduces a neural network framework with memory for segmenting moving objects in videos by leveraging motion, appearance, and temporal cues, achieving state-of-the-art results on multiple benchmarks.
Contribution
The paper presents a two-stream neural network with an explicit memory module for video object segmentation, integrating motion, appearance, and temporal-consistency cues.
Findings
The framework segments objects in unconstrained videos that exhibit independent motion in at least one frame.
It outperforms existing methods on the DAVIS, Freiburg-Berkeley motion segmentation (FBMS), and SegTrack datasets.
Ablation studies highlight the importance of each component in the framework.
Abstract
We study the problem of segmenting moving objects in unconstrained videos. Given a video, the task is to segment all the objects that exhibit independent motion in at least one frame. We formulate this as a learning problem and design our framework with three cues: (i) independent object motion between a pair of frames, which complements object recognition, (ii) object appearance, which helps to correct errors in motion estimation, and (iii) temporal consistency, which imposes additional constraints on the segmentation. The framework is a two-stream neural network with an explicit memory module. The two streams encode appearance and motion cues in a video sequence respectively, while the memory module captures the evolution of objects over time, exploiting the temporal consistency. The motion stream is a convolutional neural network trained on synthetic videos to segment independently…
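To make the interplay of the three cues concrete, here is a minimal toy sketch in pure Python. It is not the authors' network: the per-pixel weighted sum in `fuse_streams`, the gated recurrence in `gated_memory_step`, and all names and parameters are assumptions chosen for illustration. The abstract above does not specify how the streams are combined or how the memory module is implemented, and the real model operates on learned convolutional features rather than hand-set scores.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def fuse_streams(appearance, motion, w_app=1.0, w_mot=1.0):
    """Combine per-pixel scores from the two streams.

    A weighted sum is an assumption made for this sketch, not the paper's fusion.
    """
    return [
        [w_app * a + w_mot * m for a, m in zip(row_a, row_m)]
        for row_a, row_m in zip(appearance, motion)
    ]


def gated_memory_step(state, features):
    """One recurrent update of a per-pixel memory (hypothetical gated rule).

    The gate z grows with the strength of the new evidence, so frames with
    weak evidence (e.g. an object that has stopped moving) mostly keep the
    previous state.
    """
    new_state = []
    for row_h, row_x in zip(state, features):
        new_row = []
        for h, x in zip(row_h, row_x):
            z = sigmoid(abs(x))
            new_row.append((1.0 - z) * h + z * math.tanh(x))
        new_state.append(new_row)
    return new_state


def segment_video(appearance_seq, motion_seq, threshold=0.0):
    """Return one binary mask per frame from the evolving memory state."""
    height = len(appearance_seq[0])
    width = len(appearance_seq[0][0])
    state = [[0.0] * width for _ in range(height)]
    masks = []
    for app, mot in zip(appearance_seq, motion_seq):
        state = gated_memory_step(state, fuse_streams(app, mot))
        masks.append([[1 if v > threshold else 0 for v in row] for row in state])
    return masks


# Toy 1x2 video: the left pixel moves only in the first frame,
# the right pixel is background throughout.
appearance_seq = [[[1.0, -1.0]], [[0.0, -1.0]], [[0.0, -1.0]]]
motion_seq = [[[2.0, -1.0]], [[0.0, -1.0]], [[0.0, -1.0]]]
masks = segment_video(appearance_seq, motion_seq)
```

In this toy input the left pixel stays labeled as foreground after its motion evidence disappears, because the memory state decays gradually instead of resetting. That mirrors, in a very simplified way, the task definition of segmenting objects that move in at least one frame.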
