TL;DR
This paper introduces EpO-Net, a motion saliency method that leverages geometric constraints and a novel fusion mechanism to improve segmentation accuracy on challenging videos.
Contribution
It proposes trajectory-based epipolar distances and input-dropout to enhance motion saliency detection by explicitly modeling geometric cues and robustly combining motion and appearance features.
Findings
Outperforms previous state-of-the-art on DAVIS-2016 by 5.2% IoU
Effective fusion of motion and appearance features improves segmentation
Trajectory epipolar distances are data-independent and easy to compute
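To illustrate why such distances are easy to compute, here is a minimal sketch of a point-to-epipolar-line distance, assuming a fundamental matrix F between two frames has already been estimated from a few correspondences. The function names, the nested-list representation of F, and the choice to average over consecutive frame pairs are assumptions for illustration, not the paper's exact formulation.

```python
import math

def epipolar_distance(F, p1, p2):
    """Distance from p2 (in frame t+1) to the epipolar line of p1 (in frame t).

    F is a 3x3 fundamental matrix given as nested lists (hypothetical input).
    """
    x1 = (p1[0], p1[1], 1.0)  # homogeneous coordinates of p1
    # Epipolar line l = F @ x1, written as (a, b, c) with a*x + b*y + c = 0.
    a, b, c = (sum(F[r][k] * x1[k] for k in range(3)) for r in range(3))
    norm = math.hypot(a, b)
    if norm == 0.0:
        return 0.0  # degenerate line; treated as zero distance in this sketch
    return abs(a * p2[0] + b * p2[1] + c) / norm

def trajectory_epipolar_distance(Fs, trajectory):
    """Mean epipolar distance over consecutive frame pairs of one trajectory.

    Averaging is an assumed aggregation, chosen only to keep the sketch short.
    Fs[t] relates frame t to frame t+1; trajectory[t] is the (x, y) position in frame t.
    """
    dists = [epipolar_distance(F, trajectory[t], trajectory[t + 1])
             for t, F in enumerate(Fs)]
    return sum(dists) / len(dists)

# Toy camera translating sideways: static points must stay on the same image row.
F_side = [[0, 0, 0], [0, 0, -1], [0, 1, 0]]
static_track = [(1, 2), (3, 2), (5, 2)]  # moves only along its epipolar line
moving_track = [(1, 2), (3, 5), (5, 8)]  # drifts off the epipolar line each frame
print(trajectory_epipolar_distance([F_side, F_side], static_track))
print(trajectory_epipolar_distance([F_side, F_side], moving_track))
```

In this toy setup a background point that obeys the camera geometry gets a distance of zero, while an independently moving point gets a positive distance, which is the cue used to separate foreground from background.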
Abstract
Existing approaches to salient motion segmentation cannot explicitly learn geometric cues and often produce false detections on prominent static objects. We exploit multiview geometric constraints to avoid these shortcomings. To handle nonrigid backgrounds, such as the sea, we also propose a robust fusion mechanism between motion and appearance-based features. We find dense trajectories, covering every pixel in the video, and propose trajectory-based epipolar distances to distinguish between background and foreground regions. Trajectory epipolar distances are data-independent and can be readily computed given a few feature correspondences between the images. We show that by combining epipolar distances with optical flow, a powerful motion network can be learned. To enable the network to leverage both of these features, we propose a simple mechanism that we call input-dropout.…
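The abstract is truncated before it describes input-dropout, so the following is only one plausible reading: during training, one whole input modality (optical flow or epipolar distance) is occasionally zeroed so the network cannot rely on a single cue. The function name, the drop probability, the even split between modalities, and the flat-list feature representation are all assumptions for illustration.

```python
import random

def input_dropout(flow, epi, p_drop=0.5, rng=random):
    """Concatenate two input modalities, sometimes zeroing one of them entirely.

    flow, epi: flat lists of floats standing in for feature maps (hypothetical).
    p_drop:    probability that one modality is dropped for this sample (assumed value).
    rng:       any object with a random() method, e.g. random.Random(seed).
    """
    flow, epi = list(flow), list(epi)
    if rng.random() < p_drop:
        # Pick which modality to drop with equal probability (an assumption).
        if rng.random() < 0.5:
            flow = [0.0] * len(flow)
        else:
            epi = [0.0] * len(epi)
    return flow + epi

# Training-time use: each sample sees both inputs, or only one of them.
rng = random.Random(0)
sample = input_dropout([0.4, -0.1], [2.5], p_drop=0.5, rng=rng)
print(sample)
```

At inference time such a scheme would typically be called with p_drop=0.0 so that both inputs are always passed through unchanged.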
