RoMo: Robust Motion Segmentation Improves Structure from Motion

Lily Goli; Sara Sabour; Mark Matthews; Marcus Brubaker; Dmitry Lagun; Alec Jacobson; David J. Fleet; Saurabh Saxena; Andrea Tagliasacchi

arXiv:2411.18650 · cs.CV · December 2, 2024

TL;DR

RoMo is an iterative method that combines optical flow and epipolar cues with a pre-trained video segmentation model to segment moving content, which in turn improves structure-from-motion (SfM) and camera calibration in dynamic scenes.

Contribution

Introduces RoMo, a simple iterative approach that improves motion segmentation and camera calibration in dynamic scenes by integrating optical flow and epipolar cues with a pre-trained video segmentation model.

Findings

01

Outperforms unsupervised baselines for motion segmentation, as well as supervised baselines trained on synthetic data.

02

Sets a new state of the art in camera calibration for scenes with dynamic content.

03

Enhances SfM pipelines with robust motion segmentation.

Abstract

There has been extensive progress in the reconstruction and generation of 4D scenes from monocular casually-captured video. While these tasks rely heavily on known camera poses, the problem of finding such poses using structure-from-motion (SfM) often depends on robustly separating static from dynamic parts of a video. The lack of a robust solution to this problem limits the performance of SfM camera-calibration pipelines. We propose a novel approach to video-based motion segmentation to identify the components of a scene that are moving w.r.t. a fixed world frame. Our simple but effective iterative method, RoMo, combines optical flow and epipolar cues with a pre-trained video segmentation model. It outperforms unsupervised baselines for motion segmentation as well as supervised baselines trained from synthetic data. More importantly, the combination of an off-the-shelf SfM pipeline…
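
To make the "epipolar cue" mentioned in the abstract concrete, here is a minimal sketch, not the authors' implementation. It assumes a fundamental matrix `F` between two frames is already known and that optical flow gives a correspondence for each pixel. Pixels whose flow violates the epipolar constraint by more than a threshold are flagged as likely dynamic. The function names, the Sampson distance as the error measure, and the `threshold` value are illustrative assumptions. RoMo's iterative refinement and its use of a pre-trained video segmentation model are not reproduced here.

```python
def mat_vec(m, v):
    """Multiply a 3x3 matrix (list of rows) by a 3-vector."""
    return [sum(m[i][j] * v[j] for j in range(3)) for i in range(3)]


def transpose(m):
    """Transpose a 3x3 matrix given as a list of rows."""
    return [[m[j][i] for j in range(3)] for i in range(3)]


def sampson_distance(F, p, q):
    """First-order geometric error of the correspondence p -> q under F.

    p and q are (x, y) pixel coordinates in the first and second frame.
    The constraint for a static point is q_h^T F p_h = 0.
    """
    x = [p[0], p[1], 1.0]
    xp = [q[0], q[1], 1.0]
    Fx = mat_vec(F, x)
    Ftxp = mat_vec(transpose(F), xp)
    numerator = sum(xp[i] * Fx[i] for i in range(3)) ** 2
    denominator = Fx[0] ** 2 + Fx[1] ** 2 + Ftxp[0] ** 2 + Ftxp[1] ** 2
    return numerator / denominator if denominator > 0 else 0.0


def flag_dynamic(F, points, flows, threshold=1.0):
    """Return one boolean per point: True if the flow is inconsistent with F.

    `threshold` is a hypothetical value chosen for illustration only.
    """
    flags = []
    for (x, y), (u, v) in zip(points, flows):
        d = sampson_distance(F, (x, y), (x + u, y + v))
        flags.append(d > threshold)
    return flags


if __name__ == "__main__":
    # Assumed toy setup: camera translating along x with identity intrinsics,
    # so static points must move horizontally (epipolar lines are rows).
    F = [[0.0, 0.0, 0.0],
         [0.0, 0.0, -1.0],
         [0.0, 1.0, 0.0]]
    points = [(10.0, 20.0), (10.0, 20.0)]
    flows = [(3.0, 0.0), (3.0, 4.0)]  # second flow has a vertical component
    print(flag_dynamic(F, points, flows))
```

A per-pixel test like this cannot detect objects that happen to move along their epipolar lines, which is one reason to combine it with other cues such as a segmentation model, as the abstract describes.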

Taxonomy

Topics: Advanced Vision and Imaging · Hand Gesture Recognition Systems · Human Pose and Action Recognition