Decomposition Betters Tracking Everything Everywhere
Rui Li, Dong Liu

TL;DR
DecoMotion introduces a novel test-time optimization approach that decomposes videos into static and dynamic components for improved pixel-level motion estimation, robustness, and appearance decomposition.
Contribution
The paper proposes DecoMotion, a method that explicitly decomposes video content into static and dynamic volumes for better long-range and occlusion-robust motion tracking.
Findings
Significantly improves point-tracking accuracy on the TAP-Vid benchmark.
Performs comparably to state-of-the-art point-tracking methods.
Effectively handles occlusions and non-rigid deformations.
Abstract
Recent studies on motion estimation have advocated an optimized motion representation that is globally consistent across the entire video, preferably for every pixel. This is challenging as a uniform representation may not account for the complex and diverse motion and appearance of natural videos. We address this problem and propose a new test-time optimization method, named DecoMotion, for estimating per-pixel and long-range motion. DecoMotion explicitly decomposes video content into static scenes and dynamic objects, each represented by a quasi-3D canonical volume. DecoMotion separately coordinates the transformations between local and canonical spaces, facilitating an affine transformation for the static scene that corresponds to camera motion. For the dynamic volume, DecoMotion leverages discriminative and temporally consistent features to rectify the non-rigid…
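To make the static-scene idea in the abstract concrete, here is a minimal sketch of tracking through a shared canonical volume with per-frame affine maps. It is an illustration only, not the authors' implementation: the function names (to_canonical, from_canonical, track_static), the (A, t) parameterization, and the numeric values are all assumptions, and in the actual method the transformations would be optimized at test time rather than hand-set.

```python
import numpy as np

def to_canonical(points, A, t):
    """Map local quasi-3D points of shape (N, 3) to canonical space: u = A x + t."""
    return points @ A.T + t

def from_canonical(canon, A, t):
    """Invert the affine map: x = A^{-1} (u - t)."""
    return np.linalg.solve(A, (canon - t).T).T

def track_static(points_i, affine_i, affine_j):
    """Carry points from frame i to frame j via the shared canonical volume."""
    canon = to_canonical(points_i, *affine_i)
    return from_canonical(canon, *affine_j)

# Hypothetical per-frame affine parameters (A, t); assumed values for illustration.
affine_0 = (np.eye(3), np.zeros(3))
affine_1 = (np.diag([1.1, 1.1, 1.0]), np.array([0.05, -0.02, 0.0]))

# Two query points in frame 0 as (x, y, depth).
pts0 = np.array([[0.2, 0.3, 1.0], [0.5, 0.5, 2.0]])
pts1 = track_static(pts0, affine_0, affine_1)   # positions in frame 1
back = track_static(pts1, affine_1, affine_0)   # round trip to frame 0
```

Because every frame maps into the same canonical space, a round trip from frame 0 to frame 1 and back recovers the original points, which is the global consistency property the abstract refers to. The dynamic volume would need a non-rigid mapping in place of the affine one and is not covered by this sketch.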
Taxonomy
Topics: Human Pose and Action Recognition · Video Surveillance and Tracking Methods · Advanced Vision and Imaging
