FlowFeat: Pixel-Dense Embedding of Motion Profiles
Nikita Araslanov, Anna Sonnweber, Daniel Cremers

TL;DR
FlowFeat is a high-resolution, multi-task dense image representation that embeds motion profiles. Trained with a self-supervised, computationally efficient approach, it improves performance on dense prediction tasks such as video object segmentation, depth estimation, and semantic segmentation.
Contribution
We propose FlowFeat, a dense feature embedding trained with a novel self-supervised distillation technique. The technique embeds motion profiles (distributions of plausible apparent motion) into the features, which improves dense prediction tasks.
Findings
Significantly improves performance on video object segmentation, depth estimation, and semantic segmentation.
Remains effective even with unsupervised flow networks, and comes at low computational cost.
Provides high spatial detail and temporal consistency in dense image representations.
Abstract
Dense and versatile image representations underpin the success of virtually all computer vision applications. However, state-of-the-art networks, such as transformers, produce low-resolution feature grids, which are suboptimal for dense prediction tasks. To address this limitation, we present FlowFeat, a high-resolution and multi-task feature representation. The key ingredient behind FlowFeat is a novel distillation technique that embeds a distribution of plausible apparent motions, or motion profiles. By leveraging optical flow networks and diverse video data, we develop an effective self-supervised training framework that statistically approximates the apparent motion. With its remarkable level of spatial detail, FlowFeat encodes a compelling degree of geometric and semantic cues while exhibiting high temporal consistency. Empirically, FlowFeat significantly enhances the…
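To make the low-resolution limitation mentioned in the abstract concrete, the following is a minimal sketch of how coarse a patch-based transformer's feature grid is relative to the input image. The patch size of 16 and the 224×224 input are illustrative assumptions, not values taken from the paper, and the helper names `feature_grid_shape` and `pixels_per_feature` are hypothetical.

```python
def feature_grid_shape(height: int, width: int, patch_size: int) -> tuple[int, int]:
    """Spatial size of the token grid produced by a patch-based transformer.

    Assumes non-overlapping square patches and that any remainder at the
    image border is dropped (an illustrative simplification).
    """
    if patch_size <= 0:
        raise ValueError("patch_size must be positive")
    return height // patch_size, width // patch_size


def pixels_per_feature(height: int, width: int, patch_size: int) -> float:
    """Average number of input pixels summarized by one feature vector."""
    grid_h, grid_w = feature_grid_shape(height, width, patch_size)
    if grid_h == 0 or grid_w == 0:
        raise ValueError("image is smaller than one patch")
    return (height * width) / (grid_h * grid_w)


if __name__ == "__main__":
    # Hypothetical setting: 224x224 image, 16x16 patches.
    print(feature_grid_shape(224, 224, 16))
    print(pixels_per_feature(224, 224, 16))
```

Under these assumed settings each feature vector covers a 16×16 block of pixels, which is the gap a pixel-dense representation such as FlowFeat aims to close.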
Taxonomy
Topics: Advanced Vision and Imaging · Advanced Neural Network Applications · Human Pose and Action Recognition
