Self-Supervised Representation Learning from Flow Equivariance
Yuwen Xiong, Mengye Ren, Wenyuan Zeng, Raquel Urtasun

TL;DR
This paper introduces a flow equivariance-based self-supervised learning framework that leverages complex video scenes with multiple moving objects to learn representations that outperform existing methods on various vision tasks.
Contribution
It proposes a novel flow equivariance objective for self-supervised learning directly from raw videos with multiple moving objects, improving downstream task performance.
Findings
Outperforms SimCLR and BYOL on semantic segmentation.
Effective on static image downstream tasks.
Learns from complex, high-resolution video scenes.
Abstract
Self-supervised representation learning is able to learn semantically meaningful features; however, much of its recent success relies on multiple crops of an image with very few objects. Instead of learning view-invariant representation from simple images, humans learn representations in a complex world with changing scenes by observing object movement, deformation, pose variation, and ego motion. Motivated by this ability, we present a new self-supervised learning representation framework that can be directly deployed on a video stream of complex scenes with many moving objects. Our framework features a simple flow equivariance objective that encourages the network to predict the features of another frame by applying a flow transformation to the features of the current frame. Our representations, learned from high-resolution raw video, can be readily used for downstream tasks on static…
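The flow equivariance objective described in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation. The function names `warp_features` and `flow_equivariance_loss`, the backward-flow convention, the bilinear sampling, and the mean squared error are all assumptions chosen for illustration; the paper's actual loss and the way it obtains flow may differ.

```python
import numpy as np


def warp_features(feat, flow):
    """Backward-warp a feature map with a dense flow field (bilinear sampling).

    feat: (H, W, C) features of the current frame.
    flow: (H, W, 2) array; flow[y, x] = (dx, dy) is the offset from pixel
          (x, y) in the other frame to its source location in the current
          frame. This convention is an assumption of the sketch.
    """
    H, W, _ = feat.shape
    ys, xs = np.meshgrid(np.arange(H), np.arange(W), indexing="ij")

    # Source coordinates, clamped to the image so border pixels stay valid.
    src_x = np.clip(xs + flow[..., 0], 0, W - 1)
    src_y = np.clip(ys + flow[..., 1], 0, H - 1)

    # Integer corners surrounding each source location.
    x0 = np.floor(src_x).astype(int)
    y0 = np.floor(src_y).astype(int)
    x1 = np.minimum(x0 + 1, W - 1)
    y1 = np.minimum(y0 + 1, H - 1)

    # Fractional parts act as interpolation weights.
    wx = (src_x - x0)[..., None]
    wy = (src_y - y0)[..., None]

    top = (1 - wx) * feat[y0, x0] + wx * feat[y0, x1]
    bottom = (1 - wx) * feat[y1, x0] + wx * feat[y1, x1]
    return (1 - wy) * top + wy * bottom


def flow_equivariance_loss(feat_t, feat_other, flow):
    """Mean squared error between flow-warped features and the other frame's features.

    The squared-error form is a hypothetical choice for this sketch.
    """
    warped = warp_features(feat_t, flow)
    return float(np.mean((warped - feat_other) ** 2))
```

In this sketch, a zero flow field leaves the features unchanged, so the loss between a frame and itself is zero. A uniform flow of one pixel reproduces the shifted feature map everywhere except at the clamped border column.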
Taxonomy
Topics: Multimodal Machine Learning Applications · Domain Adaptation and Few-Shot Learning · Human Pose and Action Recognition
Methods: Bootstrap Your Own Latent · Average Pooling · Batch Normalization · 1x1 Convolution · Max Pooling · Residual Connection · Residual Block · Global Average Pooling · Convolution · Bottleneck Residual Block
