TL;DR
This paper introduces 'steady' feature analysis, a method that extends slow feature analysis by enforcing higher order temporal coherence in video representations, leading to improved recognition performance.
Contribution
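It proposes a novel regularizer for convolutional neural networks that keeps higher-order temporal derivatives in feature space small. Features therefore change smoothly over time, which captures *how* the content of unlabeled video changes rather than only that it changes slowly.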
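To make "small higher-order derivatives" concrete, one way to write the first-order (slow) and second-order (steady) penalties is as finite differences over sequential frames. This assumes squared Euclidean distance and a feature map $f_\theta$. That notation is introduced here and is not taken from the paper. The paper's exact loss, including any terms for non-sequential frames, may differ.

```latex
% z_t = f_\theta(x_t): learned feature of frame x_t (notation assumed here)
\mathcal{L}_{\text{slow}}   = \lVert z_t - z_{t-1} \rVert_2^2
\\
\mathcal{L}_{\text{steady}} = \lVert (z_t - z_{t-1}) - (z_{t-1} - z_{t-2}) \rVert_2^2
                            = \lVert z_t - 2 z_{t-1} + z_{t-2} \rVert_2^2
```

The slow term penalizes any change between adjacent frames. The steady term penalizes only a change in the *rate* of change, so features that move at a constant velocity incur no cost.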
Findings
Outperforms standard slow feature analysis in recognition tasks
Features learned from unlabeled video can surpass supervised pretraining
Effective across five diverse datasets, including unlabeled YouTube and KITTI videos
Abstract
How can unlabeled video augment visual learning? Existing methods perform "slow" feature analysis, encouraging the representations of temporally close frames to exhibit only small differences. While this standard approach captures the fact that high-level visual signals change slowly over time, it fails to capture *how* the visual content changes. We propose to generalize slow feature analysis to "steady" feature analysis. The key idea is to impose a prior that higher order derivatives in the learned feature space must be small. To this end, we train a convolutional neural network with a regularizer on tuples of sequential frames from unlabeled video. It encourages feature changes over time to be smooth, i.e., similar to the most recent changes. Using five diverse datasets, including unlabeled YouTube and KITTI videos, we demonstrate our method's impact on object, scene, and action…
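The contrast between "slow" and "steady" can be illustrated on toy feature vectors. Below is a minimal NumPy sketch, assuming squared L2 distance on three sequential frame features. The function names and the numbers are hypothetical and are not the authors' implementation. It shows that a feature moving at constant velocity is penalized by the slow term but not by the steady term.

```python
import numpy as np


def slow_penalty(z_prev: np.ndarray, z_curr: np.ndarray) -> float:
    """First-order ("slow") term: squared L2 distance between the
    features of two adjacent frames (hypothetical helper)."""
    return float(np.sum((z_curr - z_prev) ** 2))


def steady_penalty(z0: np.ndarray, z1: np.ndarray, z2: np.ndarray) -> float:
    """Second-order ("steady") term for three sequential frames: squared L2
    distance between the latest feature change and the one before it,
    i.e. ||(z2 - z1) - (z1 - z0)||^2, a discrete second derivative."""
    previous_change = z1 - z0
    latest_change = z2 - z1
    return float(np.sum((latest_change - previous_change) ** 2))


# Toy 2-D "features" for three consecutive frames (made-up numbers).
z0 = np.array([0.0, 0.0])
z1 = np.array([1.0, 0.5])
z2_constant_velocity = np.array([2.0, 1.0])  # same change as the previous step
z2_accelerating = np.array([3.0, 1.0])       # change differs from the previous step

print(slow_penalty(z0, z1))                          # 1.25
print(steady_penalty(z0, z1, z2_constant_velocity))  # 0.0
print(steady_penalty(z0, z1, z2_accelerating))       # 1.0
```

Penalizing only sequential frames this way has a trivial minimizer: a constant feature gives zero for both terms. A practical objective therefore needs an additional term that keeps unrelated frames apart, such as a contrastive margin on non-sequential pairs. That term is omitted from this sketch.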
Videos
Slow and Steady Feature Analysis: Higher Order Temporal Coherence in Video (YouTube)
