Frequency Selective Augmentation for Video Representation Learning
Jinhyung Kim, Taeoh Kim, Minho Shim, Dongyoon Han, Dongyoon Wee, and Junmo Kim

TL;DR
FreqAug is a novel frequency domain augmentation technique that enhances self-supervised video learning by encouraging models to focus on dynamic features, leading to improved performance across multiple downstream tasks.
Contribution
The paper introduces FreqAug, a frequency-based augmentation method that reduces static information bias in video representations, applicable across various self-supervised learning frameworks.
Findings
Consistent performance improvements on five action recognition tasks.
Enhanced focus on dynamic features in learned representations.
Effective across multiple self-supervised frameworks.
Abstract
Recent self-supervised video representation learning methods focus on maximizing the similarity between multiple augmented views from the same video and largely rely on the quality of generated views. However, most existing methods lack a mechanism to prevent representation learning from bias towards static information in the video. In this paper, we propose frequency augmentation (FreqAug), a spatio-temporal data augmentation method in the frequency domain for video representation learning. FreqAug stochastically removes specific frequency components from the video so that learned representation captures essential features more from the remaining information for various downstream tasks. Specifically, FreqAug pushes the model to focus more on dynamic features rather than static features in the video via dropping spatial or temporal low-frequency components. To verify the generality of…
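The mechanism described in the abstract, stochastically dropping spatial or temporal low-frequency components, can be illustrated with a small NumPy sketch. This is not the authors' implementation: the function names (`freq_aug_sketch`, `high_pass_mask`), the `(T, H, W, C)` layout, the probability `p`, the cutoff values, the hard (ideal) filter shape, and the 50/50 choice between spatial and temporal filtering are all assumptions made for illustration.

```python
import numpy as np


def high_pass_mask(n, cutoff):
    """Boolean mask over n FFT bins keeping |frequency| > cutoff (cycles/sample)."""
    return np.abs(np.fft.fftfreq(n)) > cutoff


def freq_aug_sketch(video, rng, p=0.5, spatial_cutoff=0.01,
                    temporal_cutoff=0.1, mode=None):
    """Hypothetical FreqAug-style augmentation (illustrative only).

    video: float array of shape (T, H, W, C).
    With probability p, zero out low-frequency components along either
    the temporal axis or the two spatial axes, then return to pixel space.
    All default values here are assumed, not taken from the paper.
    """
    if rng.random() >= p:
        return video  # augmentation skipped for this sample

    t, h, w = video.shape[:3]
    if mode is None:
        # Assumed: pick spatial or temporal filtering with equal probability.
        mode = "temporal" if rng.random() < 0.5 else "spatial"

    # 3D FFT over time, height, and width; channels are left untouched.
    spectrum = np.fft.fftn(video, axes=(0, 1, 2))

    if mode == "temporal":
        # Drop slow temporal variation, including fully static content.
        mask = high_pass_mask(t, temporal_cutoff)[:, None, None, None]
    else:
        # Drop low spatial frequencies using a radial cutoff.
        fy = np.fft.fftfreq(h)[:, None]
        fx = np.fft.fftfreq(w)[None, :]
        mask = (np.sqrt(fy ** 2 + fx ** 2) > spatial_cutoff)[None, :, :, None]

    # The mask is symmetric in frequency, so the result is real up to
    # floating-point error; keep only the real part.
    return np.fft.ifftn(spectrum * mask, axes=(0, 1, 2)).real


rng = np.random.default_rng(0)
frame = rng.random((8, 8, 3))
static_video = np.repeat(frame[None], 8, axis=0)  # identical frames
filtered = freq_aug_sketch(static_video, rng, p=1.0, mode="temporal")
print(np.abs(filtered).max())  # close to zero: static content is removed
```

In this sketch, a clip made of identical frames is almost entirely removed by the temporal high-pass filter, which is the intuition behind pushing a model away from static cues and toward motion.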
Taxonomy
Topics: Human Pose and Action Recognition · Domain Adaptation and Few-Shot Learning · Video Surveillance and Tracking Methods
