Exploring Temporally Dynamic Data Augmentation for Video Recognition
Taeoh Kim, Jinhyung Kim, Minho Shim, Sangdoo Yun, Myunggu Kang, Dongyoon Wee, Sangyoun Lee

TL;DR
This paper introduces DynaAugment, a novel data augmentation framework for video recognition that dynamically varies augmentation magnitudes over time using Fourier Sampling, leading to improved performance across diverse video tasks.
Contribution
The paper proposes DynaAugment, a simple yet effective method that models temporal variations in video augmentation, extending existing static methods with a Fourier Sampling mechanism and an expanded search space.
Findings
DynaAugment improves accuracy on multiple video datasets and tasks.
Temporal variation in augmentation enhances model robustness.
The method outperforms static augmentation approaches.
Abstract
Data augmentation has recently emerged as an essential component of modern training recipes for visual recognition tasks. However, data augmentation for video recognition has been rarely explored despite its effectiveness. Few existing augmentation recipes for video recognition naively extend the image augmentation methods by applying the same operations to the whole video frames. Our main idea is that the magnitude of augmentation operations for each frame needs to be changed over time to capture the real-world video's temporal variations. These variations should be generated as diverse as possible using fewer additional hyper-parameters during training. Through this motivation, we propose a simple yet effective video data augmentation framework, DynaAugment. The magnitude of augmentation operations on each frame is changed by an effective mechanism, Fourier Sampling that parameterizes…
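The abstract above is cut off before it finishes describing Fourier Sampling, so the following is only a rough sketch of the general idea: a per-frame magnitude curve built from a random weighted sum of sinusoids. The function name `fourier_magnitudes`, its parameters, the frequency range, and the amplitude scaling are all assumptions made for illustration; the paper's actual formulation may differ.

```python
import math
import random


def fourier_magnitudes(num_frames, base_magnitude, num_basis=3,
                       amplitude=0.5, rng=None):
    """Return one augmentation magnitude per frame (hypothetical helper).

    The curve is a random weighted sum of sinusoids, scaled so every value
    lies between base_magnitude * (1 - amplitude) and
    base_magnitude * (1 + amplitude). All ranges here are assumptions.
    """
    if rng is None:
        rng = random.Random()

    # Random weights that sum to 1, so the mixed signal stays in [-1, 1].
    raw = [rng.random() + 1e-8 for _ in range(num_basis)]
    total = sum(raw)
    weights = [r / total for r in raw]

    # Each basis gets a random frequency (cycles per clip) and phase offset.
    freqs = [rng.uniform(0.2, 1.5) for _ in range(num_basis)]
    phases = [rng.uniform(0.0, 2.0 * math.pi) for _ in range(num_basis)]

    magnitudes = []
    for t in range(num_frames):
        # Normalised position of frame t within the clip, in [0, 1].
        pos = t / max(num_frames - 1, 1)
        signal = sum(
            w * math.sin(2.0 * math.pi * f * pos + p)
            for w, f, p in zip(weights, freqs, phases)
        )
        magnitudes.append(base_magnitude * (1.0 + amplitude * signal))
    return magnitudes


if __name__ == "__main__":
    # A static augmentation would use the same magnitude for all 8 frames;
    # here each frame gets its own smoothly varying value.
    for frame, mag in enumerate(fourier_magnitudes(8, base_magnitude=10.0)):
        print(f"frame {frame}: magnitude {mag:.2f}")
```

Setting `amplitude=0.0` reduces the curve to a constant, which corresponds to the static case the abstract describes, where the same operation is applied to every frame. Each returned value would then be passed to a per-frame operation (for example brightness or rotation) in place of a single clip-wide magnitude.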
Taxonomy
Topics: Human Pose and Action Recognition · Advanced Neural Network Applications · Domain Adaptation and Few-Shot Learning
