Learning Representational Invariances for Data-Efficient Action Recognition

Yuliang Zou, Jinwoo Choi, Qitong Wang, Jia-Bin Huang

arXiv:2103.16565 · cs.CV · November 21, 2022

TL;DR

This paper explores diverse data augmentation strategies for videos to improve action recognition, demonstrating their effectiveness in low-label and fully supervised settings across multiple datasets.

Contribution

It introduces video data augmentation techniques that capture photometric, geometric, temporal, and actor/scene invariances, and integrates them with existing semi-supervised learning frameworks to improve accuracy when labeled data is scarce.
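As a rough illustration of the photometric and geometric augmentations in this contribution, the sketch below samples one random crop, one horizontal-flip decision, and one brightness shift per clip and applies them to every frame, so the transform stays consistent over time. The function name `augment_clip`, the `(T, H, W, C)` layout, the crop size, and the brightness range are hypothetical choices for illustration, not details taken from the paper or its repository.

```python
import numpy as np


def augment_clip(clip, rng, crop_size=112, max_brightness=0.2):
    """Hypothetical clip-level augmentation (illustrative, not the paper's code).

    clip: float array of shape (T, H, W, C) with values in [0, 1].
    One crop offset, one flip decision, and one brightness offset are sampled
    per clip and reused for every frame, so frames stay mutually consistent.
    """
    _, h, w, _ = clip.shape
    if h < crop_size or w < crop_size:
        raise ValueError("clip is smaller than the crop size")
    # Geometric: a single random crop location shared by all frames.
    top = int(rng.integers(0, h - crop_size + 1))
    left = int(rng.integers(0, w - crop_size + 1))
    out = clip[:, top:top + crop_size, left:left + crop_size, :]
    # Geometric: flip every frame horizontally with probability 0.5.
    if rng.random() < 0.5:
        out = out[:, :, ::-1, :]
    # Photometric: one brightness offset shared by all frames.
    delta = rng.uniform(-max_brightness, max_brightness)
    return np.clip(out + delta, 0.0, 1.0)


rng = np.random.default_rng(0)
video = rng.random((8, 128, 160, 3))  # 8 random RGB frames of size 128x160
aug = augment_clip(video, rng)
print(aug.shape)  # (8, 112, 112, 3)
```

Sampling the parameters once per clip, rather than once per frame, is the design choice that keeps the augmented video free of artificial frame-to-frame jitter.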

Findings

1. Improved accuracy on the Kinetics-100/400, Mini-Something-v2, UCF-101, and HMDB-51 datasets.
2. Effective augmentation strategies for photometric, geometric, temporal, and actor/scene invariances.
3. Enhanced performance in both low-label and fully supervised regimes.
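The temporal and actor/scene invariances named in the findings can be sketched in the same hypothetical style. `resample_speed` changes the apparent playback speed by sampling frames at a random stride, and `swap_scene` keeps the actor's pixels from one clip while taking the background from another. Both functions are illustrative assumptions rather than the paper's implementation; in particular, `swap_scene` presumes that a per-frame binary actor mask is already available (for example, from a person detector) and that the mixed clip keeps the action label of the actor clip.

```python
import numpy as np


def resample_speed(clip, num_frames, rng, max_stride=3):
    """Hypothetical temporal augmentation: sample frames at a random stride.

    A larger stride makes the action appear faster; clip is (T, H, W, C).
    """
    t = clip.shape[0]
    # Keep only strides for which num_frames frames still fit in the clip.
    valid = [s for s in range(1, max_stride + 1) if (num_frames - 1) * s < t]
    if not valid:
        raise ValueError("clip is too short for the requested number of frames")
    stride = int(rng.choice(valid))
    start = int(rng.integers(0, t - (num_frames - 1) * stride))
    return clip[start + stride * np.arange(num_frames)]


def swap_scene(actor_clip, actor_mask, scene_clip):
    """Hypothetical actor/scene augmentation: keep the actor, replace the background.

    actor_mask: (T, H, W) array, 1 on actor pixels and 0 elsewhere (assumed given,
    e.g. from an off-the-shelf person detector). The result is assumed to keep
    the action label of actor_clip.
    """
    m = actor_mask[..., None].astype(actor_clip.dtype)  # broadcast over channels
    return m * actor_clip + (1.0 - m) * scene_clip


rng = np.random.default_rng(0)
clip = rng.random((32, 64, 64, 3))
fast = resample_speed(clip, 8, rng)
print(fast.shape)  # (8, 64, 64, 3)
```

The intent of the scene swap is to discourage the model from predicting the action from background context alone, since the same label now appears over an unrelated scene.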

Abstract

Data augmentation is a ubiquitous technique for improving image classification when labeled data is scarce. Constraining the model predictions to be invariant to diverse data augmentations effectively injects the desired representational invariances to the model (e.g., invariance to photometric variations) and helps improve accuracy. Compared to image data, the appearance variations in videos are far more complex due to the additional temporal dimension. Yet, data augmentation methods for videos remain under-explored. This paper investigates various data augmentation strategies that capture different video invariances, including photometric, geometric, temporal, and actor/scene augmentations. When integrated with existing semi-supervised learning frameworks, we show that our data augmentation strategy leads to promising performance on the Kinetics-100/400, Mini-Something-v2, UCF-101,…
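Constraining predictions to be invariant to augmentations is commonly implemented as a consistency loss on unlabeled data. The sketch below uses a FixMatch-style formulation, chosen here as an assumption because the abstract does not name the semi-supervised framework: the prediction on a weakly augmented clip becomes a pseudo-label when its confidence exceeds a threshold, and the model is trained to predict that label on a strongly augmented view of the same clip. The names `consistency_loss` and `log_softmax` and the 0.95 threshold are illustrative.

```python
import numpy as np


def log_softmax(logits):
    # Numerically stable log-softmax over the class axis.
    z = logits - logits.max(axis=-1, keepdims=True)
    return z - np.log(np.exp(z).sum(axis=-1, keepdims=True))


def consistency_loss(weak_logits, strong_logits, threshold=0.95):
    """FixMatch-style consistency loss (an assumed framework, for illustration).

    weak_logits, strong_logits: (N, K) logits for a weakly and a strongly
    augmented view of the same N unlabeled clips.
    """
    weak_probs = np.exp(log_softmax(weak_logits))
    pseudo_labels = weak_probs.argmax(axis=-1)
    # Only confident predictions on the weak view become training targets.
    mask = weak_probs.max(axis=-1) >= threshold
    # Cross-entropy of the strong view against the pseudo-labels.
    ce = -log_softmax(strong_logits)[np.arange(len(pseudo_labels)), pseudo_labels]
    # Averaged over the whole batch, so unconfident clips contribute zero.
    return float((ce * mask).mean())
```

In this formulation, the invariance is injected by penalizing disagreement between the two views only when the model is already confident on the weak one, which limits the damage from wrong pseudo-labels early in training.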

Code & Models

Repositories

vt-vl-lab/video-data-aug (PyTorch, official)

Taxonomy

Topics: Human Pose and Action Recognition · Video Surveillance and Tracking Methods · Anomaly Detection Techniques and Applications