Selective Volume Mixup for Video Action Recognition

Yi Tan; Zhaofan Qiu; Yanbin Hao; Ting Yao; Tao Mei

arXiv:2309.09534 · cs.CV · October 23, 2024 · 2 citations

TL;DR

This paper introduces Selective Volume Mixup (SV-Mix), a novel video augmentation technique that adaptively combines informative video volumes to enhance model generalization on small datasets.

Contribution

The paper proposes a learnable selective augmentation strategy with spatial and temporal modules, optimized jointly with recognition models to improve video classification performance.

Findings

01. SV-Mix improves accuracy on multiple benchmarks.

02. It benefits both CNN and transformer models.

03. It enhances generalization on small datasets.

Abstract

The recent advances in Convolutional Neural Networks (CNNs) and Vision Transformers have convincingly demonstrated high learning capability for video action recognition on large datasets. Nevertheless, deep models often suffer from the overfitting effect on small-scale datasets with a limited number of training videos. A common solution is to exploit the existing image augmentation strategies for each frame individually including Mixup, Cutmix, and RandAugment, which are not particularly optimized for video data. In this paper, we propose a novel video augmentation strategy named Selective Volume Mixup (SV-Mix) to improve the generalization ability of deep models with limited training videos. SV-Mix devises a learnable selective module to choose the most informative volumes from two videos and mixes the volumes up to achieve a new training video. Technically, we propose two new modules,…
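
The idea of selecting volumes from two videos and mixing them can be illustrated with a small sketch. This is not the authors' implementation: SV-Mix uses a learnable selective module trained jointly with the recognition model, whereas the sketch below swaps in a fixed, hand-written score (mean frame-to-frame difference per spatial patch) purely for illustration. The function names `patch_scores` and `selective_spatial_mix`, the `patch` parameter, and the mask-proportional label mixing are all assumptions, not details taken from the paper.

```python
import numpy as np

def patch_scores(video, patch):
    """Hypothetical stand-in for a learned selector (NOT the SV-Mix module).

    Scores each spatial patch by its mean absolute frame-to-frame
    difference. `video` has shape (T, H, W, C) with T >= 2.
    """
    t, h, w, c = video.shape
    assert t >= 2 and h % patch == 0 and w % patch == 0
    # Average motion magnitude over time and channels -> (H, W)
    diff = np.abs(np.diff(video, axis=0)).mean(axis=(0, 3))
    gh, gw = h // patch, w // patch
    # Average within each patch -> (gh, gw)
    return diff.reshape(gh, patch, gw, patch).mean(axis=(1, 3))

def selective_spatial_mix(video_a, video_b, label_a, label_b, patch=4):
    """Mix two videos patch by patch, keeping the higher-scoring patch.

    Labels are mixed in proportion to the area taken from each video,
    as in CutMix-style augmentation (an assumption for this sketch).
    """
    assert video_a.shape == video_b.shape
    take_a = patch_scores(video_a, patch) >= patch_scores(video_b, patch)
    # Expand the patch-level decision to a pixel-level mask -> (H, W)
    mask = np.repeat(np.repeat(take_a, patch, axis=0), patch, axis=1)
    # Broadcast over time and channels -> (1, H, W, 1)
    mask = mask[None, :, :, None].astype(video_a.dtype)
    mixed = mask * video_a + (1.0 - mask) * video_b
    lam = float(take_a.mean())  # fraction of patches taken from video_a
    mixed_label = lam * label_a + (1.0 - lam) * label_b
    return mixed, mixed_label, lam
```

Here the same spatial mask is applied to every frame; a temporal variant would instead choose whole frames or clips along the time axis. In the paper's setting the selection is learned rather than computed from a fixed heuristic.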

Code & Models

Repositories

ty-97/seletive-volume-mix
PyTorch · Official

Taxonomy

Topics: Human Pose and Action Recognition · Advanced Neural Network Applications · Stroke Rehabilitation and Recovery

Methods: Mixup · RandAugment