Large-scale Robustness Analysis of Video Action Recognition Models
Madeline Chantry Schiappa, Naman Biyani, Prudvi Kamtam, Shruti Vyas, Hamid Palangi, Vibhav Vineet, Yogesh Rawat

TL;DR
This paper conducts a large-scale robustness analysis of video action recognition models, comparing CNN and transformer-based approaches against real-world distribution shifts using new benchmark datasets.
Contribution
It introduces four new benchmark datasets for robustness testing and provides a comprehensive analysis of model robustness, highlighting the superior robustness of transformer models and the impact of pretraining.
Findings
Transformer models are more robust than CNN models.
Pretraining improves robustness more for transformer models than for CNN models.
Models are generally robust to temporal perturbations except on SSv2.
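To make "temporal perturbation" concrete, here is a minimal sketch of a few frame-order perturbations, assuming a clip is simply a sequence of decoded frames. The function names (`reverse_frames`, `shuffle_frames`, `freeze_frames`) are hypothetical and are not taken from the paper's benchmark code; the exact temporal perturbations used in SSv2-P and the other datasets may differ.

```python
import random
from typing import List, Sequence, TypeVar

T = TypeVar("T")  # a frame: an array, a tensor, or any placeholder object


def reverse_frames(frames: Sequence[T]) -> List[T]:
    """Hypothetical perturbation: play the clip backwards."""
    return list(reversed(frames))


def shuffle_frames(frames: Sequence[T], seed: int = 0) -> List[T]:
    """Hypothetical perturbation: randomly permute frame order.

    Uses a private Random instance so the result is reproducible for a given seed.
    """
    out = list(frames)
    random.Random(seed).shuffle(out)
    return out


def freeze_frames(frames: Sequence[T], start: int, length: int) -> List[T]:
    """Hypothetical perturbation: hold frame `start` for `length` frames.

    Frames start .. start+length-1 are replaced by a copy of frames[start];
    the span is clipped at the end of the clip.
    """
    if start < 0 or length < 0:
        raise ValueError("start and length must be non-negative")
    out = list(frames)
    for i in range(start, min(start + length, len(out))):
        out[i] = out[start]
    return out


# Frame indices stand in for real frames here.
clip = list(range(8))
print(reverse_frames(clip))  # [7, 6, 5, 4, 3, 2, 1, 0]
```

Perturbations like these leave every individual frame intact and change only ordering or timing, so a model that relies mostly on per-frame appearance can stay accurate under them. SSv2, whose classes are defined by how objects move (e.g. pushing something from left to right), is where such changes would plausibly hurt most.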
Abstract
We have seen great progress in video action recognition in recent years. There are several models based on convolutional neural networks (CNNs) and some recent transformer-based approaches which provide top performance on existing benchmarks. In this work, we perform a large-scale robustness analysis of these existing models for video action recognition. We focus on robustness against real-world distribution shift perturbations instead of adversarial perturbations. We propose four different benchmark datasets, HMDB51-P, UCF101-P, Kinetics400-P, and SSv2-P, to perform this analysis. We study the robustness of six state-of-the-art action recognition models against 90 different perturbations. The study reveals some interesting findings: 1) transformer-based models are consistently more robust than CNN-based models, 2) pretraining improves robustness for transformer-based models more than…
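The abstract does not spell out how robustness is scored. As an illustration only, one common way to summarize it is to compare a model's accuracy on clean clips with its accuracy on perturbed clips; the sketch below assumes a relative score of 1 - (clean - perturbed) / clean, averaged over perturbations. The formula, the function names, and the example perturbation names and accuracies are assumptions, not the paper's definitions or results.

```python
from typing import Mapping


def relative_robustness(clean_acc: float, perturbed_acc: float) -> float:
    """Assumed score: 1.0 means no accuracy drop; lower means a larger relative drop."""
    if clean_acc <= 0:
        raise ValueError("clean accuracy must be positive")
    return 1.0 - (clean_acc - perturbed_acc) / clean_acc


def mean_robustness(clean_acc: float, perturbed_accs: Mapping[str, float]) -> float:
    """Average the assumed relative score over a set of named perturbations."""
    if not perturbed_accs:
        raise ValueError("need at least one perturbation")
    scores = [relative_robustness(clean_acc, acc) for acc in perturbed_accs.values()]
    return sum(scores) / len(scores)


# Hypothetical accuracies (percent) for one model; not results from the paper.
clean = 80.0
perturbed = {"gaussian_noise": 60.0, "frame_reversal": 80.0}
print(mean_robustness(clean, perturbed))  # 0.875
```

Normalizing by clean accuracy lets models with different baseline accuracy be compared on the size of their drop rather than on raw accuracy under perturbation.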
Taxonomy
Topics: Anomaly Detection Techniques and Applications · Adversarial Robustness in Machine Learning · Human Pose and Action Recognition
Methods: Attention Is All You Need · Linear Layer · Softmax · Multi-Head Attention · Residual Connection · Dense Connections · Byte Pair Encoding · Absolute Position Encodings · Position-Wise Feed-Forward Layer · Label Smoothing
