Large-scale Robustness Analysis of Video Action Recognition Models

Madeline Chantry Schiappa; Naman Biyani; Prudvi Kamtam; Shruti Vyas; Hamid Palangi; Vibhav Vineet; Yogesh Rawat

arXiv:2207.01398·cs.CV·April 10, 2023·1 cites

TL;DR

This paper conducts a large-scale robustness analysis of video action recognition models, comparing CNN and transformer-based approaches against real-world distribution shifts using new benchmark datasets.

Contribution

It introduces four new benchmark datasets for robustness testing and provides comprehensive analysis of model robustness, highlighting the superior robustness of transformer models and the impact of pretraining.

Findings

1. Transformer models are more robust than CNN models.
2. Pretraining enhances robustness more for transformer models.
3. Models are generally robust to temporal perturbations except on SSv2.

Abstract

We have seen great progress in video action recognition in recent years. Several models based on convolutional neural networks (CNNs), along with some recent transformer-based approaches, provide top performance on existing benchmarks. In this work, we perform a large-scale robustness analysis of these existing models for video action recognition. We focus on robustness against real-world distribution shift perturbations instead of adversarial perturbations. We propose four benchmark datasets, HMDB51-P, UCF101-P, Kinetics400-P, and SSv2-P, to perform this analysis. We study the robustness of six state-of-the-art action recognition models against 90 different perturbations. The study reveals some interesting findings: 1) transformer-based models are consistently more robust than CNN-based models, 2) pretraining improves robustness for transformer-based models more than…

Code & Models

Repositories

Maddy12/ActionRecognitionRobustnessEval
PyTorch · Official

Taxonomy

Topics: Anomaly Detection Techniques and Applications · Adversarial Robustness in Machine Learning · Human Pose and Action Recognition

Methods: Attention Is All You Need · Linear Layer · Softmax · Multi-Head Attention · Residual Connection · Dense Connections · Byte Pair Encoding · Absolute Position Encodings · Position-Wise Feed-Forward Layer · Label Smoothing