A Survey on Backbones for Deep Video Action Recognition

Zixuan Tang; Youjun Zhao; Yuhang Wen; Mengyuan Liu

arXiv:2405.05584 · cs.CV · May 10, 2024

TL;DR

This survey reviews diverse deep learning backbones for video action recognition, including two-stream, 3D CNN, and transformer-based methods, highlighting their architectures, challenges, and future directions.

Contribution

It provides a comprehensive overview of current deep neural network backbones for action recognition, comparing their approaches and identifying research gaps.

Findings

1. Two-stream networks utilize RGB and optical flow modalities.
2. 3D CNNs directly extract motion features from RGB videos.
3. Transformer-based models introduce NLP techniques into video understanding.
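To make the contrast between the first two findings concrete, here is a minimal sketch, not taken from the survey, of a single-channel 3D convolution in plain Python. It assumes a clip stored as nested lists indexed [t][y][x]; the helper name `conv3d_valid` and the hand-picked temporal-difference kernel are illustrative assumptions. In an actual 3D CNN the kernels (larger, such as 3×3×3) and the channels are learned, but even this fixed kernel shows how sliding a window along the time axis can expose motion without a separately computed optical-flow input.

```python
from typing import List

# A single-channel clip indexed as [t][y][x] (assumed layout for this sketch).
Video = List[List[List[float]]]


def conv3d_valid(video: Video, kernel: Video) -> Video:
    """Naive single-channel 3D convolution (cross-correlation, no padding, stride 1)."""
    T, H, W = len(video), len(video[0]), len(video[0][0])
    kt, kh, kw = len(kernel), len(kernel[0]), len(kernel[0][0])
    out: Video = []
    for t in range(T - kt + 1):
        plane = []
        for y in range(H - kh + 1):
            row = []
            for x in range(W - kw + 1):
                # Sum over a spatio-temporal window: time is treated
                # like an extra spatial axis.
                s = 0.0
                for dt in range(kt):
                    for dy in range(kh):
                        for dx in range(kw):
                            s += kernel[dt][dy][dx] * video[t + dt][y + dy][x + dx]
                row.append(s)
            plane.append(row)
        out.append(plane)
    return out


# A 3-frame, 1x3 clip in which a bright pixel moves one step right per frame.
clip = [
    [[1.0, 0.0, 0.0]],
    [[0.0, 1.0, 0.0]],
    [[0.0, 0.0, 1.0]],
]
# Hypothetical hand-picked kernel of shape (2, 1, 1): next frame minus current frame.
diff_kernel = [[[-1.0]], [[1.0]]]

motion = conv3d_valid(clip, diff_kernel)
print(motion)  # [[[-1.0, 1.0, 0.0]], [[0.0, -1.0, 1.0]]]
```

Each output frame is the difference between consecutive input frames, so the moving pixel appears as a -1/+1 pair that shifts right over time.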

Abstract

Action recognition is a key technology for building interactive metaverses. With the rapid development of deep learning, action recognition methods have advanced considerably. Researchers have designed backbones from multiple standpoints, which has led to a diversity of methods and raised new challenges. This paper reviews several action recognition methods based on deep neural networks. We introduce these methods in three parts: 1) two-stream networks and their variants, which, in this paper, specifically take RGB video frames and optical flow as input; 2) 3D convolutional networks, which exploit the RGB modality directly so that extracting separate motion information is no longer necessary; 3) Transformer-based methods, which introduce the model from natural language processing into computer vision and video…
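Two-stream networks process appearance (RGB frames) and motion (optical flow) in separate streams and must combine them somewhere. One simple option is late fusion: averaging the class probabilities produced by the two streams. The sketch below assumes both stream networks have already produced per-class logits; the logits, the three-class setup, and the `flow_weight` parameter are hypothetical, and weighted averaging is only one of several fusion strategies used by two-stream variants.

```python
import math
from typing import List


def softmax(logits: List[float]) -> List[float]:
    """Convert raw class scores into probabilities."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def late_fusion(rgb_logits: List[float], flow_logits: List[float],
                flow_weight: float = 0.5) -> List[float]:
    """Weighted average of the RGB-stream and flow-stream class probabilities."""
    p_rgb = softmax(rgb_logits)
    p_flow = softmax(flow_logits)
    return [(1.0 - flow_weight) * a + flow_weight * b for a, b in zip(p_rgb, p_flow)]


# Hypothetical per-class logits for 3 action classes from each stream.
rgb_logits = [2.0, 1.0, 0.1]   # appearance stream favors class 0
flow_logits = [0.5, 3.0, 0.2]  # motion stream strongly favors class 1

fused = late_fusion(rgb_logits, flow_logits)
prediction = max(range(len(fused)), key=fused.__getitem__)
print(prediction)  # 1
```

Here the appearance stream alone would predict class 0, but the more confident motion stream pulls the fused prediction to class 1.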

Taxonomy

Topics: Human Pose and Action Recognition · Anomaly Detection Techniques and Applications · Video Surveillance and Tracking Methods