AdaFuse: Adaptive Temporal Fusion Network for Efficient Action Recognition

Yue Meng; Rameswar Panda; Chung-Ching Lin; Prasanna Sattigeri; Leonid Karlinsky; Kate Saenko; Aude Oliva; Rogerio Feris

arXiv:2102.05775 · cs.CV · February 12, 2021 · 21 cites

TL;DR

AdaFuse is an adaptive temporal fusion network that improves video action recognition by dynamically combining current and past features, reducing computation by about 40% while maintaining accuracy.

Contribution

The paper introduces AdaFuse, a novel adaptive fusion method that efficiently models temporal information and reduces computational costs in action recognition.

Findings

01 Achieves approximately 40% computation savings.

02 Maintains comparable accuracy to state-of-the-art methods.

03 Validated on multiple datasets including Something V1 & V2, Jester, and Mini-Kinetics.

Abstract

Temporal modelling is the key for efficient video action recognition. While understanding temporal information can improve recognition accuracy for dynamic actions, removing temporal redundancy and reusing past features can significantly save computation leading to efficient action recognition. In this paper, we introduce an adaptive temporal fusion network, called AdaFuse, that dynamically fuses channels from current and past feature maps for strong temporal modelling. Specifically, the necessary information from the historical convolution feature maps is fused with current pruned feature maps with the goal of improving both recognition accuracy and efficiency. In addition, we use a skipping operation to further reduce the computation cost of action recognition. Extensive experiments on Something V1 & V2, Jester and Mini-Kinetics show that our approach can achieve about 40% computation…
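The channel-wise fusion described in the abstract can be illustrated with a minimal sketch. It assumes that each channel of a layer receives one of three decisions: compute it from the current frame, reuse the channel stored from the previous frame, or skip it entirely. The names `KEEP`, `REUSE`, `SKIP`, `fuse_channels`, and `computed_fraction` are hypothetical, and the decisions are passed in as a plain list because the text above does not say how they are produced. This is not the authors' implementation.

```python
from typing import List, Sequence

# Hypothetical per-channel decisions (names are assumptions, not from the paper).
KEEP, REUSE, SKIP = 0, 1, 2

Channel = Sequence[float]


def fuse_channels(
    current: Sequence[Channel],
    past: Sequence[Channel],
    policy: Sequence[int],
) -> List[List[float]]:
    """Build a fused feature map, one channel at a time.

    KEEP  -> take the channel computed for the current frame
    REUSE -> copy the channel stored from the previous frame
    SKIP  -> emit zeros, so nothing is computed or copied
    """
    if not (len(current) == len(past) == len(policy)):
        raise ValueError("current, past and policy must have one entry per channel")

    fused: List[List[float]] = []
    for cur, prev, decision in zip(current, past, policy):
        if decision == KEEP:
            fused.append(list(cur))
        elif decision == REUSE:
            fused.append(list(prev))
        elif decision == SKIP:
            fused.append([0.0] * len(cur))
        else:
            raise ValueError(f"unknown decision: {decision}")
    return fused


def computed_fraction(policy: Sequence[int]) -> float:
    """Fraction of channels that still need a convolution (the KEEP ones)."""
    return sum(1 for d in policy if d == KEEP) / len(policy)


# Toy example: three channels, two values each.
current_features = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
past_features = [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]]
toy_policy = [KEEP, REUSE, SKIP]

fused_features = fuse_channels(current_features, past_features, toy_policy)
```

In this toy case only one of three channels is computed from the current frame; the other two are copied or zeroed. In an actual network the saving would depend on which channels the model decides to keep at each layer and frame, so the toy ratio says nothing about the roughly 40% figure reported above.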

Videos

AdaFuse: Adaptive Temporal Fusion Network for Efficient Action Recognition · SlidesLive

Taxonomy

Topics: Human Pose and Action Recognition · Anomaly Detection Techniques and Applications · Multimodal Machine Learning Applications

Methods: Depthwise Convolution · Pointwise Convolution · Batch Normalization · Depthwise Separable Convolution · Residual Connection · Average Pooling · Sigmoid Activation · Ghost Module · Squeeze-and-Excitation Block