AdaFuse: Adaptive Temporal Fusion Network for Efficient Action Recognition
Yue Meng, Rameswar Panda, Chung-Ching Lin, Prasanna Sattigeri, Leonid Karlinsky, Kate Saenko, Aude Oliva, Rogerio Feris

TL;DR
AdaFuse is an adaptive temporal fusion network that improves video action recognition by dynamically combining current and past features, reducing computation by about 40% while maintaining accuracy.
Contribution
The paper introduces AdaFuse, a novel adaptive fusion method that efficiently models temporal information and reduces computational costs in action recognition.
Findings
Achieves approximately 40% computation savings.
Maintains accuracy comparable to state-of-the-art methods.
Validated on multiple datasets including Something V1 & V2, Jester, and Mini-Kinetics.
Abstract
Temporal modelling is the key for efficient video action recognition. While understanding temporal information can improve recognition accuracy for dynamic actions, removing temporal redundancy and reusing past features can significantly save computation leading to efficient action recognition. In this paper, we introduce an adaptive temporal fusion network, called AdaFuse, that dynamically fuses channels from current and past feature maps for strong temporal modelling. Specifically, the necessary information from the historical convolution feature maps is fused with current pruned feature maps with the goal of improving both recognition accuracy and efficiency. In addition, we use a skipping operation to further reduce the computation cost of action recognition. Extensive experiments on Something V1 & V2, Jester and Mini-Kinetics show that our approach can achieve about 40% computation…
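The channel-wise fusion described in the abstract can be illustrated with a minimal sketch. It assumes each channel at the current time step receives one of three decisions: compute and keep the current channel, reuse the channel from the past feature map, or skip it (zero output). The names `fuse_channels`, `computed_fraction`, and the `KEEP`/`REUSE`/`SKIP` labels are hypothetical, and the decisions are passed in by hand here rather than produced by a learned policy, so this is an illustration of the idea and not the authors' implementation.

```python
from typing import List, Sequence

# Hypothetical decision labels for illustration only.
KEEP, REUSE, SKIP = "keep", "reuse", "skip"


def fuse_channels(
    current: Sequence[Sequence[float]],
    history: Sequence[Sequence[float]],
    decisions: Sequence[str],
) -> List[List[float]]:
    """Fuse per-channel features from the current and past time steps.

    Each entry of `current` and `history` is one channel, flattened to a
    list of floats. `decisions` holds one label per channel.
    """
    if not (len(current) == len(history) == len(decisions)):
        raise ValueError("current, history and decisions must have equal length")

    fused: List[List[float]] = []
    for cur, past, decision in zip(current, history, decisions):
        if decision == KEEP:
            # Channel is computed at this time step and used as-is.
            fused.append(list(cur))
        elif decision == REUSE:
            # Channel is copied from the past feature map (no new computation).
            fused.append(list(past))
        elif decision == SKIP:
            # Channel is dropped entirely: zero output, no computation.
            fused.append([0.0] * len(cur))
        else:
            raise ValueError(f"unknown decision: {decision!r}")
    return fused


def computed_fraction(decisions: Sequence[str]) -> float:
    """Fraction of channels that actually need computing (the KEEP ones)."""
    return sum(d == KEEP for d in decisions) / len(decisions)


if __name__ == "__main__":
    current = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
    history = [[10.0, 20.0], [30.0, 40.0], [50.0, 60.0]]
    decisions = [KEEP, REUSE, SKIP]
    print(fuse_channels(current, history, decisions))
    print(computed_fraction(decisions))
```

In this toy setting only the channels marked `KEEP` would need to be computed, which is where savings would come from under the stated assumption; the roughly 40% figure reported in the abstract is an experimental result and is not derived from this sketch.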
Taxonomy
Topics: Human Pose and Action Recognition · Anomaly Detection Techniques and Applications · Multimodal Machine Learning Applications
Methods: Depthwise Convolution · Pointwise Convolution · Batch Normalization · Depthwise Separable Convolution · Residual Connection · Average Pooling · Sigmoid Activation · Ghost Module · Squeeze-and-Excitation Block
