Selective Feature Compression for Efficient Activity Recognition Inference
Chunhui Liu, Xinyu Li, Hao Chen, Davide Modolo, Joseph Tighe

TL;DR
This paper introduces Selective Feature Compression (SFC), a method that significantly improves the inference efficiency of action recognition models on trimmed videos by dropping non-informative features without sacrificing accuracy.
Contribution
The paper proposes a novel SFC strategy that compresses spatio-temporal features during inference, reducing computation and memory usage while maintaining or improving accuracy.
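As a rough illustration of dropping non-informative features before the expensive part of a network, the sketch below keeps only the highest-scoring time steps of a feature sequence. The function name `compress_temporal`, the L2-norm importance score, and the toy data are assumptions made for this example. This is not the paper's SFC module, which learns which regions matter and compresses along spatial dimensions as well.

```python
import numpy as np

def compress_temporal(features, keep):
    """Keep the `keep` highest-scoring time steps of a (T, C) feature array.

    Hypothetical helper for illustration only; not the authors' SFC module.
    """
    # Assumed importance score: L2 norm of each time step's feature vector.
    scores = np.linalg.norm(features, axis=1)
    # Indices of the top-`keep` steps, sorted back into temporal order.
    top = np.sort(np.argsort(scores)[-keep:])
    return features[top], top

# Toy input: 32 time steps of weak features, with a strong signal at steps 10-13.
rng = np.random.default_rng(0)
feats = rng.normal(size=(32, 8)) * 0.1
feats[10:14] += 3.0

compressed, kept = compress_temporal(feats, keep=4)
print(compressed.shape, kept)
```

Any layers that run after this step see 4 time steps instead of 32, which is where the savings in compute and memory would come from in a scheme of this kind.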
Findings
SFC cuts inference time by a factor of 6-7 and memory usage by a factor of 5-6.
SFC slightly improves Top1 accuracy on multiple datasets.
SFC effectively learns to focus on important video regions.
Abstract
Most action recognition solutions rely on dense sampling to precisely cover the informative temporal clip. Extensively searching the temporal region is expensive for a real-world application. In this work, we focus on improving the inference efficiency of current action recognition backbones on trimmed videos, and illustrate that one action model can also cover the informative region by dropping non-informative features. We present Selective Feature Compression (SFC), an action recognition inference strategy that greatly increases model inference efficiency without any accuracy compromise. Unlike previous works that compress kernel sizes and decrease the channel dimension, we propose to compress the feature flow along the spatio-temporal dimensions without changing any backbone parameters. Our experiments on Kinetics-400, UCF101 and ActivityNet show that SFC is able to reduce inference speed…
Taxonomy
Topics: Human Pose and Action Recognition · Anomaly Detection Techniques and Applications · Multimodal Machine Learning Applications
