Sample Less, Learn More: Efficient Action Recognition via Frame Feature Restoration
Harry Cheng, Yangyang Guo, Liqiang Nie, Zhiyong Cheng, and Mohan Kankanhalli

TL;DR
This paper introduces a novel feature restoration method for sparse frame sampling in video action recognition, improving efficiency by over 50% with only a 0.5% drop in accuracy and enhancing zero-shot generalization.
Contribution
It proposes a feature restoration technique for sparsely sampled frames that boosts efficiency and maintains accuracy across multiple datasets.
Findings
Efficiency improved by over 50% on baseline models.
Recognition accuracy reduced by only 0.5%.
Enhanced zero-shot generalization ability.
Abstract
Training an effective video action recognition model poses significant computational challenges, particularly under limited resource budgets. Current methods primarily aim either to reduce model size or to utilize pre-trained models, which limits their adaptability to various backbone architectures. This paper investigates the issue of over-sampled frames, a problem prevalent in many approaches that has nonetheless received relatively little attention. Although using fewer frames is a potential solution, doing so often results in a substantial decline in performance. To address this issue, we propose a novel method to restore the intermediate features for two sparsely sampled and adjacent video frames. This feature restoration technique adds negligible computational cost compared with resource-intensive image encoders such as ViT. To evaluate the effectiveness of our…
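The abstract describes the setup only at a high level: run the expensive image encoder on a sparse subset of frames, then restore features for the frames in between from two adjacent sampled frames. The following is a minimal sketch of that data flow under stated assumptions, not the authors' implementation. The names `sparse_indices`, `restore_features`, and `toy_encoder` are hypothetical, and per-dimension linear interpolation is a placeholder standing in for the paper's restoration module, whose design is not described in this summary.

```python
import bisect
from typing import Dict, List

Feature = List[float]


def sparse_indices(num_frames: int, stride: int) -> List[int]:
    """Frame indices that are actually passed through the expensive image encoder."""
    if num_frames < 1 or stride < 1:
        raise ValueError("num_frames and stride must both be >= 1")
    idx = list(range(0, num_frames, stride))
    # Keep the last frame too, so every skipped frame lies between two sampled neighbours.
    if idx[-1] != num_frames - 1:
        idx.append(num_frames - 1)
    return idx


def restore_features(sampled: Dict[int, Feature], num_frames: int) -> List[Feature]:
    """Return one feature per frame, filling skipped frames from their two adjacent sampled neighbours.

    Placeholder restoration (an assumption): per-dimension linear interpolation.
    This is NOT the paper's restoration module.
    """
    keys = sorted(sampled)
    if not keys or keys[0] != 0 or keys[-1] != num_frames - 1:
        raise ValueError("first and last frames must be sampled")
    restored: List[Feature] = []
    for t in range(num_frames):
        if t in sampled:
            # Sampled frame: reuse the encoder output directly.
            restored.append(list(sampled[t]))
            continue
        # Skipped frame: locate the nearest sampled frames on each side.
        pos = bisect.bisect_right(keys, t)
        left, right = keys[pos - 1], keys[pos]
        w = (t - left) / (right - left)
        restored.append([(1 - w) * a + w * b for a, b in zip(sampled[left], sampled[right])])
    return restored


def toy_encoder(frame_index: int) -> Feature:
    """Hypothetical stand-in for a costly encoder such as ViT (here a trivial linear map)."""
    return [float(frame_index), 2.0 * frame_index]


if __name__ == "__main__":
    num_frames, stride = 8, 2
    idx = sparse_indices(num_frames, stride)
    # Only len(idx) encoder calls are made instead of num_frames.
    sampled = {t: toy_encoder(t) for t in idx}
    features = restore_features(sampled, num_frames)
    print(f"encoder calls: {len(idx)} / {num_frames}")
    print(features[1], features[5])
```

Running the script prints `encoder calls: 5 / 8`. The point of the structure is that encoder cost scales with the number of sampled frames, while restoration is a cheap per-frame step. With this linear toy encoder, interpolation recovers the skipped features exactly; real encoder features are not linear in time, which is why a better restoration method than interpolation is needed to avoid the accuracy drop the abstract mentions.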
Taxonomy
Topics: Human Pose and Action Recognition · Anomaly Detection Techniques and Applications · Advanced Neural Network Applications
