Uni-AdaFocus: Spatial-temporal Dynamic Computation for Video Recognition

Yulin Wang; Haoji Zhang; Yang Yue; Shiji Song; Chao Deng; Junlan Feng; Gao Huang

arXiv:2412.11228·cs.CV·December 17, 2024

TL;DR

Uni-AdaFocus introduces a unified spatial-temporal dynamic computation framework for video recognition. It significantly improves efficiency by adaptively allocating computation to task-relevant regions, frames, and samples, and it integrates with existing backbones.

Contribution

It proposes a comprehensive adaptive computation framework that exploits spatial, temporal, and sample-wise redundancy jointly for efficient video recognition.
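
The TL;DR and contribution mention sample-wise redundancy, but the abstract excerpt on this page does not say how Uni-AdaFocus exploits it. One common way to realize sample-wise adaptive computation is confidence-based early exit, sketched below purely as an assumption and not as the paper's actual mechanism: frames are consumed in order, and inference stops once the averaged class probabilities are confident enough, so "easy" videos use fewer frames. The function `early_exit_predict` and its `threshold` parameter are hypothetical names.

```python
from typing import Iterable, List, Tuple


def early_exit_predict(frame_probs: Iterable[List[float]], threshold: float) -> Tuple[int, int]:
    """Hypothetical sample-wise early exit (an illustrative assumption, not the paper's method).

    Averages per-frame class probabilities as frames arrive and stops as soon as the
    top averaged probability reaches `threshold`. Returns (predicted class, frames used).
    """
    running: List[float] = []
    n = 0
    for probs in frame_probs:
        n += 1
        # Accumulate the per-class probability sums seen so far.
        running = list(probs) if not running else [r + p for r, p in zip(running, probs)]
        avg = [r / n for r in running]
        top = max(avg)
        if top >= threshold:
            # Confident enough: skip the remaining frames for this sample.
            return avg.index(top), n
    if n == 0:
        raise ValueError("frame_probs must contain at least one frame")
    # Never confident: fall back to the prediction from all frames.
    return avg.index(top), n
```

Raising `threshold` trades computation for accuracy: fewer samples exit early, so more frames are processed on average.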

Findings

01

Achieves higher computational efficiency on seven benchmark datasets.

02

Compatible with off-the-shelf backbones like TSM and X3D.

03

Outperforms competitive baselines in efficiency.

Abstract

This paper presents a comprehensive exploration of the phenomenon of data redundancy in video understanding, with the aim of improving computational efficiency. Our investigation begins with spatial redundancy: the observation that the most informative region in each video frame usually corresponds to a small image patch whose shape, size, and location shift smoothly across frames. Motivated by this phenomenon, we formulate patch localization as a dynamic decision task and introduce a spatially adaptive video recognition approach, termed AdaFocus. Specifically, a lightweight encoder first quickly processes the full video sequence, and a policy network uses its features to identify the most task-relevant regions. The selected patches are then processed by a high-capacity deep network for the final…
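
The glance-then-focus pipeline described in the abstract can be sketched as follows. This is a minimal, hypothetical illustration, not the authors' implementation (the official code is in the leaplabthu/uni-adafocus repository). Assumptions: frames are single-channel 2-D lists, the "lightweight encoder" is an identity stand-in, the policy is a brute-force search for the window with the largest feature sum (the paper uses a learned policy network over features of the whole sequence, whereas this sketch treats each frame independently), the patch size is fixed, and the "high-capacity network" is replaced by a mean. All function names are hypothetical.

```python
from typing import Callable, List, Sequence, Tuple

# Hypothetical simplification: a frame is a single-channel H x W grid of floats.
Frame = List[List[float]]


def glance(frame: Frame) -> Frame:
    # Stand-in for the lightweight encoder: returns the frame unchanged as "coarse features".
    return frame


def policy(features: Frame, size: int) -> Tuple[int, int]:
    # Stand-in for the learned policy network: exhaustively pick the size x size window
    # with the largest feature sum and return its (row, col) center.
    h, w = len(features), len(features[0])
    best_score, best_center = None, (size // 2, size // 2)
    for top in range(h - size + 1):
        for left in range(w - size + 1):
            score = sum(sum(row[left:left + size]) for row in features[top:top + size])
            if best_score is None or score > best_score:
                best_score, best_center = score, (top + size // 2, left + size // 2)
    return best_center


def crop_patch(frame: Frame, center: Tuple[int, int], size: int) -> Frame:
    # Crop a size x size patch around `center`, shifted as needed to stay inside the frame.
    h, w = len(frame), len(frame[0])
    top = min(max(center[0] - size // 2, 0), h - size)
    left = min(max(center[1] - size // 2, 0), w - size)
    return [row[left:left + size] for row in frame[top:top + size]]


def mean_focus(patch: Frame) -> float:
    # Stand-in for the high-capacity network: reduces the patch to a single score.
    values = [v for row in patch for v in row]
    return sum(values) / len(values)


def adafocus_sketch(video: Sequence[Frame], size: int,
                    focus: Callable[[Frame], float] = mean_focus) -> float:
    # Glance at each frame, localize a patch, run the expensive model only on that patch,
    # and average the per-frame outputs into a video-level score.
    outputs = []
    for frame in video:
        center = policy(glance(frame), size)
        outputs.append(focus(crop_patch(frame, center, size)))
    return sum(outputs) / len(outputs)
```

The intended saving is that `focus`, the expensive component, sees only a size x size patch per frame rather than the full frame, while the glance and policy steps are meant to be cheap by design.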

Code & Models

Repositories

leaplabthu/uni-adafocus
PyTorch · Official

Taxonomy

Topics: Advanced Vision and Imaging · Advanced Image and Video Retrieval Techniques · Robotics and Sensor-Based Localization