Differentiable Resolution Compression and Alignment for Efficient Video Classification and Retrieval
Rui Deng, Qian Wu, Yuke Li, Haoran Fu

TL;DR
This paper introduces a differentiable resolution compression and alignment method for efficient video classification and retrieval, reducing computation while maintaining accuracy through a novel transformer-based approach.
Contribution
It proposes a differentiable compression and alignment mechanism that adaptively reduces resolution and captures global temporal correlations, improving efficiency without sacrificing performance.
Findings
Achieves the best efficiency-performance trade-off on near-duplicate video retrieval.
Provides competitive results on dynamic video classification.
Enables end-to-end optimization of the video representation network.
Abstract
Optimizing video inference efficiency has become increasingly important with the growing demand for video analysis in various fields. Some existing methods achieve high efficiency by explicitly discarding spatial or temporal information, which poses challenges in fast-changing and fine-grained scenarios. To address these issues, we propose an efficient video representation network with a Differentiable Resolution Compression and Alignment mechanism, which compresses non-essential information in the early stage of the network to reduce computational costs while maintaining consistent temporal correlations. Specifically, we leverage a Differentiable Context-aware Compression Module to encode the salient and non-salient frame features, refining and updating the features into a high-low resolution video sequence. To process the new sequence, we introduce a new Resolution-Align Transformer…
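As a rough illustration of why a high-low resolution sequence lowers transformer cost, the sketch below counts patch tokens when only some frames keep full resolution. It is back-of-the-envelope arithmetic, not the paper's implementation: the function names, the 8-frame clip, the 224x224 input, the patch size of 16, the downscale factor of 2, and the assumption of global self-attention over all tokens are all hypothetical choices made for the example.

```python
def sequence_tokens(num_frames, num_salient, height, width, patch=16, downscale=2):
    """Count patch tokens when salient frames keep full resolution and the
    remaining frames are spatially downscaled by `downscale`.

    All parameters are illustrative assumptions, not values from the paper.
    """
    if not 0 <= num_salient <= num_frames:
        raise ValueError("num_salient must lie in [0, num_frames]")
    # Tokens per full-resolution frame.
    full = (height // patch) * (width // patch)
    # Tokens per compressed (low-resolution) frame.
    low = (height // downscale // patch) * (width // downscale // patch)
    return num_salient * full + (num_frames - num_salient) * low


def relative_attention_cost(mixed_tokens, dense_tokens):
    """Global self-attention scales roughly quadratically with sequence length."""
    return (mixed_tokens / dense_tokens) ** 2


dense = sequence_tokens(8, 8, 224, 224)  # every frame at full resolution
mixed = sequence_tokens(8, 2, 224, 224)  # 2 salient frames, 6 compressed frames
print(dense, mixed, relative_attention_cost(mixed, dense))
# prints: 1568 686 0.19140625
```

Under these assumed settings the mixed sequence holds about 44% of the tokens of the dense one, so a quadratic attention term would cost roughly a fifth as much. The actual savings depend on how the compression module selects salient frames and on the transformer design, which this sketch does not model.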
Taxonomy
Topics: Advanced Image and Video Retrieval Techniques · Advanced Vision and Imaging · Advanced Image Processing Techniques
Methods: Attention Is All You Need · Softmax · Dense Connections · Absolute Position Encodings · Position-Wise Feed-Forward Layer · Linear Layer · Residual Connection · Adam · Multi-Head Attention · Layer Normalization
