Differentiable Resolution Compression and Alignment for Efficient Video Classification and Retrieval

Rui Deng; Qian Wu; Yuke Li; Haoran Fu

arXiv:2309.08167·cs.CV·September 18, 2023

TL;DR

This paper introduces a differentiable resolution compression and alignment method for efficient video classification and retrieval, reducing computation while maintaining accuracy through a novel transformer-based approach.

Contribution

It proposes a differentiable compression and alignment mechanism that adaptively reduces resolution and captures global temporal correlations, improving efficiency without sacrificing performance.

Findings

01. Achieves the best efficiency-performance trade-off on near-duplicate video retrieval.

02. Provides competitive results on dynamic video classification.

03. Enables end-to-end optimization of the video representation network.

Abstract

Optimizing video inference efficiency has become increasingly important with the growing demand for video analysis in various fields. Some existing methods achieve high efficiency by explicitly discarding spatial or temporal information, which poses challenges in fast-changing and fine-grained scenarios. To address these issues, we propose an efficient video representation network with a Differentiable Resolution Compression and Alignment mechanism, which compresses non-essential information in the early stage of the network to reduce computational costs while maintaining consistent temporal correlations. Specifically, we leverage a Differentiable Context-aware Compression Module to encode the salient and non-salient frame features, refining and updating the features into a high-low resolution video sequence. To process the new sequence, we introduce a new Resolution-Align Transformer…
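The idea of a "high-low resolution video sequence" can be illustrated with a small sketch. This is not the paper's module: it assumes per-frame saliency scores are already given, works on raw grayscale frames rather than learned features, and uses a hard top-k selection, which is not differentiable, where the abstract describes a differentiable mechanism. The names `avg_pool_2x`, `compress_sequence`, `pixel_count`, and the `keep` parameter are hypothetical and chosen only for illustration.

```python
from typing import List, Tuple

Frame = List[List[float]]  # one grayscale frame, H rows by W columns


def avg_pool_2x(frame: Frame) -> Frame:
    """Halve height and width by averaging each 2x2 block (H and W must be even)."""
    h, w = len(frame), len(frame[0])
    return [
        [
            (frame[r][c] + frame[r][c + 1] + frame[r + 1][c] + frame[r + 1][c + 1]) / 4.0
            for c in range(0, w, 2)
        ]
        for r in range(0, h, 2)
    ]


def compress_sequence(
    frames: List[Frame], saliency: List[float], keep: int
) -> List[Tuple[str, Frame]]:
    """Keep the `keep` most salient frames at full resolution and pool the rest.

    Assumption: hard top-k selection stands in for the paper's differentiable
    compression. Temporal order is preserved in the output.
    """
    ranked = sorted(range(len(frames)), key=lambda i: saliency[i], reverse=True)
    salient = set(ranked[:keep])
    return [
        ("high", f) if i in salient else ("low", avg_pool_2x(f))
        for i, f in enumerate(frames)
    ]


def pixel_count(sequence: List[Tuple[str, Frame]]) -> int:
    """Total number of pixels across all frames, a rough proxy for compute cost."""
    return sum(len(f) * len(f[0]) for _, f in sequence)


# Hypothetical input: four 4x4 frames with made-up saliency scores.
frames = [[[float(t)] * 4 for _ in range(4)] for t in range(4)]
saliency = [0.1, 0.9, 0.2, 0.7]
mixed = compress_sequence(frames, saliency, keep=2)
```

In this toy setup the two low-saliency frames shrink from 16 to 4 pixels each, so the sequence drops from 64 to 40 pixels while every time step is still represented. Keeping every time step, rather than dropping frames outright, is the property the abstract contrasts with methods that explicitly discard spatial or temporal information.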

Code & Models

Repositories

dun-research/drca (PyTorch, official)

Taxonomy

Topics: Advanced Image and Video Retrieval Techniques · Advanced Vision and Imaging · Advanced Image Processing Techniques

Methods: Attention Is All You Need · Softmax · Dense Connections · Absolute Position Encodings · Position-Wise Feed-Forward Layer · Linear Layer · Residual Connection · Adam · Multi-Head Attention · Layer Normalization