Efficient Contextformer: Spatio-Channel Window Attention for Fast Context Modeling in Learned Image Compression

A. Burakhan Koyuncu; Panqi Jia; Atanas Boev; Elena Alshina; Eckehard Steinbach

arXiv:2306.14287 · eess.IV · February 28, 2024

TL;DR

The paper introduces eContextformer, a highly efficient transformer-based context model for learned image compression that significantly reduces computational complexity and decoding time while improving compression performance.

Contribution

It proposes a novel, low-complexity transformer architecture with optimized attention mechanisms and training strategies for faster, more efficient learned image compression.

Findings

1. ~145x lower model complexity compared to non-parallel methods
2. ~210x faster decoding speed
3. Up to 17% bitrate savings over VVC intra coding

Abstract

Entropy estimation is essential for the performance of learned image compression. It has been demonstrated that a transformer-based entropy model is of critical importance for achieving a high compression ratio, albeit at the expense of significant computational effort. In this work, we introduce the Efficient Contextformer (eContextformer), a computationally efficient transformer-based autoregressive context model for learned image compression. The eContextformer efficiently fuses the patch-wise, checkered, and channel-wise grouping techniques for parallel context modeling, and introduces a shifted window spatio-channel attention mechanism. We explore better training strategies and architectural designs and introduce additional complexity optimizations. During decoding, the proposed optimization techniques dynamically scale the attention span and cache the previous attention…
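
To make the idea of grouped parallel context modeling concrete, here is a minimal sketch of how checkered and channel-wise grouping can partition a latent tensor into a small number of decoding steps. It is an illustration under stated assumptions, not the paper's implementation: the function name `decoding_groups`, the parameter `channel_segments`, and the step ordering (channel segment first, then checkerboard parity) are hypothetical, and patch-wise grouping and the attention mechanism are omitted.

```python
def decoding_groups(channels, height, width, channel_segments=4):
    """Map each latent element (c, y, x) to a decoding step.

    Assumed ordering (hypothetical): channel segments are processed one
    after another; within a segment, checkerboard positions with even
    parity are decoded before positions with odd parity. All elements
    that share a step could be decoded in parallel.
    """
    if channels % channel_segments != 0:
        raise ValueError("channels must be divisible by channel_segments")
    seg_size = channels // channel_segments
    groups = {}
    for c in range(channels):
        segment = c // seg_size          # channel-wise grouping
        for y in range(height):
            for x in range(width):
                parity = (y + x) % 2     # checkered (checkerboard) grouping
                step = 2 * segment + parity
                groups.setdefault(step, []).append((c, y, x))
    return groups


groups = decoding_groups(channels=8, height=4, width=4, channel_segments=4)
parallel_steps = len(groups)   # number of grouped decoding steps
serial_steps = 8 * 4 * 4       # one step per element in a fully serial model
```

In this toy setting the grouped schedule needs `parallel_steps` passes instead of `serial_steps`, which is the kind of reduction in sequential work that parallel context models aim for; the actual group layout and ordering used by eContextformer may differ.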

Taxonomy

Topics: Advanced Data Compression Techniques · Advanced Image and Video Retrieval Techniques · Advanced Image Processing Techniques

Methods: SPEED: Separable Pyramidal Pooling EncodEr-Decoder for Real-Time Monocular Depth Estimation on Low-Resource Settings