Efficient Contextformer: Spatio-Channel Window Attention for Fast Context Modeling in Learned Image Compression
A. Burakhan Koyuncu, Panqi Jia, Atanas Boev, Elena Alshina, Eckehard Steinbach

TL;DR
The paper introduces eContextformer, an efficient transformer-based context model for learned image compression that reduces computational complexity and decoding time while improving compression performance.
Contribution
It proposes a low-complexity transformer context model that combines patch-wise, checkered, and channel-wise grouping with a shifted window spatio-channel attention mechanism, together with improved training strategies and decoding-time complexity optimizations.
Findings
~145x lower model complexity compared to non-parallel methods
~210x faster decoding speed
Up to 17% bitrate savings over VVC intra coding
Abstract
Entropy estimation is essential for the performance of learned image compression. It has been demonstrated that a transformer-based entropy model is of critical importance for achieving a high compression ratio, but at the expense of significant computational effort. In this work, we introduce the Efficient Contextformer (eContextformer), a computationally efficient transformer-based autoregressive context model for learned image compression. The eContextformer efficiently fuses the patch-wise, checkered, and channel-wise grouping techniques for parallel context modeling, and introduces a shifted window spatio-channel attention mechanism. We explore better training strategies and architectural designs and introduce additional complexity optimizations. During decoding, the proposed optimization techniques dynamically scale the attention span and cache the previous attention…
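To make the grouping idea in the abstract more concrete, the following is a minimal sketch of how checkered (spatial parity) and channel-wise grouping can be combined to assign a decoding step to every element of a latent tensor, so that elements sharing a step could be decoded in parallel. It is an illustration under stated assumptions, not the authors' implementation: the function name `coding_groups`, the parameter `num_segments`, and the particular step ordering (channel segment first, then checkerboard parity) are all hypothetical, and patch-wise grouping and the attention mechanism itself are not modeled.

```python
import numpy as np

def coding_groups(channels, height, width, num_segments=4):
    """Assign a decoding step index to each latent element (hypothetical scheme).

    Elements that share a step index are assumed to be decodable in parallel.
    """
    assert channels % num_segments == 0, "channels must split evenly into segments"
    # Channel-wise grouping: map each channel to a segment id.
    segment = np.arange(channels) // (channels // num_segments)
    # Checkered grouping: parity of (row + col) splits positions into two sets.
    rows, cols = np.indices((height, width))
    parity = (rows + cols) % 2
    # Assumed ordering: within each channel segment, parity 0 precedes parity 1.
    return segment[:, None, None] * 2 + parity[None, :, :]

steps = coding_groups(channels=8, height=4, width=4, num_segments=4)
print(steps.shape)           # (8, 4, 4)
print(int(steps.max()) + 1)  # 8 parallel steps instead of 128 serial ones
```

Under this toy ordering the number of sequential steps depends only on the number of channel segments (two steps per segment), not on the spatial size of the latent, which is the general motivation for parallel context modeling.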
Taxonomy
Topics: Advanced Data Compression Techniques · Advanced Image and Video Retrieval Techniques · Advanced Image Processing Techniques
Methods: SPEED: Separable Pyramidal Pooling EncodEr-Decoder for Real-Time Monocular Depth Estimation on Low-Resource Settings
