Bi-Level Spatial and Channel-aware Transformer for Learned Image Compression
Hamidreza Soltani, Erfan Ghasemi

TL;DR
This paper introduces a novel Transformer-based image compression method that incorporates frequency-aware attention mechanisms and a mixed feed-forward network, significantly improving compression efficiency over existing learned image codecs.
Contribution
The paper proposes a hybrid spatial-channel attention Transformer with frequency-aware modules and a mixed local-global network, advancing the transformation stage in learned image compression.
Findings
Outperforms state-of-the-art learned image compression (LIC) methods on rate-distortion metrics.
Enhances feature decorrelation through frequency-aware attention mechanisms.
Improves compression efficiency with a novel Transformer architecture.
Abstract
Recent advancements in learned image compression (LIC) methods have demonstrated superior performance over traditional hand-crafted codecs. These learning-based methods often employ convolutional neural networks (CNNs) or Transformer-based architectures. However, these nonlinear approaches frequently overlook the frequency characteristics of images, which limits their compression efficiency. To address this issue, we propose a novel Transformer-based image compression method that enhances the transformation stage by considering frequency components within the feature map. Our method integrates a novel Hybrid Spatial-Channel Attention Transformer Block (HSCATB), where a spatial-based branch independently handles high and low frequencies at the attention layer, and a Channel-aware Self-Attention (CaSA) module captures information across channels, significantly improving compression…
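To make the idea of attention "across channels" concrete, here is a minimal NumPy sketch of generic channel-wise self-attention. It is not the paper's CaSA module, whose details are not given in the abstract. The function name `channel_self_attention`, the projection matrices `w_q`, `w_k`, `w_v`, and the cosine-normalised logits are all assumptions made for illustration. The point it shows is that each channel is treated as one token, so the attention map is C x C rather than (H·W) x (H·W).

```python
import numpy as np

def channel_self_attention(x, w_q, w_k, w_v):
    """Generic channel-wise self-attention (illustrative, not the paper's CaSA).

    x:              feature map of shape (C, H, W)
    w_q, w_k, w_v:  hypothetical (C, C) projection matrices, i.e. 1x1 convolutions
    Returns the attended feature map (C, H, W) and the (C, C) attention map.
    """
    c, h, w = x.shape
    tokens = x.reshape(c, h * w)              # one token per channel, length H*W

    q = w_q @ tokens                          # (C, H*W)
    k = w_k @ tokens
    v = w_v @ tokens

    # Assumption: L2-normalise queries and keys so logits are cosine similarities.
    q = q / (np.linalg.norm(q, axis=1, keepdims=True) + 1e-8)
    k = k / (np.linalg.norm(k, axis=1, keepdims=True) + 1e-8)

    logits = q @ k.T                          # (C, C): channel-to-channel scores
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    attn = np.exp(logits)
    attn = attn / attn.sum(axis=1, keepdims=True)        # row-wise softmax

    out = attn @ v                            # mix channels by attention weights
    return out.reshape(c, h, w), attn


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    feat = rng.standard_normal((8, 16, 16))   # toy feature map: 8 channels, 16x16
    eye = np.eye(8)                           # identity projections for the demo
    out, attn = channel_self_attention(feat, eye, eye, eye)
    print(out.shape, attn.shape)
```

Because the attention map scales with the number of channels rather than the number of spatial positions, this style of attention stays cheap at high resolution, which is one common motivation for pairing it with a spatial branch.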
Taxonomy
Topics: Advanced Data Compression Techniques
Methods: Linear Layer · Residual Connection · Multi-Head Attention · Attention Is All You Need · Position-Wise Feed-Forward Layer · Adam · Byte Pair Encoding · Softmax · Absolute Position Encodings · Dense Connections
