MRT: Learning Compact Representations with Mixed RWKV-Transformer for Extreme Image Compression
Han Liu, Hengyu Man, Xingtao Wang, Wenrui Li, Debin Zhao

TL;DR
This paper introduces MRT, a novel mixed RWKV-Transformer architecture that encodes images into highly compact 1-D latent representations, significantly improving extreme image compression efficiency over existing 2-D methods.
Contribution
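The paper proposes a new Mixed RWKV-Transformer architecture that pairs RWKV modules, which capture global dependencies across image windows, with Transformer blocks, which model local redundancy within each window. The result is a more compact 1-D image representation that advances the state of the art in extreme image compression.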
Findings
Achieves superior reconstruction quality at bitrates below 0.02 bpp.
Outperforms the state-of-the-art GLC method, with bitrate savings of 43.75% on Kodak and 30.59% on CLIC2020.
Demonstrates effectiveness of 1-D latent representations in image compression.
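To give a sense of scale for the figures above, the following is a minimal arithmetic sketch, not taken from the paper. It assumes a 768×512 image (the standard Kodak resolution) and, purely for illustration, a hypothetical reference codec operating at exactly 0.02 bpp. It also assumes the reported savings can be read as a rate reduction at equal quality, which the summary does not state. The function names are made up for this example.

```python
def bit_budget(width, height, bpp):
    """Total number of bits available for an image at a given bits-per-pixel rate."""
    return width * height * bpp


def rate_after_savings(reference_bpp, savings_percent):
    """Rate left after a percentage saving relative to a reference rate.

    Assumption: the saving is interpreted as a rate reduction at equal quality.
    """
    return reference_bpp * (1.0 - savings_percent / 100.0)


# A 768x512 image at 0.02 bpp gets roughly 7864 bits, i.e. under 1 kB.
kodak_bits = bit_budget(768, 512, 0.02)
kodak_bytes = kodak_bits / 8

# Hypothetical reference at 0.02 bpp; a 43.75% saving leaves 0.01125 bpp.
mrt_bpp = rate_after_savings(0.02, 43.75)

print(f"{kodak_bits:.2f} bits, {kodak_bytes:.2f} bytes, {mrt_bpp:.5f} bpp")
```

At this rate the entire image must fit in less than a kilobyte, which is why removing redundancy from the latent representation matters so much in this regime.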
Abstract
Recent advances in extreme image compression have revealed that mapping pixel data into highly compact latent representations can significantly improve coding efficiency. However, most existing methods compress images into 2-D latent spaces via convolutional neural networks (CNNs) or Swin Transformers, which tend to retain substantial spatial redundancy, thereby limiting overall compression performance. In this paper, we propose a novel Mixed RWKV-Transformer (MRT) architecture that encodes images into more compact 1-D latent representations by synergistically integrating the complementary strengths of linear-attention-based RWKV and self-attention-based Transformer models. Specifically, MRT partitions each image into fixed-size windows, utilizing RWKV modules to capture global dependencies across windows and Transformer blocks to model local redundancies within each window. The…
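The window partitioning step described in the abstract can be sketched as follows. This is an illustration only, under stated assumptions: the summary does not give the window size, so the value used here is hypothetical, and `partition_windows` is an invented helper name. The image is modeled as a plain nested list of pixel values. The RWKV and Transformer modules themselves are not reproduced.

```python
def partition_windows(image, window):
    """Split a 2-D grid (list of rows) into non-overlapping window x window tiles.

    Tiles are returned in raster order: left to right, then top to bottom.
    Assumption: height and width are exact multiples of the window size.
    """
    height, width = len(image), len(image[0])
    if height % window or width % window:
        raise ValueError("image dimensions must be multiples of the window size")
    return [
        [row[left:left + window] for row in image[top:top + window]]
        for top in range(0, height, window)
        for left in range(0, width, window)
    ]


# Toy 4x6 "image" whose pixel value encodes its raster position.
image = [[r * 6 + c for c in range(6)] for r in range(4)]
tiles = partition_windows(image, 2)  # hypothetical window size of 2

# Per the abstract, local modeling would operate inside each tile,
# while global modeling would operate across the sequence of tiles.
print(len(tiles), tiles[1])
```

In this sketch the tile list plays the role of the sequence over which cross-window dependencies would be captured, while each tile's contents are what a within-window block would see.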
Taxonomy
Topics: Advanced Data Compression Techniques · Advanced Image Processing Techniques · Image and Video Quality Assessment
