Transformer-based Image Compression
Ming Lu, Peiyao Guo, Huiqing Shi, Chuntong Cao, and Zhan Ma

TL;DR
This paper introduces a Transformer-based image compression method that leverages a VAE architecture with novel neural transformation units and attention modules, achieving competitive performance with fewer parameters.
Contribution
The paper presents a new Transformer-based image compression framework built from neural transformation units (NTUs) and a causal attention module (CAM), reducing model size while maintaining high compression quality.
Findings
Rivals state-of-the-art CNN-based learned image coding (LIC) methods
Requires up to 45% fewer model parameters
Achieves results comparable to the VVC intra profile
Abstract
A Transformer-based Image Compression (TIC) approach is developed which reuses the canonical variational autoencoder (VAE) architecture with paired main and hyper encoder-decoders. Both main and hyper encoders are composed of a sequence of neural transformation units (NTUs) to analyse and aggregate important information for a more compact representation of the input image, while the decoders mirror the encoder-side operations to reconstruct the pixel-domain image from the compressed bitstream. Each NTU consists of a Swin Transformer Block (STB) and a convolutional layer (Conv) to best embed both long-range and short-range information. Meanwhile, a causal attention module (CAM) is devised for adaptive context modeling of latent features to utilize both hyper and autoregressive priors. The TIC rivals state-of-the-art approaches including deep convolutional neural networks…
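The attention module described in the abstract relies on masking so that context modeling only uses latents that are already available at decoding time. The sketch below illustrates the general idea of masked (causal) self-attention in plain Python; it is not the paper's module. The function names, the single-head formulation, the raster-scan ordering, and the choice to let each position attend to itself are all assumptions made for illustration.

```python
import math


def causal_mask(n):
    """mask[i][j] is True when position i may attend to position j (j <= i)."""
    return [[j <= i for j in range(n)] for i in range(n)]


def softmax(scores):
    """Numerically stable softmax; entries equal to -inf get weight 0."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def causal_attention(queries, keys, values):
    """Single-head scaled dot-product attention with a causal mask.

    Hypothetical helper: inputs are lists of equal-length float vectors,
    ordered in the (assumed) raster-scan decoding order.
    Returns (outputs, attention_weights).
    """
    n, d = len(queries), len(queries[0])
    mask = causal_mask(n)
    outputs, weights = [], []
    for i in range(n):
        # Masked-out (future) positions get a score of -inf.
        scores = [
            sum(q * k for q, k in zip(queries[i], keys[j])) / math.sqrt(d)
            if mask[i][j] else float("-inf")
            for j in range(n)
        ]
        w = softmax(scores)
        weights.append(w)
        # Weighted sum of value vectors, one output channel at a time.
        outputs.append(
            [sum(w[j] * values[j][c] for j in range(n))
             for c in range(len(values[0]))]
        )
    return outputs, weights


# Three toy latent vectors standing in for a small window of features.
tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
outputs, weights = causal_attention(tokens, tokens, tokens)
```

In this sketch the first position can only attend to itself, so its output equals its own value vector, and changing a later token never alters an earlier output. A strict autoregressive context model would additionally exclude the current position or shift its inputs, which this example does not do.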
Taxonomy
Topics: Advanced Data Compression Techniques · Image and Signal Denoising Methods · Advanced Image Processing Techniques
Methods: Attention Is All You Need · Linear Layer · Dropout · Softmax · Stochastic Depth · Residual Connection · Position-Wise Feed-Forward Layer · Adam · Multi-Head Attention · Swin Transformer
