Convolutional Transformer-Based Image Compression
Bouzid Arezki, Fangchen Feng, Anissa Mokraoui

TL;DR
This paper introduces a convolutional transformer architecture for image compression that outperforms CNN-based methods in the bit-rate/distortion trade-off and matches other transformer-based approaches at lower computational complexity.
Contribution
The paper proposes a transformer architecture that integrates convolutional operations within the multi-head attention mechanism, so that local dependencies between tokens are captured without positional encoding.
Findings
Outperforms CNN-based architectures in bit-rate/distortion trade-off
Achieves comparable results to transformer-based methods with lower complexity
Effectively captures local dependencies without positional encoding
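The bit-rate/distortion trade-off named in the first finding is conventionally expressed in learned image compression as a Lagrangian objective. The form below is the standard one from that literature and is given here as an assumption, since this summary does not state the paper's loss. The symbols (analysis transform g_a, synthesis transform g_s, quantizer Q, distortion measure d, multiplier λ) are notation introduced for illustration only.

```latex
\mathcal{L} \;=\; R + \lambda D
\;=\; \mathbb{E}_{x}\!\left[-\log_2 p_{\hat{y}}(\hat{y})\right]
\;+\; \lambda\, \mathbb{E}_{x}\!\left[d(x, \hat{x})\right],
\qquad \hat{y} = Q\!\left(g_a(x)\right),\quad \hat{x} = g_s(\hat{y})
```

A larger λ weights distortion more heavily, giving higher quality at a higher bit-rate. Sweeping λ traces out the rate-distortion curve on which architectures are compared.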
Abstract
In this paper, we present a novel transformer-based architecture for end-to-end image compression. Our architecture incorporates blocks that effectively capture local dependencies between tokens, eliminating the need for positional encoding by integrating convolutional operations within the multi-head attention mechanism. We demonstrate through experiments that our proposed framework surpasses state-of-the-art CNN-based architectures in terms of the trade-off between bit-rate and distortion and achieves comparable results to transformer-based methods while maintaining lower computational complexity.
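To make "convolutional operations within the multi-head attention mechanism" concrete, here is a minimal NumPy sketch of one possible design: a depthwise 3x3 convolution is applied to the token grid before each of the query, key, and value linear projections. This is an assumption for illustration, not the authors' block. The function names, kernel size, zero padding, and weight shapes are all hypothetical.

```python
import numpy as np

def depthwise_conv3x3(x, kernel):
    """Depthwise 3x3 convolution with zero padding.
    x: (H, W, C) token grid, kernel: (3, 3, C), one filter per channel."""
    H, W, C = x.shape
    padded = np.pad(x, ((1, 1), (1, 1), (0, 0)))
    out = np.zeros_like(x)
    for dy in range(3):
        for dx in range(3):
            out += padded[dy:dy + H, dx:dx + W, :] * kernel[dy, dx, :]
    return out

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)  # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def conv_attention(x, kernels, weights, num_heads):
    """Multi-head self-attention with convolutional Q/K/V projections (illustrative).
    x: (H, W, C); kernels: {'q','k','v'} -> (3, 3, C);
    weights: {'q','k','v','o'} -> (C, C). No positional encoding is added."""
    H, W, C = x.shape
    d = C // num_heads

    def project(name):
        # Local mixing by convolution, then the usual linear projection.
        t = depthwise_conv3x3(x, kernels[name]).reshape(H * W, C) @ weights[name]
        return t.reshape(H * W, num_heads, d).transpose(1, 0, 2)  # (heads, N, d)

    q, k, v = project("q"), project("k"), project("v")
    attn = softmax(q @ k.transpose(0, 2, 1) / np.sqrt(d))  # (heads, N, N)
    out = (attn @ v).transpose(1, 0, 2).reshape(H * W, C)  # merge heads
    return (out @ weights["o"]).reshape(H, W, C), attn

# Usage with random (untrained) parameters on a 4x4 grid of 8-dim tokens.
rng = np.random.default_rng(0)
H, W, C, heads = 4, 4, 8, 2
x = rng.standard_normal((H, W, C))
kernels = {n: rng.standard_normal((3, 3, C)) * 0.1 for n in "qkv"}
weights = {n: rng.standard_normal((C, C)) * 0.1 for n in "qkvo"}
y, attn = conv_attention(x, kernels, weights, heads)
print(y.shape, attn.shape)  # (4, 4, 8) (2, 16, 16)
```

In this sketch each query, key, and value already mixes in its spatial neighbours, so the attention scores depend on where a token sits in the grid. This is one plausible reading of why an explicit positional encoding becomes unnecessary.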
Taxonomy
Methods: Attention Is All You Need · Softmax · Linear Layer · Multi-Head Attention
