Convolutional Transformer-Based Image Compression

Bouzid Arezki; Fangchen Feng; Anissa Mokraoui

arXiv:2409.04118 · eess.IV · September 9, 2024 · SPA

TL;DR

This paper introduces a convolutional transformer architecture for image compression that outperforms CNN-based methods in the trade-off between bit-rate and distortion, while maintaining lower computational complexity than other transformer-based approaches.

Contribution

It proposes a novel transformer architecture that integrates convolutional operations within attention to improve image compression efficiency.

Findings

01. Outperforms CNN-based architectures in the bit-rate/distortion trade-off

02. Achieves results comparable to transformer-based methods with lower complexity

03. Effectively captures local dependencies without positional encoding

Abstract

In this paper, we present a novel transformer-based architecture for end-to-end image compression. Our architecture incorporates blocks that effectively capture local dependencies between tokens, eliminating the need for positional encoding by integrating convolutional operations within the multi-head attention mechanism. We demonstrate through experiments that our proposed framework surpasses state-of-the-art CNN-based architectures in terms of the trade-off between bit-rate and distortion and achieves comparable results to transformer-based methods while maintaining lower computational complexity.
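To make the idea of convolutions inside multi-head attention concrete, here is a minimal NumPy sketch. It is not the authors' implementation: the abstract does not specify where the convolution is applied, so this sketch assumes a depthwise 3x3 filter applied to the token grid before each of the query, key, and value linear projections. The names `depthwise_conv3x3`, `conv_attention`, and `init_params`, the kernel size, and the parameter layout are all hypothetical. Because each token's query, key, and value then depend on its spatial neighbours, the block is sensitive to token layout without any explicit positional encoding.

```python
import numpy as np


def depthwise_conv3x3(x, kernel):
    """Zero-padded 'same' depthwise 3x3 filter (cross-correlation form).

    x: (H, W, C) token grid, kernel: (3, 3, C), one 3x3 filter per channel.
    """
    H, W, _ = x.shape
    padded = np.pad(x, ((1, 1), (1, 1), (0, 0)))
    out = np.zeros(x.shape, dtype=float)
    for dy in range(3):
        for dx in range(3):
            out += padded[dy:dy + H, dx:dx + W, :] * kernel[dy, dx, :]
    return out


def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)  # subtract max for stability
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


def init_params(channels, rng):
    """Random parameters for the sketch (hypothetical layout)."""
    params = {}
    for name in ("q", "k", "v"):
        params["k_" + name] = 0.1 * rng.standard_normal((3, 3, channels))
        params["w_" + name] = 0.1 * rng.standard_normal((channels, channels))
    params["w_o"] = 0.1 * rng.standard_normal((channels, channels))
    return params


def conv_attention(x, params, num_heads):
    """Multi-head self-attention with convolutional Q/K/V projections.

    Assumption: the convolution precedes the linear projection; no
    positional encoding is added anywhere.
    """
    H, W, C = x.shape
    d = C // num_heads  # per-head dimension, C must be divisible by num_heads

    def project(name):
        # Local mixing over the 3x3 neighbourhood, then a linear layer.
        t = depthwise_conv3x3(x, params["k_" + name]) @ params["w_" + name]
        # (H, W, C) -> (heads, N, d) with N = H * W tokens
        return t.reshape(H * W, num_heads, d).transpose(1, 0, 2)

    q, k, v = project("q"), project("k"), project("v")
    attn = softmax(q @ k.transpose(0, 2, 1) / np.sqrt(d))  # (heads, N, N)
    out = (attn @ v).transpose(1, 0, 2).reshape(H, W, C)   # merge heads
    return out @ params["w_o"]


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    tokens = rng.standard_normal((4, 5, 8))  # 4x5 grid of 8-channel tokens
    y = conv_attention(tokens, init_params(8, rng), num_heads=2)
    print(y.shape)
```

The depthwise filter adds only 9 weights per channel for each projection, so the extra cost over standard attention is small in this sketch; how the paper achieves its reported complexity savings is not detailed in the abstract.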

Taxonomy

Methods: Attention Is All You Need · Softmax · Linear Layer · Multi-Head Attention