Towards End-to-End Image Compression and Analysis with Transformers
Yuanchao Bai, Xu Yang, Xianming Liu, Junjun Jiang, Yaowei Wang, Xiangyang Ji, Wen Gao

TL;DR
This paper introduces an end-to-end image compression and analysis model using Transformers, which improves compression and classification by integrating compressed features with Transformer-based long-term information.
Contribution
It redesigns the Vision Transformer to operate directly on compressed features and introduces a feature aggregation module for enhanced compression and reconstruction.
Findings
Effective in both image compression and classification tasks
Improves compression performance by leveraging Transformer long-term information
Achieves competitive results with a novel two-step training strategy
Abstract
We propose an end-to-end image compression and analysis model with Transformers, targeting cloud-based image classification applications. Instead of placing an existing Transformer-based image classification model directly after an image codec, we redesign the Vision Transformer (ViT) model to perform image classification from the compressed features and to facilitate image compression with the long-term information from the Transformer. Specifically, we first replace the patchify stem (i.e., image splitting and embedding) of the ViT model with a lightweight image encoder modelled by a convolutional neural network. The compressed features generated by the image encoder carry a convolutional inductive bias and are fed to the Transformer for image classification, bypassing image reconstruction. Meanwhile, we propose a feature aggregation module to fuse the compressed…
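As a rough illustration of the stem replacement described in the abstract, the sketch below compares how many tokens a standard ViT patchify stem and a strided convolutional encoder would hand to the Transformer. The encoder configuration (four 5x5 convolutions with stride 2 and padding 2) is a hypothetical choice made only for this example; the abstract does not state the actual layer settings, and the names `ConvLayer`, `conv_out`, `patchify_tokens`, and `encoder_tokens` are illustrative.

```python
from dataclasses import dataclass


@dataclass
class ConvLayer:
    kernel: int
    stride: int
    padding: int


def conv_out(size, layer):
    # Standard convolution output-size formula (floor division).
    return (size + 2 * layer.padding - layer.kernel) // layer.stride + 1


def patchify_tokens(height, width, patch=16):
    # ViT patchify stem: one token per non-overlapping patch x patch tile.
    return (height // patch) * (width // patch)


def encoder_tokens(height, width, layers):
    # CNN encoder: apply each strided convolution in turn, then treat every
    # spatial position of the final feature map as one token.
    for layer in layers:
        height = conv_out(height, layer)
        width = conv_out(width, layer)
    return height * width


# Hypothetical encoder (an assumption, not the paper's configuration):
# four stride-2 convolutions give 16x downsampling overall.
ENCODER = [ConvLayer(kernel=5, stride=2, padding=2) for _ in range(4)]

if __name__ == "__main__":
    print(patchify_tokens(224, 224))          # tokens from the patchify stem
    print(encoder_tokens(224, 224, ENCODER))  # tokens from the conv encoder
```

Under these assumed settings both stems yield the same sequence length for a 224x224 input, so the Transformer blocks could be reused unchanged. The difference is that the convolutional features come from overlapping receptive fields rather than isolated patches.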
Taxonomy
Topics: Advanced Image Processing Techniques · Image and Signal Denoising Methods · Image Processing Techniques and Applications
Methods: Multi-Head Attention · Attention Is All You Need · Linear Layer · Position-Wise Feed-Forward Layer · Adam · Residual Connection · Layer Normalization · Absolute Position Encodings · Dropout · Label Smoothing
