DiT-IC: Aligned Diffusion Transformer for Efficient Image Compression

Junqi Shi; Ming Lu; Xingchen Li; Anle Ke; Ruiqi Zhang; Zhan Ma

arXiv:2603.13162 · eess.IV · March 16, 2026

TL;DR

DiT-IC introduces an aligned diffusion transformer that enables efficient, high-quality image compression in deep latent spaces, significantly reducing decoding time and memory usage compared to existing diffusion codecs.

Contribution

The paper proposes a diffusion transformer architecture with alignment mechanisms that make diffusion effective in deep latent spaces, enabling fast, high-quality image reconstruction without text prompts.

Findings

1. Achieves state-of-the-art perceptual quality in image compression.
2. Offers up to 30x faster decoding than existing diffusion codecs.
3. Can reconstruct 2048x2048 images on a 16 GB GPU.

Abstract

Diffusion-based image compression has recently shown outstanding perceptual fidelity, yet its practicality is hindered by prohibitive sampling overhead and high memory usage. Most existing diffusion codecs employ U-Net architectures, where hierarchical downsampling forces diffusion to operate in shallow latent spaces (typically with only 8x spatial downscaling), resulting in excessive computation. In contrast, conventional VAE-based codecs work in much deeper latent domains (16x - 64x downscaled), motivating a key question: Can diffusion operate effectively in such compact latent spaces without compromising reconstruction quality? To address this, we introduce DiT-IC, an Aligned Diffusion Transformer for Image Compression, which replaces the U-Net with a Diffusion Transformer capable of performing diffusion in latent space entirely at 32x downscaled resolution. DiT-IC adapts a…
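The gap between shallow and deep latent spaces described above can be made concrete with a little arithmetic. The sketch below only counts spatial positions in the latent grid for a given per-side downscaling factor; the helper name `latent_positions` is hypothetical, and the count ignores channel width, patching, and any other architectural detail of DiT-IC, so it illustrates the scale of the difference rather than the paper's measured cost.

```python
def latent_positions(height: int, width: int, downscale: int) -> int:
    """Count spatial positions in a latent grid downscaled by `downscale` per side.

    Hypothetical helper for illustration; not part of the DiT-IC code.
    """
    if height % downscale or width % downscale:
        raise ValueError("image sides must be divisible by the downscale factor")
    return (height // downscale) * (width // downscale)


H = W = 2048  # image size quoted in the findings

shallow = latent_positions(H, W, 8)   # 8x downscaling, typical of U-Net diffusion codecs
deep = latent_positions(H, W, 32)     # 32x downscaling, the resolution DiT-IC diffuses at

print(shallow, deep, shallow // deep)  # 65536 4096 16
```

Going from 8x to 32x downscaling shrinks each side of the grid by 4, so the number of latent positions drops by a factor of 16 for the same image.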

Code & Models

Models

🤗 JunqiShi/DiT-IC (model)

Taxonomy

Topics: Advanced Data Compression Techniques · Image and Video Quality Assessment · Generative Adversarial Networks and Image Synthesis