DiT-IC: Aligned Diffusion Transformer for Efficient Image Compression
Junqi Shi, Ming Lu, Xingchen Li, Anle Ke, Ruiqi Zhang, Zhan Ma

TL;DR
DiT-IC introduces an aligned diffusion transformer that enables efficient, high-quality image compression in deep latent spaces, significantly reducing decoding time and memory usage compared to existing diffusion codecs.
Contribution
The paper proposes a diffusion transformer architecture with alignment mechanisms that make diffusion effective in deep latent spaces, enabling fast, high-quality image reconstruction without text prompts.
Findings
Achieves state-of-the-art perceptual quality in image compression.
Offers up to 30x faster decoding than existing diffusion codecs.
Can reconstruct 2048x2048 images on a 16 GB GPU.
Abstract
Diffusion-based image compression has recently shown outstanding perceptual fidelity, yet its practicality is hindered by prohibitive sampling overhead and high memory usage. Most existing diffusion codecs employ U-Net architectures, whose hierarchical downsampling forces diffusion to operate in shallow latent spaces (typically with only 8x spatial downscaling), resulting in excessive computation. In contrast, conventional VAE-based codecs work in much deeper latent domains (16x to 64x downscaling), which motivates a key question: can diffusion operate effectively in such compact latent spaces without compromising reconstruction quality? To address this, we introduce DiT-IC, an Aligned Diffusion Transformer for Image Compression, which replaces the U-Net with a Diffusion Transformer that performs diffusion entirely in a latent space at 32x downscaled resolution. DiT-IC adapts a…
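To make the cost argument concrete, the sketch below counts how many latent positions a denoiser has to process for a 2048x2048 image at the downscaling factors mentioned above, plus the number of pairwise interactions a full self-attention layer would compute over them. It assumes one transformer token per latent position (patch size 1, a hypothetical default) and ignores channels, so it is an idealized illustration of scaling, not a model of DiT-IC's actual architecture or its measured speedups.

```python
def latent_grid(height: int, width: int, downscale: int) -> tuple[int, int]:
    """Spatial size of the latent map after downscaling each side by `downscale`."""
    if height % downscale or width % downscale:
        raise ValueError("image size must be divisible by the downscale factor")
    return height // downscale, width // downscale


def token_count(height: int, width: int, downscale: int, patch: int = 1) -> int:
    """Number of tokens, assuming non-overlapping `patch` x `patch` patches (hypothetical)."""
    h, w = latent_grid(height, width, downscale)
    if h % patch or w % patch:
        raise ValueError("latent size must be divisible by the patch size")
    return (h // patch) * (w // patch)


if __name__ == "__main__":
    H = W = 2048
    for factor in (8, 16, 32, 64):
        n = token_count(H, W, factor)
        # Full self-attention compares every token with every other token: n * n pairs.
        print(f"{factor:>2}x: latent {latent_grid(H, W, factor)}, "
              f"tokens {n}, attention pairs {n * n}")
```

Going from 8x to 32x downscaling shrinks the latent grid from 256x256 to 64x64, i.e. 16x fewer positions per denoising step, and under the quadratic self-attention assumption 256x fewer pairwise interactions. Real decoding time also depends on channel width, depth, and the number of sampling steps.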
Taxonomy
Topics: Advanced Data Compression Techniques · Image and Video Quality Assessment · Generative Adversarial Networks and Image Synthesis
