Semantics-Guided Generative Image Compression
Cheng-Lin Wu, Hyomin Choi, Ivan V. Bajić

TL;DR
This paper enhances multimodal image semantic compression by introducing semantic segmentation guidance and content-adaptive diffusion, significantly improving image quality and reducing encoding/decoding complexity at low bit rates.
Contribution
It proposes novel semantic segmentation guidance and content-adaptive diffusion components that improve image quality and efficiency in multimodal image semantic compression.
Findings
Improved PSNR and perceptual metrics over baseline MISC
Reduced encoding and decoding time by over 36%
Outperforms mainstream codecs in perceptual quality
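For readers unfamiliar with the first metric, PSNR (peak signal-to-noise ratio) is the standard distortion measure, 10·log10(MAX² / MSE), where MSE is the mean squared error between the original and the decoded image. The snippet below is a minimal sketch of that textbook definition only; the function name `psnr`, the flat-list input format, and the 8-bit peak value of 255 are assumptions for illustration and are not taken from the paper's evaluation code.

```python
import math


def psnr(reference, reconstructed, max_value=255.0):
    """Return PSNR in dB between two equally sized flat sequences of pixel values.

    Illustrative helper (hypothetical name); assumes 8-bit samples by default.
    """
    if len(reference) != len(reconstructed) or len(reference) == 0:
        raise ValueError("inputs must be non-empty and have the same length")

    # Mean squared error over all samples.
    mse = sum((a - b) ** 2 for a, b in zip(reference, reconstructed)) / len(reference)

    # Identical images have zero error, so PSNR is unbounded.
    if mse == 0:
        return math.inf

    return 10.0 * math.log10((max_value ** 2) / mse)
```

Higher values mean lower pixel-wise distortion. Generative codecs at very low bit rates are typically also judged on perceptual metrics, which is why the findings report both.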
Abstract
Advancements in text-to-image generative AI with large multimodal models are spreading into the field of image compression, creating high-quality representations of images at extremely low bit rates. This work introduces novel components to the existing multimodal image semantic compression (MISC) approach, enhancing the quality of the generated images in terms of PSNR and perceptual metrics. The new components include semantic segmentation guidance for the generative decoder, as well as content-adaptive diffusion, which controls the number of diffusion steps based on image characteristics. The results show that our newly introduced methods significantly improve the baseline MISC model while also decreasing the complexity. As a result, both the encoding and decoding time are reduced by more than 36%. Moreover, the proposed compression framework outperforms mainstream codecs in terms of…
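The abstract says content-adaptive diffusion picks the number of diffusion steps from image characteristics, but does not say which characteristics or how they map to a step count. The sketch below is therefore purely illustrative and is not the authors' method: it assumes a simple gradient-based complexity score, assumes that busier images get more steps, and uses hypothetical names and defaults (`image_complexity`, `choose_diffusion_steps`, `min_steps=10`, `max_steps=50`).

```python
def image_complexity(gray):
    """Mean absolute difference between neighbouring pixels, scaled to [0, 1].

    `gray` is a list of rows of 8-bit grayscale values. This proxy is an
    assumption made for illustration, not a measure taken from the paper.
    """
    height = len(gray)
    width = len(gray[0]) if height else 0
    total = 0.0
    count = 0
    for y in range(height):
        for x in range(width):
            if x + 1 < width:  # horizontal neighbour
                total += abs(gray[y][x + 1] - gray[y][x])
                count += 1
            if y + 1 < height:  # vertical neighbour
                total += abs(gray[y + 1][x] - gray[y][x])
                count += 1
    return (total / count) / 255.0 if count else 0.0


def choose_diffusion_steps(gray, min_steps=10, max_steps=50):
    """Map the complexity score linearly onto [min_steps, max_steps].

    Assumed direction: smoother images get fewer steps. Bounds are hypothetical.
    """
    score = min(1.0, max(0.0, image_complexity(gray)))
    return min_steps + round(score * (max_steps - min_steps))
```

Under these assumptions a flat image gets the minimum step count and a high-contrast checkerboard gets the maximum. Running fewer steps on simple content is one plausible way decoding time could fall, though the paper should be consulted for the actual criterion.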
Taxonomy
Topics: Image Retrieval and Classification Techniques · Advanced Data Compression Techniques
Methods: Diffusion
