Beyond Hallucinations: A Multimodal-Guided Task-Aware Generative Image Compression for Ultra-Low Bitrate
Kaile Wang, Lijun He, Haisheng Fu, Haixia Bi, Fan Li

TL;DR
This paper introduces MTGC, a multimodal-guided, task-aware generative image compression framework that improves semantic consistency and perceptual quality at ultra-low bitrates by integrating text, compressed images, and semantic pseudo-words.
Contribution
The paper proposes a novel multimodal guidance strategy and a task-aware semantic compression module to enhance semantic fidelity in ultra-low bitrate image compression.
Findings
Significant reduction in semantic deviation (DISTS drops by 10.59%)
Improved perceptual quality and pixel fidelity at ultra-low bitrate
Effective integration of text, compressed images, and semantic pseudo-words
Abstract
Generative image compression has recently shown impressive perceptual quality, but often suffers from semantic deviations caused by generative hallucinations at ultra-low bitrate (bpp < 0.05), limiting its reliable deployment in bandwidth-constrained 6G semantic communication scenarios. In this work, we reassess the positioning and role of multimodal guidance, and propose a Multimodal-Guided Task-Aware Generative Image Compression (MTGC) framework. Specifically, MTGC integrates three guidance modalities to enhance semantic consistency: a concise but robust text caption for global semantics, a highly compressed image (HCI) retaining low-level visual information, and Semantic Pseudo-Words (SPWs) for fine-grained task-relevant semantics. The SPWs are generated by our designed Task-Aware Semantic Compression Module (TASCM), which operates in a task-oriented manner to drive the multi-head…
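To make the ultra-low bitrate threshold (bpp < 0.05) concrete, the following is a minimal sketch of the bits-per-pixel arithmetic for a bitstream made of three parts, mirroring the three guidance modalities named in the abstract. The function names, the 768x512 image size, and the per-stream byte counts are illustrative assumptions, not values or code from the paper.

```python
ULTRA_LOW_BPP = 0.05  # threshold quoted in the abstract (bpp < 0.05)


def bits_per_pixel(total_bits: int, width: int, height: int) -> float:
    """Return the bitrate in bits per pixel for an image of the given size."""
    if width <= 0 or height <= 0:
        raise ValueError("width and height must be positive")
    return total_bits / (width * height)


def max_payload_bytes(width: int, height: int,
                      bpp_limit: float = ULTRA_LOW_BPP) -> int:
    """Largest whole number of bytes that fits within bpp_limit."""
    return int(bpp_limit * width * height) // 8


# Hypothetical per-stream sizes in bytes (assumed for illustration only).
stream_bytes = {
    "text_caption": 40,              # global semantics
    "highly_compressed_image": 1800,  # low-level visual information (HCI)
    "semantic_pseudo_words": 200,     # fine-grained task-relevant semantics (SPWs)
}

width, height = 768, 512  # assumed image size
total_bits = 8 * sum(stream_bytes.values())
bpp = bits_per_pixel(total_bits, width, height)

print(f"budget at {ULTRA_LOW_BPP} bpp: {max_payload_bytes(width, height)} bytes")
print(f"total payload: {total_bits // 8} bytes, bitrate: {bpp:.4f} bpp")
print("within ultra-low bitrate range:", bpp < ULTRA_LOW_BPP)
```

Under these assumed numbers, the whole payload for a 768x512 image has to fit in roughly 2.4 KB, which shows how little room the text, HCI, and SPW streams have to share at this operating point.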
Taxonomy
Topics: Advanced Data Compression Techniques · Image and Video Quality Assessment · Generative Adversarial Networks and Image Synthesis
