Improved Masked Image Generation with Knowledge-Augmented Token Representations
Guotao Liang, Baoquan Zhang, Zhiyuan Wen, Zihao Han, Yunming Ye

TL;DR
This paper introduces KA-MIG, a framework that incorporates explicit knowledge graphs of token dependencies into masked image generation, resulting in higher-quality image synthesis.
Contribution
It proposes a novel knowledge-augmented approach that uses three types of token knowledge graphs (two positive and one negative) to improve semantic dependency learning in masked image generation.
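As a rough illustration of what a token-level dependency graph "extracted from the training data" could look like, the sketch below counts how often two discrete codebook tokens appear in the same training image and row-normalizes the counts into edge weights. This is a hypothetical example of one simple positive (co-occurrence-style) graph only. The page's truncated abstract does not specify how KA-MIG actually constructs its three graphs, and the function name `build_cooccurrence_graph` is invented here.

```python
from collections import defaultdict
from itertools import combinations


def build_cooccurrence_graph(token_sequences):
    """Hypothetical sketch, not KA-MIG's construction.

    Builds {token: {neighbor: weight}} from tokenized training images, where
    weights are co-occurrence counts normalized so each token's outgoing
    weights sum to 1.
    """
    counts = defaultdict(lambda: defaultdict(int))
    for seq in token_sequences:
        # Count each unordered pair of distinct tokens once per image.
        for a, b in combinations(sorted(set(seq)), 2):
            counts[a][b] += 1
            counts[b][a] += 1
    graph = {}
    for tok, nbrs in counts.items():
        total = sum(nbrs.values())
        graph[tok] = {n: c / total for n, c in nbrs.items()}
    return graph


# Three toy "images", each a sequence of codebook indices.
graph = build_cooccurrence_graph([[0, 1, 2], [1, 2, 3], [2, 3]])
print(graph[0])  # tokens 1 and 2 each co-occur with token 0 once
```

Tokens that never share an image with another token get no entry. A real graph over a full codebook would more likely be stored as a sparse or dense adjacency matrix.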
Findings
Improved image generation quality on the ImageNet dataset.
Effective integration of knowledge graphs enhances semantic understanding.
Outperforms existing MIG methods in class-conditional image synthesis.
Abstract
Masked image generation (MIG) has demonstrated remarkable efficiency and high-fidelity image synthesis by enabling parallel token prediction. Existing methods typically rely solely on the model itself to learn semantic dependencies among visual token sequences. However, learning such dependencies directly from data is challenging because individual tokens lack clear semantic meaning and the sequences are usually long. To address this limitation, we propose a novel Knowledge-Augmented Masked Image Generation framework, named KA-MIG, which introduces explicit knowledge of token-level semantic dependencies (i.e., extracted from the training data) as priors to learn richer representations and improve performance. In particular, we explore and identify three types of advantageous token knowledge graphs, comprising two positive graphs and one negative graph (i.e., the…
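The "parallel token prediction" that MIG relies on is commonly implemented with MaskGIT-style iterative decoding: at each step the model predicts every masked token at once, keeps the most confident predictions, and re-masks the rest according to a cosine schedule. The sketch below shows one such step in plain Python under those assumptions. It illustrates the generic MIG baseline, not KA-MIG's knowledge-augmented model; the `MASK` sentinel and function names are hypothetical, and the random sampling noise used in practice is omitted.

```python
import math

MASK = -1  # hypothetical sentinel id marking a masked position


def cosine_mask_ratio(progress):
    """Cosine schedule: fraction of tokens left masked at progress in [0, 1]."""
    return math.cos(math.pi / 2 * progress)


def parallel_decode_step(tokens, predictions, confidences, step, total_steps):
    """One iteration of confidence-based parallel decoding.

    tokens:      current sequence, MASK where still unknown
    predictions: the model's predicted token for every position
    confidences: the model's probability for each predicted token
    step:        0-based iteration index
    """
    masked = [i for i, t in enumerate(tokens) if t == MASK]
    # Fill all masked positions in parallel with the model's predictions.
    new_tokens = list(tokens)
    for i in masked:
        new_tokens[i] = predictions[i]
    # Number of positions that should stay masked after this step.
    n_remask = math.floor(len(tokens) * cosine_mask_ratio((step + 1) / total_steps))
    # Reveal at least one new token per step so decoding always progresses.
    n_remask = max(0, min(n_remask, len(masked) - 1))
    # Re-mask the least confident of the newly filled positions.
    for i in sorted(masked, key=lambda i: confidences[i])[:n_remask]:
        new_tokens[i] = MASK
    return new_tokens


preds, conf = [10, 11, 12, 13], [0.9, 0.2, 0.6, 0.4]
seq = parallel_decode_step([MASK] * 4, preds, conf, step=0, total_steps=2)
print(seq)  # [10, -1, 12, -1]: the two least confident tokens stay masked
print(parallel_decode_step(seq, preds, conf, step=1, total_steps=2))  # [10, 11, 12, 13]
```

Because previously revealed tokens are never re-masked, a sequence of N tokens is fully generated in `total_steps` forward passes rather than N, which is the source of MIG's efficiency.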
Taxonomy
Topics: Generative Adversarial Networks and Image Synthesis · Multimodal Machine Learning Applications · Image Enhancement Techniques
