Improved Masked Image Generation with Knowledge-Augmented Token Representations

Guotao Liang; Baoquan Zhang; Zhiyuan Wen; Zihao Han; Yunming Ye

arXiv:2511.12032 · cs.CV · November 18, 2025

Open access · 1 video

TL;DR

This paper introduces KA-MIG, a framework that incorporates explicit knowledge graphs of token dependencies into masked image generation, resulting in higher-quality image synthesis.

Contribution

It proposes a novel knowledge-augmented approach that uses three types of token knowledge graphs (two positive and one negative) to improve the learning of semantic dependencies in masked image generation.

Findings

1. Improved image generation quality on the ImageNet dataset.
2. Effective integration of knowledge graphs enhances semantic understanding.
3. Outperforms existing MIG methods in class-conditional image synthesis.

Abstract

Masked image generation (MIG) has demonstrated remarkable efficiency and high fidelity by enabling parallel token prediction. Existing methods typically rely solely on the model itself to learn semantic dependencies among visual token sequences. However, learning such dependencies directly from data is challenging because individual tokens lack clear semantic meaning and the sequences are usually long. To address this limitation, we propose a novel Knowledge-Augmented Masked Image Generation framework, named KA-MIG, which introduces explicit knowledge of token-level semantic dependencies (i.e., extracted from the training data) as priors to learn richer representations and improve performance. In particular, we explore and identify three types of advantageous token knowledge graphs, including two positive graphs and one negative graph (i.e., the…
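The abstract does not say how the three token knowledge graphs are built. As a rough illustration only, the sketch below assumes that training images have already been tokenized into H×W grids of codebook indices and builds two hypothetical graphs from that data. The first is a "positive" graph of spatial-neighbour co-occurrence probabilities. The second is a "negative" graph of token pairs that never share an image. The function names, the right/down neighbourhood, and both graph definitions are assumptions made for illustration, not KA-MIG's actual construction.

```python
from collections import Counter, defaultdict
from itertools import combinations
from typing import Dict, List, Set, Tuple

# Hypothetical input format: one H x W grid of codebook indices per image.
Grid = List[List[int]]


def neighbor_cooccurrence_graph(grids: List[Grid]) -> Dict[int, Dict[int, float]]:
    """Hypothetical 'positive' graph: P(j | i) estimated from how often token j
    is a direct right/down neighbour of token i (counted in both directions)."""
    counts: Dict[int, Counter] = defaultdict(Counter)
    for grid in grids:
        h, w = len(grid), len(grid[0])
        for r in range(h):
            for c in range(w):
                a = grid[r][c]
                for dr, dc in ((0, 1), (1, 0)):  # right and down neighbours only
                    rr, cc = r + dr, c + dc
                    if rr < h and cc < w:
                        b = grid[rr][cc]
                        counts[a][b] += 1
                        counts[b][a] += 1  # keep the graph symmetric
    # Row-normalise counts into conditional probabilities.
    graph: Dict[int, Dict[int, float]] = {}
    for a, row in counts.items():
        total = sum(row.values())
        graph[a] = {b: n / total for b, n in row.items()}
    return graph


def never_cooccur_graph(grids: List[Grid]) -> Set[Tuple[int, int]]:
    """Hypothetical 'negative' graph: pairs of tokens that each occur somewhere
    in the data but never appear together in the same image."""
    seen_together: Set[Tuple[int, int]] = set()
    vocab: Set[int] = set()
    for grid in grids:
        tokens = sorted({t for row in grid for t in row})
        vocab.update(tokens)
        seen_together.update(combinations(tokens, 2))
    all_pairs = set(combinations(sorted(vocab), 2))
    return all_pairs - seen_together


if __name__ == "__main__":
    demo = [[[0, 1], [1, 2]], [[3, 3], [3, 0]]]
    print(neighbor_cooccurrence_graph(demo)[0])  # {1: 0.5, 3: 0.5}
    print(sorted(never_cooccur_graph(demo)))  # [(1, 3), (2, 3)]
```

Row-normalising gives each token a distribution over its likely neighbours, which is one simple way such a graph could serve as a prior. How KA-MIG actually defines its graphs and injects them into token representations is not described in the excerpt above.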

Videos

Improved Masked Image Generation with Knowledge-Augmented Token Representations (Underline)

Taxonomy

Topics: Generative Adversarial Networks and Image Synthesis · Multimodal Machine Learning Applications · Image Enhancement Techniques