Bridging the Intent Gap: Knowledge-Enhanced Visual Generation

Yi Cheng; Ziwei Xu; Dongyun Lin; Harry Cheng; Yongkang Wong; Ying Sun; Joo Hwee Lim; Mohan Kankanhalli

arXiv:2405.12538·cs.CV·May 22, 2024

TL;DR

This paper introduces a knowledge-enhanced iterative framework for visual content generation that aims to better align generated images with user intentions by leveraging diverse knowledge sources and feedback mechanisms.

Contribution

It proposes a novel framework combining knowledge sources and feedback modules to improve the accuracy and intention alignment of visual generative models.

Findings

01. Preliminary results show improved alignment with user intentions.

02. Knowledge sources enhance the quality and relevance of generated content.

03. Iterative refinement reduces discrepancies between input prompts and outputs.
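
The idea of iterative, knowledge-driven refinement can be illustrated with a minimal sketch. This is not the authors' implementation: the toy knowledge table `KNOWLEDGE`, the functions `missing_details` and `refine_prompt`, and the text-only `echo_generator` are all hypothetical stand-ins, assumed here only to show the shape of a generate, check, and enrich loop.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical toy knowledge source: keyword -> details a faithful output should show.
KNOWLEDGE: Dict[str, List[str]] = {
    "zebra": ["black and white stripes", "four legs"],
    "savanna": ["dry grass", "acacia trees"],
}


def missing_details(output: str, knowledge: Dict[str, List[str]]) -> List[str]:
    """Feedback stand-in: list known details that the output does not mention."""
    lowered = output.lower()
    gaps: List[str] = []
    for keyword, details in knowledge.items():
        if keyword in lowered:
            gaps.extend(d for d in details if d not in lowered)
    return gaps


def refine_prompt(
    prompt: str,
    generate: Callable[[str], str],
    knowledge: Dict[str, List[str]] = KNOWLEDGE,
    max_rounds: int = 3,
) -> Tuple[str, List[str]]:
    """Generate, check the output against the knowledge source, enrich the prompt, repeat."""
    history: List[str] = []
    for _ in range(max_rounds):
        output = generate(prompt)
        history.append(output)
        gaps = missing_details(output, knowledge)
        if not gaps:
            break  # nothing left to fix according to the toy feedback
        # Add one missing detail per round so each iteration is visible.
        prompt = f"{prompt}, {gaps[0]}"
    return prompt, history


def echo_generator(prompt: str) -> str:
    """Placeholder for a real image generator: returns a text description."""
    return f"image of: {prompt}"


if __name__ == "__main__":
    final_prompt, outputs = refine_prompt(
        "a zebra on the savanna", echo_generator, max_rounds=5
    )
    print(final_prompt)
    print(len(outputs), "generation rounds")
```

In a real system the generator would be an image model and the feedback step would inspect the image itself; the string matching above only mimics that check.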

Abstract

In visual content generation, discrepancies between user intentions and the generated content have been a longstanding problem. These discrepancies arise from two main factors. First, user intentions are inherently complex, with subtle details that input prompts do not fully capture. Without such details, generative models struggle to reflect the intended meaning, which leads to a mismatch between the desired and the generated output. Second, generative models trained on visual-label pairs lack the comprehensive knowledge needed to accurately represent all aspects of the input data in their generated outputs. To address these challenges, we propose a knowledge-enhanced iterative refinement framework for visual content generation. We begin by analyzing and identifying the key challenges faced by existing generative models. Then, we introduce various knowledge sources,…

Taxonomy

Topics: Educational Games and Gamification