Bridging the Intent Gap: Knowledge-Enhanced Visual Generation
Yi Cheng, Ziwei Xu, Dongyun Lin, Harry Cheng, Yongkang Wong, Ying Sun, Joo Hwee Lim, Mohan Kankanhalli

TL;DR
This paper introduces a knowledge-enhanced iterative framework for visual content generation that aims to better align generated images with user intentions by leveraging diverse knowledge sources and feedback mechanisms.
Contribution
It proposes a novel framework combining knowledge sources and feedback modules to improve the accuracy and intention alignment of visual generative models.
Findings
Preliminary results show improved alignment with user intentions.
Knowledge sources enhance the quality and relevance of generated content.
Iterative refinement reduces discrepancies between input prompts and outputs.
Abstract
For visual content generation, discrepancies between user intentions and the generated content have been a longstanding problem. This discrepancy arises from two main factors. First, user intentions are inherently complex, with subtle details not fully captured by input prompts. The absence of such details makes it challenging for generative models to accurately reflect the intended meaning, leading to a mismatch between the desired and generated output. Second, generative models trained on visual-label pairs lack the comprehensive knowledge to accurately represent all aspects of the input data in their generated outputs. To address these challenges, we propose a knowledge-enhanced iterative refinement framework for visual content generation. We begin by analyzing and identifying the key challenges faced by existing generative models. Then, we introduce various knowledge sources,…
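The loop the abstract describes (enrich the prompt with knowledge, generate, collect feedback, refine) can be sketched roughly as below. This is a minimal illustration under stated assumptions, not the authors' implementation: the names `refine`, `Feedback`, `toy_enrich`, `toy_generate`, and `toy_critique`, the `KNOWLEDGE` and `INTENT` tables, and the string-based "image" are all hypothetical stand-ins for a real knowledge source, generative model, and feedback module.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple


@dataclass
class Feedback:
    """Hypothetical feedback record: alignment verdict plus missing details."""
    aligned: bool
    missing: List[str] = field(default_factory=list)


def refine(prompt: str,
           enrich: Callable[[str], str],
           generate: Callable[[str], str],
           critique: Callable[[str, str], Feedback],
           max_rounds: int = 3) -> Tuple[str, int]:
    """Generate, critique, and regenerate until aligned or out of rounds.

    Returns the final output and the number of generation rounds used.
    """
    working_prompt = enrich(prompt)        # inject knowledge before generating
    output = generate(working_prompt)
    rounds = 1
    while rounds < max_rounds:
        feedback = critique(prompt, output)
        if feedback.aligned:
            break
        # Fold the missing details back into the prompt and try again.
        working_prompt += ", " + ", ".join(feedback.missing)
        output = generate(working_prompt)
        rounds += 1
    return output, rounds


# --- Toy stand-ins (assumptions for illustration only) ---
KNOWLEDGE = {"corgi": "short legs"}          # pretend knowledge source
INTENT = ["short legs", "sunset lighting"]   # details the user actually wants


def toy_enrich(prompt: str) -> str:
    # Append any known detail whose trigger term appears in the prompt.
    extras = [detail for term, detail in KNOWLEDGE.items() if term in prompt]
    return ", ".join([prompt] + extras)


def toy_generate(prompt: str) -> str:
    # A real system would call an image model; here the "image" is a string.
    return f"image<{prompt}>"


def toy_critique(prompt: str, output: str) -> Feedback:
    # A real feedback module might be a human or a vision-language model.
    missing = [detail for detail in INTENT if detail not in output]
    return Feedback(aligned=not missing, missing=missing)


if __name__ == "__main__":
    print(refine("a corgi on a beach", toy_enrich, toy_generate, toy_critique))
```

In this toy run the knowledge table supplies one detail up front and the feedback step supplies the other, so the loop stops after the second generation. Passing the components in as callables keeps the loop independent of any particular model or knowledge source.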
Taxonomy
Topics: Educational Games and Gamification
