Advancing Aesthetic Image Generation via Composition Transfer
Kai Zou, Zhiwei Zhao, Bin Liu, Nenghai Yu

TL;DR
This paper introduces Composer, a framework for explicit, flexible transfer and control of aesthetic image composition, built on aesthetic theory, large vision-language models, and a new dataset.
Contribution
It presents Composer, a semantic-agnostic composition modeling framework that enables explicit transfer, theme-driven retrieval, and implicit planning for aesthetic image generation.
Findings
Composer improves aesthetic quality in text-to-image generation.
It enables personalized and flexible composition control and transfer.
Experiments show significant improvement over existing methods.
Abstract
Composition is a cornerstone of visual aesthetics, influencing the appeal of an image. While its principles operate independently of specific content, in practice, composition is often coupled with semantics. As a result, existing methods often enhance composition either through implicit learning or by semantics-based layout control, rather than explicitly modeling composition itself. To address this gap, we introduce Composer, a framework rooted in aesthetic theory, designed to model composition in a semantic-agnostic manner. First, it supports composition transfer by extracting key composition-aware representations from a reference image and leveraging a tailored conditional guidance module to control composition based on pre-trained diffusion models. Second, when users specify only text themes without a composition reference, Composer supports theme-driven composition retrieval by…
