LTOS: Layout-controllable Text-Object Synthesis via Adaptive Cross-attention Fusions
Xiaoran Zhao, Tianhao Wu, Yu Lai, Zhiliang Tian, Zhen Huang, Yahui Liu, Zejiang He, Dongsheng Li

TL;DR
This paper introduces LTOS, a unified framework for layout-controllable text-object synthesis that combines text rendering and object generation, utilizing a new dataset and adaptive fusion modules to produce high-quality, controllable images.
Contribution
The paper proposes the novel LTOS task, constructs a dedicated dataset, and develops an adaptive cross-attention fusion framework for improved text and object synthesis in images.
Findings
Outperforms state-of-the-art methods on LTOS, text rendering, and layout-to-image tasks.
Generates images with clear, legible text and plausible objects.
Effectively integrates text and object information through adaptive fusion.
Abstract
Controllable text-to-image generation synthesizes visual text and objects in images under given conditions, and is frequently applied to emoji and poster generation. Visual text rendering and layout-to-image generation are two popular tasks in this area. However, each typically focuses on generating or rendering a single modality, leaving a gap between the approaches designed for each task. In this paper, we combine text rendering and layout-to-image generation into a single task, layout-controllable text-object synthesis (LTOS), which aims to synthesize images containing objects and visual text based on a predefined object layout and text contents. As compliant datasets are not readily available for the LTOS task, we construct a layout-aware text-object synthesis dataset, containing elaborate…
Taxonomy
Topics: Natural Language Processing Techniques · Topic Modeling · Speech and dialogue systems
