CEIDM: A Controlled Entity and Interaction Diffusion Model for Enhanced Text-to-Image Generation

Mingyue Yang; Dianxi Shi; Jialu Zhou; Xinyu Wei; Leqian Li; Shaowu Yang; Chunping Qiu

arXiv:2508.17760 · cs.CV · August 26, 2025

TL;DR

CEIDM is a diffusion-based text-to-image generation method with dual controls for entities and their interactions. It uses LLM-based relationship mining and interactive action clustering to produce more realistic and semantically accurate images.

Contribution

The paper presents a novel diffusion model with dual control mechanisms for entities and interactions, utilizing LLM-based relationship mining and interactive action clustering.

Findings

01. Outperforms existing methods in entity control accuracy.

02. Produces images with more realistic interactive relationships.

03. Enhances semantic understanding of actions in generated images.

Abstract

In Text-to-Image (T2I) generation, the complexity of entities and their intricate interactions poses a significant challenge for T2I methods based on diffusion models: how to effectively control entities and their interactions to produce high-quality images. To address this, we propose CEIDM, an image generation method based on diffusion models with dual controls for entities and interactions. First, we propose an entity interactive relationship mining approach based on Large Language Models (LLMs), which extracts reasonable and rich implicit interactive relationships through chain of thought to guide diffusion models toward high-quality images that are closer to realistic logic and have more reasonable interactive relationships. Furthermore, we propose an interactive action clustering and offset method to cluster and offset the interactive action features contained in each text prompt. By…
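The "cluster and offset" idea in the abstract can be illustrated with a toy sketch. The abstract is truncated and does not say which clustering algorithm is used or in which direction features are offset, so everything below is an assumption made for illustration and not the paper's method: plain k-means stands in for the clustering step, and the offset is taken to be a shift of each feature part of the way toward its cluster centroid. The names `kmeans`, `offset_toward_centroid`, and `alpha` are hypothetical, and the 2-D points stand in for action feature vectors.

```python
from math import dist


def kmeans(points, k, iters=20):
    """Plain Lloyd's k-means (assumed stand-in for the paper's clustering).

    Initialisation is deterministic: the first k points are the starting centroids.
    """
    centroids = [list(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: each point goes to its nearest centroid.
        labels = [min(range(k), key=lambda c: dist(p, centroids[c])) for p in points]
        # Update step: each centroid becomes the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


def offset_toward_centroid(points, labels, centroids, alpha=0.5):
    """Shift each feature a fraction alpha of the way to its cluster centroid.

    The direction and size of the offset are assumptions, not taken from the paper.
    """
    return [
        [x + alpha * (c - x) for x, c in zip(p, centroids[lab])]
        for p, lab in zip(points, labels)
    ]


# Hypothetical 2-D "action features" forming two obvious groups.
features = [(0.0, 0.0), (4.0, 4.0), (0.0, 1.0), (4.0, 5.0)]
labels, centroids = kmeans(features, k=2)
shifted = offset_toward_centroid(features, labels, centroids, alpha=0.5)
```

With `alpha=0` the features are left unchanged and with `alpha=1` every feature collapses onto its centroid; intermediate values tighten each cluster while keeping some per-prompt variation.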
