PoemTale Diffusion: Minimising Information Loss in Poem to Image Generation with Multi-Stage Prompt Refinement

Sofia Jamil; Bollampalli Areen Reddy; Raghvendra Kumar; Sriparna Saha; Koustava Goswami; K.J. Joseph

arXiv:2507.13708 · cs.CV · July 24, 2025

TL;DR

PoemTale Diffusion introduces a multi-stage prompt refinement method to improve poetic text-to-image generation, minimizing information loss and capturing complex poetic meanings more effectively.

Contribution

The paper presents a training-free approach with multi-stage prompt refinement and a novel self-attention modification to enhance poetic image generation, along with a new poetry dataset.

Findings

01. Enhanced interpretability of poetic texts in image generation
02. Generation of more consistent and meaningful images from poems
03. Validated improvements through human and quantitative evaluations

Abstract

Recent advancements in text-to-image diffusion models have achieved remarkable success in generating realistic and diverse visual content. A critical factor in this process is the model's ability to accurately interpret textual prompts. However, these models often struggle with creative expressions, particularly those involving complex, abstract, or highly descriptive language. In this work, we introduce a novel training-free approach tailored to improve image generation for a unique form of creative language: poetic verse, which frequently features layered, abstract, and dual meanings. Our proposed PoemTale Diffusion approach aims to minimise the information that is lost during poetic text-to-image conversion by integrating a multi-stage prompt refinement loop into language models to enhance the interpretability of poetic texts. To support this, we adapt existing state-of-the-art…
