Generating Coherent Sequences of Visual Illustrations for Real-World Manual Tasks
João Bordalo, Vasco Ramos, Rodrigo Valério, Diogo Glória-Silva, Yonatan Bitton, Michal Yarom, Idan Szpektor, Joao Magalhaes

TL;DR
This paper presents a method that combines language and vision models to generate coherent, visually consistent image sequences for multi-step instructions, improving on existing approaches in both semantic and visual coherence.
Contribution
It introduces an approach that integrates LLMs with LDMs and adds a copy mechanism, so that image sequences generated for manual tasks stay semantically and visually consistent.
Findings
In human evaluations, the proposed method was preferred 46.6% of the time, versus 26.6% for the second-best method.
The approach maintains semantic coherence across instruction steps.
The method ensures visual consistency in generated image sequences.
Abstract
Multistep instructions, such as recipes and how-to guides, greatly benefit from visual aids, such as a series of images that accompany the instruction steps. While Large Language Models (LLMs) have become adept at generating coherent textual steps, Large Vision/Language Models (LVLMs) are less capable of generating accompanying image sequences. The most challenging aspect is that each generated image needs to adhere to the relevant textual step instruction, as well as be visually consistent with earlier images in the sequence. To address this problem, we propose an approach for generating consistent image sequences, which integrates a Latent Diffusion Model (LDM) with an LLM to transform the sequence into a caption to maintain the semantic coherence of the sequence. In addition, to maintain the visual coherence of the image sequence, we introduce a copy mechanism to initialise reverse…
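The abstract is cut off before it finishes describing the copy mechanism, so the following is only a sketch under an explicit assumption: that "copying" means starting the reverse diffusion for a new step from a partially noised copy of an earlier image's latent rather than from pure Gaussian noise (an SDEdit-style initialisation). The linear noise schedule, the function names (`linear_alpha_bars`, `noise_latent`, `initial_latent`), and the `start_step` parameter are hypothetical; latents are plain Python lists standing in for LDM latent tensors, and the denoising network itself is not shown.

```python
import math
import random
from typing import List, Optional, Tuple

Latent = List[float]  # stand-in for an LDM latent tensor (hypothetical)


def linear_alpha_bars(num_steps: int, beta_start: float = 1e-4,
                      beta_end: float = 0.02) -> List[float]:
    """Cumulative products alpha_bar_t of a linear beta schedule (DDPM-style)."""
    alpha_bars, prod = [], 1.0
    for i in range(num_steps):
        beta = beta_start + (beta_end - beta_start) * i / max(num_steps - 1, 1)
        prod *= 1.0 - beta
        alpha_bars.append(prod)
    return alpha_bars


def noise_latent(z0: Latent, alpha_bar: float, rng: random.Random) -> Latent:
    """Forward-diffuse z0 to one noise level: sqrt(a) * z0 + sqrt(1 - a) * eps."""
    a, s = math.sqrt(alpha_bar), math.sqrt(1.0 - alpha_bar)
    return [a * x + s * rng.gauss(0.0, 1.0) for x in z0]


def initial_latent(dim: int, alpha_bars: List[float], start_step: int,
                   prev_latent: Optional[Latent],
                   rng: random.Random) -> Tuple[Latent, int]:
    """Choose where the reverse diffusion for the current step begins.

    Returns (latent, step index from which denoising should start).
    - No reference image: pure Gaussian noise, denoised from the last step.
    - Reference image copied (assumed copy mechanism): a noised copy of the
      earlier image's latent at `start_step`, so the result keeps its layout.
    """
    if prev_latent is None:
        return [rng.gauss(0.0, 1.0) for _ in range(dim)], len(alpha_bars) - 1
    return noise_latent(prev_latent, alpha_bars[start_step], rng), start_step
```

In this sketch, a smaller `start_step` (less noise) preserves more of the earlier image, trading adherence to the new step's caption for visual consistency with the sequence; a larger one does the opposite.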
Taxonomy
Topics: Human Motion and Animation · Educational Games and Gamification · Data Visualization and Analytics
Methods: Latent Diffusion Model · Diffusion
