Multi-Lingual DALL-E Storytime
Noga Mudrik, Adam S. Charles

TL;DR
This paper introduces an automatic storytelling framework that enhances DALL-E's ability to generate coherent, multi-frame visual stories from non-English texts, addressing language bias and storytelling limitations.
Contribution
The authors develop a framework that enables DALL-E to create coherent visual stories from non-English sources, overcoming language and sequential storytelling constraints.
Findings
Effective visualization of non-English stories and songs.
Ability to generate coherent, multi-frame narratives.
User constraints can be incorporated for customized storytelling.
Abstract
While recent advancements in artificial intelligence (AI) language models demonstrate cutting-edge performance on English texts, equivalent models either do not exist in other languages or do not reach the same level of performance. This undesired effect of AI advancements widens the gap in access to new technology between populations across the world. This unintended bias mainly discriminates against individuals whose English skills are less developed, e.g., children who do not speak English. Following significant advancements in AI research in recent years, OpenAI has recently presented DALL-E: a powerful tool for creating images from English text prompts. While DALL-E is a promising tool for many applications, its decreased performance when given input in other languages limits its audience and deepens the gap between populations. An additional limitation of…
Taxonomy
Topics: Multimodal Machine Learning Applications · Video Analysis and Summarization · Advanced Image and Video Retrieval Techniques
