Album Storytelling with Iterative Story-aware Captioning and Large Language Models
Munan Ning, Yujia Xie, Dongdong Chen, Zeyin Song, Lu Yuan, Yonghong Tian, Qixiang Ye, Li Yuan

TL;DR
This paper introduces an iterative, story-aware captioning approach combined with large language models to generate coherent, vivid stories from photo albums, addressing hallucination issues and improving storytelling quality.
Contribution
It proposes a novel iterative pipeline that refines captions and stories using story context, enhancing accuracy and coherence in album storytelling with LLMs.
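The refinement loop described above can be sketched roughly as follows. This is a minimal illustration only, not the authors' implementation: the function name `tell_album_story`, the `captioner` and `storyteller` callables, and the `rounds` parameter are all hypothetical stand-ins for whatever caption model, LLM prompt, and stopping rule the paper actually uses.

```python
from typing import Callable, List, Optional

# Hypothetical interfaces (assumptions, not taken from the paper):
#   captioner(image, story) -> caption text; story is None on the first,
#                              story-agnostic pass
#   storyteller(captions)   -> story text written from the captions
Captioner = Callable[[str, Optional[str]], str]
Storyteller = Callable[[List[str]], str]


def tell_album_story(
    images: List[str],
    captioner: Captioner,
    storyteller: Storyteller,
    rounds: int = 2,  # assumed fixed number of refinement rounds
) -> str:
    """Alternate between captioning and story writing for a photo album."""
    # Initial pass: story-agnostic captions, then a first draft of the story.
    captions = [captioner(image, None) for image in images]
    story = storyteller(captions)

    # Refinement: re-caption each photo with the current story as context,
    # then rewrite the story from the story-aware captions.
    for _ in range(rounds):
        captions = [captioner(image, story) for image in images]
        story = storyteller(captions)

    return story
```

With `rounds=0` the function reduces to the caption-then-summarize baseline that the abstract describes as prone to hallucination; each additional round gives the captioner the latest draft so it can focus on details relevant to the story.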
Findings
Generated stories with fewer factual errors.
Improved coherence and vividness in stories.
Effective on a new dataset of vlog image collections.
Abstract
This work studies how to transform an album into vivid and coherent stories, a task we refer to as "album storytelling". While this task can help preserve memories and facilitate experience sharing, it remains an underexplored area in current literature. With recent advances in Large Language Models (LLMs), it is now possible to generate lengthy, coherent text, opening up the opportunity to develop an AI assistant for album storytelling. One natural approach is to use caption models to describe each photo in the album, and then use LLMs to summarize and rewrite the generated captions into an engaging story. However, we find this often results in stories containing hallucinated information that contradicts the images, as each generated caption ("story-agnostic") is not always relevant to the whole story or misses some necessary information. To address these limitations,…
Taxonomy
Topics: Multimodal Machine Learning Applications · Video Analysis and Summarization · Topic Modeling
