Customized Image Narrative Generation via Interactive Visual Question Generation and Answering
Andrew Shin, Yoshitaka Ushiku, Tatsuya Harada

TL;DR
This paper introduces a novel interactive method for generating customized image narratives by engaging users with visual questions and answers, capturing diverse perspectives and interests.
Contribution
It proposes a new interactive framework for image description that learns user interests over multiple stages, enabling personalized and diverse image narratives.
Findings
Generated descriptions cover a wider range of topics.
Model adapts to individual user interests.
Produces more diverse narratives than traditional methods.
Abstract
The image description task has invariably been examined in a static manner, with qualitative presumptions held to be universally applicable regardless of the scope or target of the description. In practice, however, different viewers may pay attention to different aspects of an image and yield different descriptions or interpretations under various contexts. Such diversity in perspectives is difficult to derive with conventional image description techniques. In this paper, we propose a customized image narrative generation task, in which users are interactively engaged in the generation process by answering questions about the image. We further attempt to learn the user's interest by repeating such interactive stages, and to automatically reflect that interest in descriptions for new images. Experimental results demonstrate that our model can generate a variety of descriptions from…
Taxonomy
Topics: Multimodal Machine Learning Applications · Advanced Image and Video Retrieval Techniques · Video Analysis and Summarization
