ViNTER: Image Narrative Generation with Emotion-Arc-Aware Transformer
Kohei Uehara, Yusuke Mori, Yusuke Mukuta, Tatsuya Harada

TL;DR
ViNTER is an emotion-arc-aware transformer for image narrative generation. It takes an emotion arc (a sequence of emotional changes) as an additional input, so that the story generated from an image follows that emotional trajectory.
Contribution
The paper proposes ViNTER, a novel model that incorporates emotion arcs into image narrative generation, enhancing emotional expressiveness and storytelling quality.
Findings
Effective in capturing emotional trajectories
Improves subjective storytelling quality
Evaluated with both automatic and manual evaluations on the Image Narrative dataset
Abstract
Image narrative generation is the task of creating a story from an image with a subjective viewpoint. Given the importance of the subjective feelings of writers, readers, and characters in storytelling, an image narrative generation method should consider human emotion. In this study, we propose a novel method of image narrative generation called ViNTER (Visual Narrative Transformer with Emotion arc Representation), which takes an "emotion arc" as input to capture a sequence of emotional changes. Since emotion arcs represent the trajectory of emotional change, we expect them to give the model detailed information about the emotional changes in the story. We present experimental results of both automatic and manual evaluations on the Image Narrative dataset and demonstrate the effectiveness of the proposed approach.
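To make the idea of an emotion arc as model input more concrete, here is a minimal sketch of one possible encoding. It is an illustration only, not the authors' implementation: the label set `EMOTIONS`, the `<emo:...>` and `<sep>` token formats, the `<img_i>` placeholders, and both helper functions are hypothetical, since the abstract does not specify how the arc is represented.

```python
from typing import List, Sequence

# Hypothetical label set; the abstract does not say which emotions are used.
EMOTIONS = ["joy", "sadness", "anger", "fear", "surprise", "neutral"]


def encode_emotion_arc(arc: Sequence[str]) -> List[str]:
    """Map an ordered sequence of emotion labels to special tokens.

    The order is preserved because the arc describes a trajectory:
    ["joy", "sadness"] and ["sadness", "joy"] are different arcs.
    """
    for label in arc:
        if label not in EMOTIONS:
            raise ValueError(f"unknown emotion label: {label}")
    return [f"<emo:{label}>" for label in arc]


def build_model_input(arc: Sequence[str], image_tokens: Sequence[str]) -> List[str]:
    """Concatenate arc tokens and image tokens into one input sequence.

    Assumption: the arc is simply prepended to the visual tokens with a
    separator, so a transformer can attend over both.
    """
    return encode_emotion_arc(arc) + ["<sep>"] + list(image_tokens)


if __name__ == "__main__":
    # Placeholder image tokens standing in for visual features.
    print(build_model_input(["joy", "sadness", "joy"], ["<img_0>", "<img_1>"]))
```

In this sketch, changing the arc while keeping the image tokens fixed gives a different input sequence, which is the sense in which an arc could condition the generated story.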
Taxonomy
Methods
Attention Is All You Need · Linear Layer · Softmax · Layer Normalization · Multi-Head Attention · Byte Pair Encoding · Dense Connections · Absolute Position Encodings · Dropout · Label Smoothing
