BERT-hLSTMs: BERT and Hierarchical LSTMs for Visual Storytelling
Jing Su, Qingyun Dai, Frank Guerin, Mian Zhou

TL;DR
This paper introduces a hierarchical visual storytelling model combining BERT and LSTMs to generate coherent stories from image sequences, improving upon previous methods by modeling sentence and word dependencies.
Contribution
The novel framework integrates BERT with hierarchical LSTMs to better capture sentence-level and word-level semantics in visual storytelling.
Findings
Outperforms baseline models on BLEU and CIDEr metrics.
Demonstrates improved coherence through hierarchical modeling.
Validated by human evaluation.
Abstract
Visual storytelling is a creative and challenging task, aiming to automatically generate a story-like description for a sequence of images. The descriptions generated by previous visual storytelling approaches lack coherence because they use word-level sequence generation methods and do not adequately consider sentence-level dependencies. To tackle this problem, we propose a novel hierarchical visual storytelling framework which separately models sentence-level and word-level semantics. We use the transformer-based BERT to obtain embeddings for sentences and words. We then employ a hierarchical LSTM network: the bottom LSTM receives as input the sentence vector representation from BERT, to learn the dependencies between the sentences corresponding to images, and the top LSTM is responsible for generating the corresponding word vector representations, taking input from the bottom LSTM.…
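To make the two-level data flow in the abstract concrete, here is a minimal NumPy sketch. It is not the authors' implementation, and it rests on several assumptions. Random vectors stand in for BERT sentence embeddings. The top LSTM is conditioned by concatenating the bottom LSTM's hidden state to every word-step input. Word vectors are emitted through a hypothetical tanh projection `W_out`. All weights are random and untrained. The names `LSTMCell` and `hierarchical_forward` and all dimensions are hypothetical.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


class LSTMCell:
    """Single LSTM cell with random, untrained weights (illustrative only)."""

    def __init__(self, input_dim, hidden_dim, rng):
        # One stacked matrix for the input, forget, candidate and output gates.
        self.W = rng.normal(0.0, 0.1, size=(4 * hidden_dim, input_dim + hidden_dim))
        self.b = np.zeros(4 * hidden_dim)

    def step(self, x, h, c):
        z = self.W @ np.concatenate([x, h]) + self.b
        i, f, g, o = np.split(z, 4)
        c = sigmoid(f) * c + sigmoid(i) * np.tanh(g)
        h = sigmoid(o) * np.tanh(c)
        return h, c


def hierarchical_forward(sentence_vecs, word_dim, hidden_dim, max_words, seed=0):
    """sentence_vecs: (num_images, sent_dim), a stand-in for BERT sentence embeddings.

    Returns an array of generated word vectors, shape (num_images, max_words, word_dim).
    """
    rng = np.random.default_rng(seed)
    num_images, sent_dim = sentence_vecs.shape
    bottom = LSTMCell(sent_dim, hidden_dim, rng)          # sentence-level LSTM
    top = LSTMCell(word_dim + hidden_dim, hidden_dim, rng)  # word-level LSTM
    W_out = rng.normal(0.0, 0.1, size=(word_dim, hidden_dim))  # hypothetical projection

    h_s, c_s = np.zeros(hidden_dim), np.zeros(hidden_dim)
    story = []
    for t in range(num_images):
        # Bottom LSTM carries context from earlier sentences to sentence t.
        h_s, c_s = bottom.step(sentence_vecs[t], h_s, c_s)
        # Top LSTM decodes word vectors for sentence t, conditioned on h_s.
        h_w, c_w = np.zeros(hidden_dim), np.zeros(hidden_dim)
        prev_word = np.zeros(word_dim)
        words = []
        for _ in range(max_words):
            h_w, c_w = top.step(np.concatenate([prev_word, h_s]), h_w, c_w)
            prev_word = np.tanh(W_out @ h_w)  # fed back as the next word input
            words.append(prev_word)
        story.append(np.stack(words))
    return np.stack(story)


sentences = np.random.default_rng(1).normal(size=(5, 16))
story = hierarchical_forward(sentences, word_dim=8, hidden_dim=12, max_words=6)
print(story.shape)  # (5, 6, 8)
```

Because the bottom LSTM runs over the sentence sequence, changing the embedding for image 0 changes the words decoded for every later image. Changing a later image's embedding leaves earlier sentences untouched. This is the sentence-level dependency the abstract says purely word-level generators do not adequately capture.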
Taxonomy
Topics: Multimodal Machine Learning Applications · Video Analysis and Summarization · Advanced Image and Video Retrieval Techniques
Methods: Linear Layer · Tanh Activation · WordPiece · Residual Connection · Sigmoid Activation · Dense Connections · Long Short-Term Memory · Attention Is All You Need · Softmax
