TL;DR
ICECAP is a news image captioning model that draws on the associated news article at both the sentence level and the word level to generate more informative, entity-aware captions. It outperforms prior methods on the BreakingNews and GoodNews datasets.
Contribution
The paper introduces ICECAP, a progressive concentration approach that refines relevant news information from sentence to word level for improved image captioning.
Findings
Outperforms state-of-the-art methods on BreakingNews and GoodNews datasets
Effectively concentrates on relevant textual information at multiple levels
Demonstrates significant improvements in caption informativeness and accuracy
Abstract
Most current image captioning systems focus on describing general image content, and lack background knowledge to deeply understand the image, such as exact named entities or concrete events. In this work, we focus on the entity-aware news image captioning task which aims to generate informative captions by leveraging the associated news articles to provide background knowledge about the target image. However, due to the length of news articles, previous works only employ news articles at the coarse article or sentence level, which are not fine-grained enough to refine relevant events and choose named entities accurately. To overcome these limitations, we propose an Information Concentrated Entity-aware news image CAPtioning (ICECAP) model, which progressively concentrates on relevant textual information within the corresponding news article from the sentence level to the word level.…
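The sentence-to-word concentration described in the abstract can be illustrated with a toy sketch. This is not the authors' implementation. It assumes the image and every word already live in a shared embedding space, uses cosine similarity as a stand-in for sentence-level selection, and uses dot-product softmax attention as a stand-in for word-level weighting. All names (`select_sentences`, `word_attention`, `concentrate`) and the 2-d vectors are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def mean_vec(vecs):
    """Component-wise mean of a non-empty list of vectors."""
    return tuple(sum(c) / len(vecs) for c in zip(*vecs))


def select_sentences(image_vec, sentence_vecs, k):
    """Sentence level: indices of the k sentences closest to the image,
    returned in article order."""
    ranked = sorted(
        range(len(sentence_vecs)),
        key=lambda i: cosine(image_vec, sentence_vecs[i]),
        reverse=True,
    )
    return sorted(ranked[:k])


def word_attention(query_vec, word_vecs):
    """Word level: softmax over dot-product scores against the query."""
    scores = [sum(q * w for q, w in zip(query_vec, wv)) for wv in word_vecs]
    top = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def concentrate(image_vec, article, k=1):
    """article: list of sentences, each a list of (word, vector) pairs.
    Returns (word, attention weight) pairs for the retained sentences."""
    sentence_vecs = [mean_vec([v for _, v in sent]) for sent in article]
    keep = select_sentences(image_vec, sentence_vecs, k)
    words = [pair for i in keep for pair in article[i]]
    weights = word_attention(image_vec, [v for _, v in words])
    return [(w, a) for (w, _), a in zip(words, weights)]


# Toy article with made-up 2-d embeddings (assumption, for illustration only).
article = [
    [("markets", (0.0, 1.0)), ("fell", (0.1, 0.9))],
    [("Obama", (1.0, 0.0)), ("spoke", (0.6, 0.4))],
]
image_vec = (1.0, 0.1)
result = concentrate(image_vec, article, k=1)
for word, weight in result:
    print(f"{word}: {weight:.3f}")
```

In this toy setup, the first stage discards the sentence that is far from the image, and the second stage spreads attention only over the words of the retained sentence, so a named entity can receive more weight than its neighbours.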
