NewsStories: Illustrating articles with visual summaries

Reuben Tan; Bryan A. Plummer; Kate Saenko; JP Lewis; Avneesh Sud; Thomas Leung

arXiv:2207.13061·cs.CV·August 16, 2022

TL;DR

This paper introduces a new large-scale dataset and method for learning visual-language representations that handle long, multi-image news articles with loose image-text correspondence, improving zero-shot image retrieval.

Contribution

It presents a novel setting for visual-language learning with long narratives and multiple images, along with a large dataset and a baseline method that outperforms existing approaches.

Findings

01 State-of-the-art methods struggle with long, multi-image narratives.

02 A new baseline improves zero-shot image retrieval by 10%.

03 The dataset contains over 31 million articles, 22 million images, and 1 million videos.

Abstract

Recent self-supervised approaches have used large-scale image-text datasets to learn powerful representations that transfer to many tasks without finetuning. These methods often assume that there is a one-to-one correspondence between images and their (short) captions. However, many tasks require reasoning about multiple images and long text narratives, such as describing news articles with visual summaries. Thus, we explore a novel setting where the goal is to learn a self-supervised visual-language representation that is robust to varying text length and number of images. In addition, unlike prior work, which assumed captions have a literal relation to the image, we assume images have only a loose illustrative correspondence with the text. To explore this problem, we introduce a large-scale multimodal dataset containing over 31M articles, 22M images and 1M videos. We show that…
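To make the retrieval setting concrete, here is a minimal sketch of scoring candidate image sets of different sizes against one article embedding. It is not the paper's method: the mean-of-cosine-similarities scoring rule, the function names (`cosine`, `score_image_set`, `retrieve`), and the toy 2-D embeddings are all assumptions chosen for illustration.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def score_image_set(article_vec, image_vecs):
    """Assumed scoring rule: mean cosine similarity over the images in a set.

    Averaging keeps scores comparable across sets with different numbers
    of images, which is the property the setting asks for.
    """
    return sum(cosine(article_vec, img) for img in image_vecs) / len(image_vecs)


def retrieve(article_vec, candidate_sets):
    """Return the index of the highest-scoring candidate image set."""
    scores = [score_image_set(article_vec, s) for s in candidate_sets]
    return max(range(len(scores)), key=scores.__getitem__)


# Hypothetical hand-made embeddings; a real system would use learned encoders.
article = [1.0, 0.0]
candidate_sets = [
    [[0.0, 1.0]],                             # one unrelated image
    [[1.0, 0.0], [1.0, 1.0]],                 # two images loosely related to the article
    [[-1.0, 0.0], [0.0, 1.0], [0.0, -1.0]],   # three images, none related
]
best = retrieve(article, candidate_sets)
```

Mean pooling is only one possible way to aggregate over a set; it is used here because it does not favor larger or smaller sets by construction.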

Code & Models

Repositories

newsstoriesdata/newsstories.github.io (PyTorch, official)

Taxonomy

Topics: Multimodal Machine Learning Applications · Advanced Image and Video Retrieval Techniques · Video Analysis and Summarization