Picking Apart Story Salads

Su Wang; Eric Holgate; Greg Durrett; Katrin Erk

arXiv:1810.13391 · cs.CL · November 1, 2018 · 1 citation

TL;DR

This paper introduces Story Salads, mixtures of multiple documents that can be generated at scale, to make the extraction of coherent narratives from confusing, multi-source information accessible to neural models, and shows that separating the mixed narratives calls for clustering methods that take global context into account.

Contribution

The paper presents Story Salads, a novel dataset of document mixtures that can be generated at scale for testing narrative extraction, and argues that clustering their sentences effectively requires modeling global context and coherence.
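
A story salad can be pictured as two documents whose sentences are interleaved, with each sentence's source kept as its gold cluster label. The sketch below assumes this two-document case and also assumes that each document keeps its internal sentence order while the points where the sources alternate are random; the paper's exact mixing procedure may differ. The name `make_salad` is hypothetical.

```python
import random
from typing import List, Tuple


def make_salad(doc_a: List[str], doc_b: List[str],
               seed: int = 0) -> Tuple[List[str], List[int]]:
    """Interleave the sentences of two documents at random (illustrative sketch).

    Assumption: each document keeps its own sentence order; only the
    choice of which source supplies the next sentence is random.
    Returns the mixed sentences and a gold label (0 or 1) per sentence.
    """
    rng = random.Random(seed)
    # One slot per sentence, tagged with its source document, then shuffled
    # so the order of tags decides which document contributes next.
    slots = [0] * len(doc_a) + [1] * len(doc_b)
    rng.shuffle(slots)
    sources = [iter(doc_a), iter(doc_b)]
    sentences = [next(sources[src]) for src in slots]
    return sentences, slots
```

Keeping the gold labels alongside the mixed sentences is what turns the salad into a supervised clustering benchmark: any predicted grouping can be scored against them.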

Findings

01

Simple bag-of-words clustering is ineffective on Story Salads.

02

Global context and coherence are essential for accurate clustering.

03

Story Salads reveal limitations of current neural models in narrative assembly.
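
Finding 01 concerns clustering by bag-of-words similarity. The sketch below shows one such baseline under stated assumptions; it is not the paper's implementation. Each sentence becomes a vector of word counts, similarity is cosine, and average-linkage agglomerative clustering merges groups until `k` remain. The names `bow`, `cosine`, `bow_cluster`, and `clustering_accuracy` are hypothetical, and scoring by the best one-to-one relabelling of clusters is one plausible measure for this task, not necessarily the metric the paper reports.

```python
import math
import re
from collections import Counter
from itertools import permutations
from typing import List


def bow(sentence: str) -> Counter:
    # Lowercased word counts; no stop-word removal or term weighting (assumption).
    return Counter(re.findall(r"[a-z']+", sentence.lower()))


def cosine(u: Counter, v: Counter) -> float:
    # Cosine similarity of two sparse count vectors (0.0 if either is empty).
    dot = sum(c * v[w] for w, c in u.items())
    norm_u = math.sqrt(sum(c * c for c in u.values()))
    norm_v = math.sqrt(sum(c * c for c in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def bow_cluster(sentences: List[str], k: int = 2) -> List[int]:
    """Average-linkage agglomerative clustering on bag-of-words cosine similarity."""
    vecs = [bow(s) for s in sentences]
    sim = [[cosine(a, b) for b in vecs] for a in vecs]
    clusters = [[i] for i in range(len(sentences))]
    while len(clusters) > k:
        # Find the pair of clusters with the highest mean pairwise similarity.
        best, pair = -1.0, (0, 1)
        for x in range(len(clusters)):
            for y in range(x + 1, len(clusters)):
                total = sum(sim[i][j] for i in clusters[x] for j in clusters[y])
                avg = total / (len(clusters[x]) * len(clusters[y]))
                if avg > best:
                    best, pair = avg, (x, y)
        x, y = pair
        clusters[x].extend(clusters.pop(y))  # y > x, so index x stays valid
    labels = [0] * len(sentences)
    for c, members in enumerate(clusters):
        for i in members:
            labels[i] = c
    return labels


def clustering_accuracy(pred: List[int], gold: List[int]) -> float:
    """Fraction of sentences grouped correctly under the best relabelling of clusters."""
    ks = sorted(set(pred) | set(gold))
    best = 0
    for perm in permutations(ks):
        mapping = dict(zip(ks, perm))
        best = max(best, sum(mapping[p] == g for p, g in zip(pred, gold)))
    return best / len(gold)
```

Because this similarity sees only shared words, two sentences from different narratives that share vocabulary (for example, two storm reports mixed together) can look closer than two sentences from the same narrative, which is consistent with the findings' point that global context and coherence are needed.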

Abstract

During natural disasters and conflicts, information about what happened is often confusing, messy, and distributed across many sources. We would like to be able to automatically identify relevant information and assemble it into coherent narratives of what happened. To make this task accessible to neural models, we introduce Story Salads, mixtures of multiple documents that can be generated at scale. By exploiting the Wikipedia hierarchy, we can generate salads that exhibit challenging inference problems. Story salads give rise to a novel, challenging clustering task, where the objective is to group sentences from the same narratives. We demonstrate that simple bag-of-words similarity clustering falls short on this task and that it is necessary to take into account global context and coherence.
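
The abstract states that the Wikipedia hierarchy is exploited to generate salads with challenging inference problems. One simplified reading, assumed here and not taken from the paper, is that articles filed under a shared category are topically close and therefore harder to separate than unrelated articles. The sketch flattens the hierarchy to one set of category names per article; `candidate_pairs` and the toy category labels in the usage below are hypothetical.

```python
from itertools import combinations
from typing import Dict, List, Set, Tuple


def candidate_pairs(categories: Dict[str, Set[str]],
                    hard: bool = True) -> List[Tuple[str, str]]:
    """Choose article pairs to mix into salads (illustrative sketch).

    Assumption: sharing at least one category makes a pair "hard";
    a real hierarchy could instead use distance in the category tree.
    """
    pairs = []
    for a, b in combinations(sorted(categories), 2):
        shares = bool(categories[a] & categories[b])
        if shares == hard:
            pairs.append((a, b))
    return pairs


cats = {
    "Hurricane Andrew": {"Hurricanes in Florida"},
    "Hurricane Irma": {"Hurricanes in Florida"},
    "Treaty of Versailles": {"Peace treaties"},
}
hard_pairs = candidate_pairs(cats)              # topically close articles
easy_pairs = candidate_pairs(cats, hard=False)  # unrelated control pairs
```

Selecting with `hard=False` yields easier control pairs, which makes it possible to compare how much a clustering method depends on surface topic differences.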

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Multimodal Machine Learning Applications