Multi-Narrative Semantic Overlap Task: Evaluation and Benchmark

Naman Bansal; Mousumi Akter; Shubhra Kanti Karmaker Santu

arXiv:2201.05294·cs.CL·January 17, 2022

Open Access

TL;DR

This paper introduces the Multi-Narrative Semantic Overlap task, creates a benchmark dataset, and proposes a new evaluation metric, SEM-F1, which better aligns with human judgment than existing metrics.

Contribution

It defines a new NLP task, constructs a benchmark dataset with human annotations, and develops SEM-F1, a novel evaluation metric for semantic overlap.

Findings

01. ROUGE is unsuitable for MNSO evaluation.

02. SEM-F1 correlates better with human judgment than existing metrics.

03. The benchmark dataset contains 2,925 narrative pairs and 411 ground-truth semantic overlaps.

Abstract

In this paper, we introduce an important yet relatively unexplored NLP task called Multi-Narrative Semantic Overlap (MNSO), which entails generating a Semantic Overlap of multiple alternate narratives. As no benchmark dataset is readily available for this task, we created one by crawling 2,925 narrative pairs from the web and then went through the tedious process of manually creating 411 different ground-truth semantic overlaps by engaging human annotators. To evaluate this novel task, we first conducted a systematic study by borrowing the popular ROUGE metric from the text-summarization literature and discovered that ROUGE is not suitable for our task. Subsequently, we conducted further human annotations/validations to create 200 document-level and 1,518 sentence-level ground-truth labels, which helped us formulate a new precision-recall style evaluation metric, called SEM-F1…
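The abstract is cut off before SEM-F1 is defined, so the following is only a minimal sketch of what a generic "precision-recall style" sentence-level metric can look like, not the paper's formulation. The function names `jaccard` and `soft_prf`, the best-match aggregation, and the token-overlap similarity are all assumptions made for illustration; a real semantic metric would likely swap in a sentence-embedding similarity.

```python
from typing import Callable, List, Tuple


def jaccard(a: str, b: str) -> float:
    """Toy stand-in similarity (assumption): token-set Jaccard overlap in [0, 1]."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    if not ta and not tb:
        return 1.0
    return len(ta & tb) / len(ta | tb)


def soft_prf(
    candidate: List[str],
    reference: List[str],
    sim: Callable[[str, str], float] = jaccard,
) -> Tuple[float, float, float]:
    """Hypothetical soft precision/recall/F1 over lists of sentences.

    Precision: each candidate sentence is scored by its best match in the reference.
    Recall: each reference sentence is scored by its best match in the candidate.
    F1: harmonic mean of the two averages.
    """
    if not candidate or not reference:
        return 0.0, 0.0, 0.0
    precision = sum(max(sim(c, r) for r in reference) for c in candidate) / len(candidate)
    recall = sum(max(sim(r, c) for c in candidate) for r in reference) / len(reference)
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


if __name__ == "__main__":
    generated = ["the cat sat"]
    gold = ["the cat sat", "dogs bark loudly"]
    print(soft_prf(generated, gold))
```

In this toy case the generated overlap is fully supported by the reference (precision 1.0) but covers only one of its two sentences (recall 0.5). Separating the two directions is what distinguishes a precision-recall style score from a single overlap number.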

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Advanced Text Analysis Techniques