Dataset of Quotation Attribution in German News Articles

Fynn Petersen-Frey; Chris Biemann

arXiv:2404.16764 · cs.CL · April 26, 2024

Open Access

TL;DR

This paper introduces a new high-quality, annotated dataset for quotation attribution in German news articles, addressing a key resource gap and enabling improved NLP systems for analyzing human communication.

Contribution

The paper presents a novel, freely available dataset with detailed annotations for quotation attribution in German news articles, including schema design and evaluation of existing systems.

Findings

1. The dataset contains 250,000 tokens across 1000 documents.

2. Existing systems show promising results when applied to the dataset.

3. The dataset facilitates various downstream NLP tasks.

Abstract

Extracting who says what to whom is a crucial part of analyzing human communication in today's abundance of data such as online news articles. Yet the lack of annotated data for this task in German news articles severely limits the quality and usability of possible systems. To remedy this, we present a new, freely available, Creative Commons-licensed dataset for quotation attribution in German news articles based on WIKINEWS. The dataset provides curated, high-quality annotations across 1000 documents (250,000 tokens) in a fine-grained annotation schema that enables various downstream uses. The annotations specify not only who said what but also how, in which context, and to whom, and they define the type of quotation. We specify our annotation schema, describe the creation of the dataset and provide a quantitative analysis. Further, we describe suitable evaluation metrics, apply two…
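To make the annotation roles named in the abstract concrete (who, what, how, in which context, to whom, and quotation type), here is a minimal Python sketch of how one annotated quotation might be represented. It is an illustration only: the class name, field names, token-offset convention, the `"indirect"` type label, and the example sentence are all assumptions and do not reflect the dataset's actual file format or label set.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

# Hypothetical convention: a span is (start, end) in token offsets, end exclusive.
Span = Tuple[int, int]


@dataclass
class QuotationAnnotation:
    """One annotated quotation; all field names are illustrative assumptions."""
    quote: Span                        # what was said
    quote_type: str                    # type of quotation (label set assumed)
    speaker: Optional[Span] = None     # who said it
    cue: Optional[Span] = None         # how it was said, e.g. a reporting verb
    addressee: Optional[Span] = None   # to whom it was said
    frame: Optional[Span] = None       # in which context it was said


def span_text(tokens: List[str], span: Optional[Span]) -> str:
    """Return the space-joined tokens covered by a span, or '' if it is absent."""
    if span is None:
        return ""
    start, end = span
    return " ".join(tokens[start:end])


# Invented example sentence, not taken from the dataset.
tokens = ["Die", "Ministerin", "sagte", ",", "die", "Lage", "sei", "ernst", "."]
annotation = QuotationAnnotation(
    quote=(4, 8),
    quote_type="indirect",  # assumed label
    speaker=(0, 2),
    cue=(2, 3),
)

print(span_text(tokens, annotation.speaker), "|",
      span_text(tokens, annotation.cue), "|",
      span_text(tokens, annotation.quote))
```

Keeping every role as an optional span over the same token sequence makes it easy to represent quotations where, for instance, no addressee is mentioned in the text.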

Taxonomy

Topics: Topic Modeling · Computational and Text Analysis Methods