FRACAS: A FRench Annotated Corpus of Attribution relations in newS

Ange Richard; Laura Alonzo-Canul; François Portet

arXiv:2309.10604 · cs.CL · September 20, 2023 · 1 citation

TL;DR

This paper introduces FRACAS, a manually annotated French news corpus for quotation extraction and attribution, addressing the scarcity of non-English data and providing high-quality annotations for NLP research.

Contribution

The paper presents a new annotated French corpus, together with detailed annotation guidelines, inter-annotator agreement results, and an analysis of quote types, in support of quotation extraction research beyond English.

Findings

1. High inter-annotator agreement among the 8 annotators
2. Statistics on the balance between quote types (direct, indirect, mixed)
3. A manually annotated corpus of 1676 French newswire texts for quotation extraction and source attribution

Abstract

Quotation extraction is a widely useful task both from a sociological and from a Natural Language Processing perspective. However, very little data is available to study this task in languages other than English. In this paper, we present a manually annotated corpus of 1676 newswire texts in French for quotation extraction and source attribution. We first describe the composition of our corpus and the choices that were made in selecting the data. We then detail the annotation guidelines and annotation process, as well as a few statistics about the final corpus and the obtained balance between quote types (direct, indirect and mixed, which are particularly challenging). We end by detailing our inter-annotator agreement between the 8 annotators who worked on manual labelling, which is substantially high for such a difficult linguistic phenomenon.

Taxonomy

Topics: Natural Language Processing Techniques · Topic Modeling · Advanced Text Analysis Techniques