Guidelines and a Corpus for Extracting Biographical Events

Marco Antonio Stranisci; Enrico Mensa; Ousmane Diakite; Daniele Radicioni; Rossana Damiano

arXiv:2206.03547·cs.CL·June 9, 2022

TL;DR

This paper presents guidelines and a new annotated corpus for extracting biographical events, focusing on underrepresented groups, and demonstrates how existing resources can be leveraged for this task.

Contribution

It introduces interoperable annotation guidelines and a corpus for biographical event extraction, expanding resources for underrepresented populations.

Findings

01 Achieved a high average inter-annotator agreement of 0.825

02 Mapped the corpus onto OntoNotes to expand the data

03 Demonstrated the utility of existing resources for biographical event extraction

Abstract

Although biographies are widespread on the Semantic Web, resources and approaches for automatically extracting biographical events are limited. This limitation reduces the amount of structured, machine-readable biographical information, especially about people belonging to underrepresented groups. Our work addresses this limitation by providing a set of guidelines for the semantic annotation of life events. The guidelines are designed to be interoperable with existing ISO standards for semantic annotation: ISO-TimeML (ISO-24617-1) and SemAF (ISO-24617-4). The guidelines were tested through an annotation task on Wikipedia biographies of underrepresented writers, namely authors born in non-Western countries, migrants, or members of ethnic minorities. A total of 1,000 sentences were annotated by 4 annotators, with an average Inter-Annotator Agreement of 0.825. The resulting corpus was mapped on…

Taxonomy

Topics: Natural Language Processing Techniques · Topic Modeling · Wikis in Education and Collaboration