Data Augmentation for Robust Character Detection in Fantasy Novels

Arthur Amalvy; Vincent Labatut; Richard Dufour

arXiv:2302.04555 · cs.CL · February 10, 2023

TL;DR

This paper introduces a data augmentation method that improves character detection in fantasy novels by increasing recall. The resulting loss of precision on ambiguous entities can be reduced by giving the model more local context.

Contribution

It presents a simple data augmentation approach that improves the recall of character named entity recognition (NER) in novels, and it resolves some of the resulting ambiguities by supplying the model with more local context.
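The page does not say which augmentation is used, so the following is only a minimal sketch of one common, straightforward NER augmentation: mention replacement. It assumes BIO-tagged sentences and a hypothetical list of replacement names, and it copies every sentence containing a person (PER) mention with each mention swapped for a randomly drawn name. It is not the authors' implementation; the official code lives in the compnet/ddaugner repository. The names `person_spans`, `replace_mentions` and `augment` are illustrative.

```python
import random
from typing import List, Tuple

# A sentence is a list of (token, BIO tag) pairs, e.g. [("Frodo", "B-PER"), ("left", "O")].
Sentence = List[Tuple[str, str]]


def person_spans(tags: List[str]) -> List[Tuple[int, int]]:
    """Return [start, end) token spans of PER entities in a BIO tag sequence."""
    spans, start = [], None
    for i, tag in enumerate(tags):
        if tag == "I-PER" and start is not None:
            continue  # still inside the current PER mention
        if start is not None:
            spans.append((start, i))  # the current mention ends before token i
        start = i if tag == "B-PER" else None
    if start is not None:
        spans.append((start, len(tags)))  # mention running to the end of the sentence
    return spans


def replace_mentions(sentence: Sentence, names: List[List[str]], rng: random.Random) -> Sentence:
    """Replace every PER mention with a randomly drawn (possibly multi-token) name."""
    out: Sentence = []
    cursor = 0
    for start, end in person_spans([tag for _, tag in sentence]):
        out.extend(sentence[cursor:start])  # copy the tokens before the mention unchanged
        new_name = rng.choice(names)
        out.append((new_name[0], "B-PER"))
        out.extend((tok, "I-PER") for tok in new_name[1:])
        cursor = end
    out.extend(sentence[cursor:])  # copy the rest of the sentence
    return out


def augment(corpus: List[Sentence], names: List[List[str]],
            n_copies: int = 1, seed: int = 0) -> List[Sentence]:
    """Return the corpus plus n_copies rewritten versions of each sentence that has a PER mention."""
    rng = random.Random(seed)
    augmented = list(corpus)
    for _ in range(n_copies):
        for sentence in corpus:
            if person_spans([tag for _, tag in sentence]):
                augmented.append(replace_mentions(sentence, names, rng))
    return augmented


if __name__ == "__main__":
    corpus = [[("Frodo", "B-PER"), ("met", "O"), ("Sam", "B-PER"), ("Gamgee", "I-PER"), (".", "O")]]
    names = [["Aragorn"], ["Bilbo", "Baggins"]]  # hypothetical replacement names
    for sentence in augment(corpus, names, n_copies=1, seed=0):
        print(sentence)
```

Exposing the model to many more (and more unusual) name forms is one plausible way such augmentation raises recall. It also explains why precision can drop: the model learns to tag capitalized tokens more eagerly, including ambiguous ones.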

Findings

01. Data augmentation increases the recall of character detection.

02. Precision decreases on ambiguous entities, but this loss can be mitigated.

03. Local context improves the disambiguation of character mentions.
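The findings trade recall against precision. The sketch below shows how the two are usually computed for NER at the entity level: a predicted entity counts as correct only if its span and type exactly match a gold entity, as in CoNLL-style scoring. It assumes BIO tags and is not necessarily the paper's exact evaluation script. In the example, one spurious prediction (for instance, on an ambiguous capitalized word) lowers precision while recall stays at 1.0.

```python
from typing import List, Set, Tuple

Span = Tuple[int, int, str]  # (start, end exclusive, entity type)


def bio_spans(tags: List[str]) -> Set[Span]:
    """Extract typed entity spans from a BIO tag sequence."""
    spans: Set[Span] = set()
    start, etype = None, None
    for i, tag in enumerate(tags + ["O"]):  # sentinel "O" closes a trailing entity
        closes = tag == "O" or tag.startswith("B-") or (tag.startswith("I-") and tag[2:] != etype)
        if closes:
            if start is not None:
                spans.add((start, i, etype))
                start, etype = None, None
            if tag.startswith("B-"):
                start, etype = i, tag[2:]
    return spans


def precision_recall_f1(gold: List[List[str]], pred: List[List[str]]) -> Tuple[float, float, float]:
    """Micro-averaged entity-level precision, recall and F1 over a list of sentences."""
    tp = fp = fn = 0
    for g, p in zip(gold, pred):
        gs, ps = bio_spans(g), bio_spans(p)
        tp += len(gs & ps)  # exact span and type matches
        fp += len(ps - gs)  # predicted entities absent from the gold data
        fn += len(gs - ps)  # gold entities the model missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


if __name__ == "__main__":
    gold = [["B-PER", "O", "B-PER", "I-PER", "O"]]
    pred = [["B-PER", "O", "B-PER", "I-PER", "B-PER"]]  # one extra, spurious PER entity
    print(precision_recall_f1(gold, pred))
```

Here both gold entities are found (recall 1.0), but one of the three predictions is wrong, so precision is 2/3 and F1 is 0.8.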

Abstract

Named Entity Recognition (NER) is a low-level task often used as a foundation for solving higher-level NLP problems. In the context of character detection in novels, NER false negatives can be an issue, as they may cause certain characters or relationships to be missed entirely. In this article, we demonstrate that applying a straightforward data augmentation technique makes it possible to train a model that achieves higher recall, at the cost of some precision on ambiguous entities. We show that this decrease in precision can be mitigated by giving the model more local context, which resolves some of the ambiguities.
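The abstract says that more local context resolves some ambiguities but does not say how that context is supplied. One plausible setup is a sliding window: each target sentence is tagged together with its k neighboring sentences on either side, and only the predictions for the target sentence are kept. The sketch below assumes that setup. The window size `k` and the `tagger` callable are hypothetical stand-ins for a trained model that maps a token list to a tag list of the same length.

```python
from typing import Callable, List, Tuple


def context_windows(sentences: List[List[str]], k: int) -> List[Tuple[List[str], int, int]]:
    """For each sentence i, return (tokens of sentences i-k..i+k, start, end),
    where [start, end) locates sentence i's own tokens inside the window."""
    windows = []
    for i in range(len(sentences)):
        lo, hi = max(0, i - k), min(len(sentences), i + k + 1)
        before = [tok for s in sentences[lo:i] for tok in s]
        after = [tok for s in sentences[i + 1:hi] for tok in s]
        start = len(before)
        windows.append((before + sentences[i] + after, start, start + len(sentences[i])))
    return windows


def predict_with_context(sentences: List[List[str]], k: int,
                         tagger: Callable[[List[str]], List[str]]) -> List[List[str]]:
    """Tag each sentence inside its context window and keep only its own predictions."""
    results = []
    for window, start, end in context_windows(sentences, k):
        tags = tagger(window)  # hypothetical model call: one tag per window token
        results.append(tags[start:end])
    return results


if __name__ == "__main__":
    sents = [["Tom", "arrived", "."], ["He", "saw", "Rose", "."], ["Rose", "smiled", "."]]
    for window, start, end in context_windows(sents, k=1):
        print(window[start:end], "in context of", len(window), "tokens")
```

With k = 0 this reduces to sentence-level tagging. Larger values of k give the model more evidence (for example, a nearby sentence where "Rose" is clearly a person), but windows must still fit the encoder's input limit, such as the 512 positions of BERT-style models.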

Code & Models

Repositories

compnet/ddaugner (official)

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Authorship Attribution and Profiling