Data Augmentation for Robust Character Detection in Fantasy Novels
Arthur Amalvy, Vincent Labatut, Richard Dufour

TL;DR
This paper introduces a data augmentation method that improves character detection in fantasy novels: recall increases, and the accompanying loss of precision on ambiguous entities is mitigated by giving the model more local context.
Contribution
It presents a simple data augmentation approach that raises recall in character NER for novels, and shows that additional local context resolves some of the resulting ambiguities.
Findings
Increased recall of character detection with data augmentation.
Precision decreases on ambiguous entities, but can be mitigated.
Local context improves disambiguation in character recognition.
Abstract
Named Entity Recognition (NER) is a low-level task often used as a foundation for solving higher-level NLP problems. In the context of character detection in novels, NER false negatives can be an issue as they possibly imply missing certain characters or relationships completely. In this article, we demonstrate that applying a straightforward data augmentation technique allows training a model achieving higher recall, at the cost of a certain amount of precision regarding ambiguous entities. We show that this decrease in precision can be mitigated by giving the model more local context, which resolves some of the ambiguities.
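
The abstract does not say which augmentation technique is used. As an illustration only, the sketch below assumes one common choice for NER, mention replacement: each person span in a BIO-tagged training sentence is swapped for a name drawn from a list. The function name `augment_mentions`, the `PER` tag scheme, and the sample names are hypothetical and are not taken from the paper.

```python
import random


def augment_mentions(tokens, tags, names, rng):
    """Return a copy of a BIO-tagged sentence with every PER span
    replaced by a name drawn at random from `names`.

    Assumption: tags follow the BIO scheme with "B-PER" / "I-PER" for
    characters. This is an illustrative sketch, not the paper's code.
    """
    out_tokens, out_tags = [], []
    i = 0
    while i < len(tokens):
        if tags[i] == "B-PER":
            # Find the end of the current person span.
            j = i + 1
            while j < len(tokens) and tags[j] == "I-PER":
                j += 1
            # A replacement name may contain several tokens.
            new_name = rng.choice(names).split()
            out_tokens.extend(new_name)
            out_tags.extend(["B-PER"] + ["I-PER"] * (len(new_name) - 1))
            i = j
        else:
            # Non-person tokens are copied unchanged.
            out_tokens.append(tokens[i])
            out_tags.append(tags[i])
            i += 1
    return out_tokens, out_tags


if __name__ == "__main__":
    tokens = ["Frodo", "Baggins", "left", "the", "Shire", "."]
    tags = ["B-PER", "I-PER", "O", "O", "B-LOC", "O"]
    names = ["Elric", "Kaladin Stormblessed"]  # hypothetical name list
    print(augment_mentions(tokens, tags, names, random.Random(0)))
```

Replacing names while leaving the rest of the sentence intact exposes the model to unfamiliar character names in familiar contexts, which is one plausible route to the higher recall described above. It also suggests why ambiguous entities could cost precision unless the model sees enough surrounding context.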
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Authorship Attribution and Profiling
