Taggus: An Automated Pipeline for the Extraction of Characters' Social Networks from Portuguese Fiction Literature
Tiago G. Canário, Catarina Duarte, Flávio L. Pinheiro, João L.M. Pereira

TL;DR
This paper introduces Taggus, an NLP pipeline tailored to extracting social networks of characters from Portuguese fiction. It outperforms readily available tools in F1-Score and provides a foundation for further research.
Contribution
The paper presents a novel, language-specific NLP pipeline that improves character and interaction detection in Portuguese literature, addressing limitations of current methods.
Findings
Achieved 94.1% F1-Score in character identification
Achieved 75.9% F1-Score in interaction detection
Outperformed state-of-the-art tools by over 22% in key metrics
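For readers unfamiliar with the metric reported above, F1-Score is the harmonic mean of precision and recall. The sketch below computes it over a set of predicted character names against a gold-standard set. The function name and the example names are hypothetical and are not taken from the paper; the paper's own evaluation protocol (for example, how co-referent mentions are matched) may differ.

```python
def f1_score(predicted: set[str], gold: set[str]) -> float:
    """F1 over two sets of items, e.g. detected vs. annotated character names."""
    true_positives = len(predicted & gold)
    if true_positives == 0:
        # Covers empty inputs and the no-overlap case, avoiding division by zero.
        return 0.0
    precision = true_positives / len(predicted)  # share of predictions that are correct
    recall = true_positives / len(gold)          # share of gold items that were found
    return 2 * precision * recall / (precision + recall)


# Hypothetical example: two of three predicted characters are in the gold set.
predicted_characters = {"Carlos", "Maria", "Lisboa"}
gold_characters = {"Carlos", "Maria", "Pedro"}
score = f1_score(predicted_characters, gold_characters)
```

Here precision and recall are both 2/3, so the F1-Score is also 2/3.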
Abstract
Automatically identifying characters and their interactions in fiction books is arguably a complex task, requiring pipelines that combine multiple Natural Language Processing (NLP) methods, such as Named Entity Recognition (NER) and Part-of-Speech (POS) tagging. However, these methods are not optimized for the task of constructing Social Networks of Characters. Indeed, currently available methods tend to underperform, especially in less-represented languages, due to a lack of manually annotated training data. Here, we propose a pipeline, which we call Taggus, to extract social networks from literary fiction works in Portuguese. Our results show that, compared to readily available State-of-the-Art tools, namely off-the-shelf NER tools and Large Language Models (ChatGPT), the resulting pipeline, which uses POS tagging and a combination of heuristics, achieves…
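To make the idea of a social network of characters concrete, here is a minimal sketch that links two characters whenever they are mentioned in the same sentence. This is a common simplification and is not the Taggus pipeline: the summary does not describe the paper's POS-based heuristics, so the sketch assumes the character names are already known. The function name, the sentence-level co-occurrence rule, and the example sentences are all assumptions for illustration.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_network(sentences: list[str], characters: set[str]) -> Counter:
    """Count how often each pair of known characters shares a sentence.

    Assumption: an "interaction" is approximated by sentence-level
    co-occurrence, which is NOT the heuristic used by the paper.
    """
    edges: Counter = Counter()
    for sentence in sentences:
        # Naive tokenization: strip commas and periods, split on whitespace.
        tokens = set(sentence.replace(",", " ").replace(".", " ").split())
        present = sorted(name for name in characters if name in tokens)
        # Each unordered pair of characters in the sentence adds weight to an edge.
        for first, second in combinations(present, 2):
            edges[(first, second)] += 1
    return edges


# Hypothetical sentences and character names, not drawn from the paper's corpus.
example_sentences = [
    "Carlos falou com Maria.",
    "Maria saiu.",
    "Carlos e Maria viram Pedro.",
]
network = cooccurrence_network(example_sentences, {"Carlos", "Maria", "Pedro"})
```

The resulting counter maps character pairs to edge weights, which can be loaded into any graph library for analysis.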
