Taggus: An Automated Pipeline for the Extraction of Characters' Social Networks from Portuguese Fiction Literature

Tiago G Canário; Catarina Duarte; Flávio L. Pinheiro; João L.M. Pereira

arXiv:2508.03358 · cs.CL · August 6, 2025

TL;DR

This paper introduces Taggus, an NLP pipeline tailored to extracting social networks of characters from Portuguese fiction. It outperforms off-the-shelf NER tools and ChatGPT in F1-Score on both character identification and interaction detection, and provides a foundation for further research.

Contribution

The paper presents a novel, language-specific NLP pipeline that improves character and interaction detection in Portuguese literature, addressing limitations of current methods.

Findings

01 Achieved 94.1% F1-Score in character identification

02 Achieved 75.9% F1-Score in interaction detection

03 Outperformed state-of-the-art tools by over 22% in key metrics
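
For readers unfamiliar with the metric reported above, the F1-Score is the harmonic mean of precision and recall. The sketch below computes it over sets of predicted and gold-standard character names. The function name `f1_score` and the example names are illustrative assumptions and are not taken from the paper, whose exact evaluation protocol is not described on this page.

```python
def f1_score(predicted, gold):
    """Harmonic mean of precision and recall over two sets of labels."""
    predicted, gold = set(predicted), set(gold)
    true_positives = len(predicted & gold)
    if true_positives == 0:
        # Also covers empty inputs, avoiding a division by zero below.
        return 0.0
    precision = true_positives / len(predicted)
    recall = true_positives / len(gold)
    return 2 * precision * recall / (precision + recall)


# Hypothetical example: three of four predicted names are real characters,
# and one real character is missed.
gold = {"Carlos", "Maria Eduarda", "Ega", "Afonso"}
predicted = {"Carlos", "Maria Eduarda", "Ega", "Lisboa"}
score = f1_score(predicted, gold)
```

Here precision and recall are both 3/4, so the score is 0.75. A place name wrongly tagged as a character ("Lisboa") lowers precision, while a missed character ("Afonso") lowers recall.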

Abstract

Automatically identifying characters and their interactions from fiction books is, arguably, a complex task that requires pipelines that leverage multiple Natural Language Processing (NLP) methods, such as Named Entity Recognition (NER) and Part-of-speech (POS) tagging. However, these methods are not optimized for the task that leads to the construction of Social Networks of Characters. Indeed, the currently available methods tend to underperform, especially in less-represented languages, due to a lack of manually annotated data for training. Here, we propose a pipeline, which we call Taggus, to extract social networks from literary fiction works in Portuguese. Our results show that compared to readily available State-of-the-Art tools -- off-the-shelf NER tools and Large Language Models (ChatGPT) -- the resulting pipeline, which uses POS tagging and a combination of heuristics, achieves…
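
To make the idea of building a character network from POS tags more concrete, here is a minimal sketch under strong assumptions. It is not the Taggus pipeline. It assumes the text has already been tokenized and tagged with Universal Dependencies style tags, treats every run of consecutive `PROPN` tokens as one candidate character name, and counts two characters as interacting whenever they appear in the same sentence. The helper names `extract_names` and `build_network` and the example sentences are hypothetical, and the sketch performs no co-reference resolution.

```python
from collections import Counter
from itertools import combinations


def extract_names(tagged_sentence):
    """Merge runs of consecutive PROPN tokens into candidate character names."""
    names, current = [], []
    for token, pos in tagged_sentence:
        if pos == "PROPN":
            current.append(token)
        elif current:
            names.append(" ".join(current))
            current = []
    if current:
        names.append(" ".join(current))
    return names


def build_network(tagged_sentences):
    """Count, for each unordered pair of names, the sentences where both occur."""
    edges = Counter()
    for sentence in tagged_sentences:
        names = sorted(set(extract_names(sentence)))
        for a, b in combinations(names, 2):
            edges[(a, b)] += 1
    return edges


# Hypothetical pre-tagged sentences (token, POS tag); not data from the paper.
sentences = [
    [("Carlos", "PROPN"), ("falou", "VERB"), ("com", "ADP"),
     ("Maria", "PROPN"), ("Eduarda", "PROPN"), (".", "PUNCT")],
    [("Maria", "PROPN"), ("Eduarda", "PROPN"), ("sorriu", "VERB"), (".", "PUNCT")],
    [("Carlos", "PROPN"), ("e", "CCONJ"), ("Ega", "PROPN"),
     ("saíram", "VERB"), (".", "PUNCT")],
]
network = build_network(sentences)
```

The resulting `Counter` maps name pairs to edge weights and can be loaded into any graph library. Sentence-level co-occurrence is only the simplest possible interaction heuristic. It would also treat place names as characters and would miss pronoun mentions, which is the kind of limitation a task-specific pipeline has to address.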
