A Semi-Automated Approach for Information Extraction, Classification and Analysis of Unstructured Data
Alberto Purpura, Marco Calaresu

TL;DR
This paper presents a semi-automated method combining NLP techniques and Quantitative Narrative Analysis to extract, categorize, and analyze unstructured data from a political diary, enabling insights into soft-power influence.
Contribution
It introduces an innovative framework integrating NLP and Quantitative Narrative Analysis for structured data extraction and analysis from unstructured political documents.
Findings
Regular Expressions and Named Entity Recognition (NER) effectively extract information from the Diary.
Organizing the extracted data in a structured form enables detailed analysis of political influence.
Visualization with PC-ACE makes the results easier to interpret.
Abstract
In this paper, we show how Quantitative Narrative Analysis and simple Natural Language Processing techniques apply to the extraction and categorization of data in a sample case study of the Diary of the former President of the Italian Republic (PoR), Giorgio Napolitano. The Diary contains a record of all his institutional meetings. This information, if properly handled, allows for an analysis of how the PoR used his so-called soft-powers to influence the Italian political system during his first mandate. In this paper, we propose a way to use simple, yet very effective, Natural Language Processing techniques - such as Regular Expressions and Named Entity Recognition - to extract information from the Diary. Then, we propose an innovative way to organize the extracted data relying on the methodological framework of Quantitative Narrative Analysis. Finally, we show how to analyze the…
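As an illustration of the extraction step described in the abstract, here is a minimal sketch of how a Regular Expression might turn diary lines into structured subject-action-object records of the kind Quantitative Narrative Analysis works with. The entry layout, the `ENTRY_RE` pattern, the `Triplet` fields, and the sample lines are all assumptions made for this example; they are not taken from the paper or from the actual Diary. A real pipeline would likely add a Named Entity Recognition step to resolve participant names.

```python
import re
from dataclasses import dataclass

# Hypothetical entry layout (an assumption, not the Diary's real format):
#   "DD/MM/YYYY HH:MM - Meeting with <participants>"
ENTRY_RE = re.compile(
    r"^(?P<date>\d{2}/\d{2}/\d{4})\s+(?P<time>\d{2}:\d{2})\s+-\s+"
    r"Meeting with\s+(?P<participants>.+)$"
)


@dataclass
class Triplet:
    """One subject-action-object record; field names are illustrative."""
    date: str
    subject: str
    action: str
    obj: str


def extract_triplets(lines, subject="President of the Republic"):
    """Return one Triplet per participant found in lines matching ENTRY_RE."""
    triplets = []
    for line in lines:
        match = ENTRY_RE.match(line.strip())
        if match is None:
            continue  # skip lines that do not look like meeting entries
        # Split the participant list on commas and on the word "and".
        for name in re.split(r",\s*|\s+and\s+", match.group("participants")):
            name = name.strip().rstrip(".")
            if name:
                triplets.append(
                    Triplet(match.group("date"), subject, "meets", name)
                )
    return triplets


# Placeholder input, invented for this sketch.
sample = [
    "12/03/2007 10:30 - Meeting with Person A, Person B and Person C",
    "This line is not an entry",
]

for triplet in extract_triplets(sample):
    print(triplet)
```

Emitting one record per participant keeps each row atomic, so the records can be counted or grouped by date or by participant without further parsing.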
Taxonomy
Topics: Computational and Text Analysis Methods · Topic Modeling · Advanced Text Analysis Techniques
