Multilingual person name recognition and transliteration
Bruno Pouliquen, Ralf Steinberger, Camelia Ignat, Irina Temnikova, Anna Widiger, Wajdi Zaghouani, Jan Zizka

TL;DR
This paper introduces a multilingual tool for extracting, matching, and analyzing person names across diverse languages and scripts in news collections, enhancing cross-lingual news analysis capabilities.
Contribution
It presents a multilingual name matching method that maps names from different writing systems to an internal standard representation, rather than adopting the traditional bilingual approach to transliteration, and is integrated into the NewsExplorer system.
Findings
Successfully matches name variants across Greek, Cyrillic, and Arabic scripts.
Processes an average of 25,000 news articles daily for related news detection.
Enhances cross-lingual person name recognition in multilingual news analysis.
Abstract
We present an exploratory tool that extracts person names from multilingual news collections, matches name variants referring to the same person, and infers relationships between people based on the co-occurrence of their names in related news. A novel feature is the matching of name variants across languages and writing systems, including names written with the Greek, Cyrillic and Arabic writing systems. Due to our highly multilingual setting, we use an internal standard representation for name representation and matching, instead of adopting the traditional bilingual approach to transliteration. This work is part of the news analysis system NewsExplorer that clusters an average of 25,000 news articles per day to detect related news within the same and across different languages.
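The idea of matching name variants through a shared internal representation can be illustrated with a minimal sketch. This is not the NewsExplorer implementation: the character table `TO_LATIN`, the helper names, the normalization steps (accent stripping, collapsing doubled letters) and the similarity threshold of 0.8 are all assumptions chosen for illustration. The table covers only the handful of Cyrillic and Greek letters used in the examples, and Arabic is not handled at all.

```python
import unicodedata
from difflib import SequenceMatcher

# Hypothetical, deliberately tiny mapping to Latin letters (assumption:
# a real system would need complete, carefully designed tables per script).
TO_LATIN = {
    # Cyrillic
    "в": "v", "л": "l", "а": "a", "д": "d", "и": "i", "м": "m",
    "р": "r", "п": "p", "у": "u", "т": "t", "н": "n",
    # Greek
    "κ": "k", "ω": "o", "σ": "s", "ς": "s", "τ": "t", "α": "a",
    "ρ": "r", "μ": "m", "ν": "n", "λ": "l", "η": "i",
}


def collapse_repeats(text):
    """Reduce runs of the same character to one ('mm' -> 'm')."""
    out = []
    for ch in text:
        if not out or out[-1] != ch:
            out.append(ch)
    return "".join(out)


def normalize(name):
    """Map a name to an illustrative script-independent form."""
    # Lowercase, then decompose so accents become separate combining marks.
    decomposed = unicodedata.normalize("NFKD", name.lower())
    # Drop the combining marks (e.g. the umlaut in 'ö', the tonos in 'ώ').
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    # Replace known non-Latin letters; unknown characters pass through.
    latin = "".join(TO_LATIN.get(ch, ch) for ch in stripped)
    return collapse_repeats(latin)


def same_person(a, b, threshold=0.8):
    """Treat two spellings as variants if their normalized forms are similar.

    The threshold is an arbitrary illustrative value, not one from the paper.
    """
    ratio = SequenceMatcher(None, normalize(a), normalize(b)).ratio()
    return ratio >= threshold
```

Because every spelling is mapped to the same representation before comparison, the sketch needs one table per script rather than one transliteration scheme per language pair. For example, `same_person("Владимир Путин", "Vladimir Putin")` and `same_person("Gerhard Schröder", "Gerhard Schroeder")` both hold under these assumptions.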
Taxonomy
Topics: Authorship Attribution and Profiling · Data Quality and Management · Handwritten Text Recognition Techniques
