Natural Language Processing Chains Inside a Cross-lingual Event-Centric Knowledge Pipeline for European Union Under-resourced Languages
Diego Alves, Gaurish Thakkar, Marko Tadić

TL;DR
This paper develops a multilingual NLP pipeline for EU languages, focusing on event-centric media analysis, by creating language processing chains that address resource disparities among languages.
Contribution
It introduces a scalable strategy for building NLP processing chains for both resource-rich and under-resourced EU languages within an event-centric knowledge platform.
Findings
Created processing chains for well-resourced languages
Developed new modules for under-resourced languages
Analyzed language resource availability across EU languages
Abstract
This article presents the strategy for developing a platform containing Language Processing Chains for European Union languages, covering Tokenization through Parsing and also including Named Entity Recognition and, in addition, Sentiment Analysis. These chains are part of the first step of an event-centric knowledge processing pipeline whose aim is to process multilingual media information about major events that can have an impact in Europe and the rest of the world. Due to the differences in the availability of language resources for each language, we have built this strategy in three steps, starting with processing chains for the well-resourced languages and finishing with the development of new modules for the under-resourced ones. In order to classify all European Union official languages in terms of resources, we have analysed the size of annotated corpora as well as the…
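As an illustration only, a language processing chain of the kind the abstract describes can be pictured as an ordered list of modules applied to a document, with fewer modules available for some languages than for others. The sketch below is not the authors' implementation: the names `tokenize`, `tag_entities`, `run_chain`, and `CHAINS`, the placeholder language codes `xx` and `yy`, and the toy heuristics inside each stage are all assumptions made for this example.

```python
from typing import Callable, Dict, List

# A document is a plain dict that each stage enriches with new keys.
Doc = Dict[str, object]
Stage = Callable[[Doc], Doc]


def tokenize(doc: Doc) -> Doc:
    # Placeholder tokenizer: whitespace split. A real chain would use a trained model.
    doc["tokens"] = str(doc["text"]).split()
    return doc


def tag_entities(doc: Doc) -> Doc:
    # Placeholder NER: capitalised tokens that are not sentence-initial.
    tokens = doc["tokens"]
    doc["entities"] = [t for i, t in enumerate(tokens) if i > 0 and t[:1].isupper()]
    return doc


def run_chain(text: str, stages: List[Stage]) -> Doc:
    # Apply each stage in order; later stages read what earlier ones wrote.
    doc: Doc = {"text": text}
    for stage in stages:
        doc = stage(doc)
    return doc


# Hypothetical registry: "xx" stands for a language with a full chain,
# "yy" for one where only some modules exist so far.
CHAINS: Dict[str, List[Stage]] = {
    "xx": [tokenize, tag_entities],
    "yy": [tokenize],
}

result = run_chain("Floods hit Zagreb yesterday", CHAINS["xx"])
partial = run_chain("Floods hit Zagreb yesterday", CHAINS["yy"])
```

Keeping each module behind the same function signature means a chain for an under-resourced language can start short and grow as new modules become available, without changing the code that runs it.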
Taxonomy
Topics: Natural Language Processing Techniques · Topic Modeling
