Exploiting multilingual nomenclatures and language-independent text features as an interlingua for cross-lingual text analysis applications
Ralf Steinberger, Bruno Pouliquen, Camelia Ignat (European Commission, Joint Research Centre)

TL;DR
This paper presents a straightforward, efficient method leveraging multilingual resources and language-independent features to enhance cross-lingual text analysis across many languages with minimal effort.
Contribution
It introduces a novel approach that uses existing multilingual resources and language-independent features for scalable cross-lingual applications.
Findings
Effective for multiple languages without extensive adaptation
Improves cross-lingual document retrieval and clustering
Utilizes existing linguistic resources and language-independent tokens
Abstract
We propose a simple but efficient basic approach for a number of multilingual and cross-lingual language technology applications that are not limited to the usual two or three languages but can be applied with relatively little effort to larger sets of languages. The approach consists of using existing multilingual linguistic resources such as thesauri, nomenclatures and gazetteers, as well as exploiting additional, more or less language-independent text items such as dates, currency expressions, numbers, names and cognates. Mapping texts onto the multilingual resources and identifying word token links between texts in different languages are basic ingredients for applications such as cross-lingual document similarity calculation, multilingual clustering and categorisation, cross-lingual document retrieval, and tools to provide cross-lingual information…
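The core idea in the abstract, representing each document by features that are shared across languages (thesaurus descriptors, numbers, currency codes) and comparing those feature vectors, can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the two-language `THESAURUS` dictionary and its descriptor IDs (`D100`, ...) are hypothetical stand-ins for a real multilingual thesaurus, the number and currency regexes are simplifications, and cosine similarity over raw counts is one common similarity measure rather than one the source specifies. Names and cognates are not covered.

```python
import math
import re
from collections import Counter

# Hypothetical toy thesaurus: per-language surface term -> language-independent descriptor ID.
THESAURUS = {
    "en": {"fisheries": "D100", "fishing": "D100", "quota": "D200", "european union": "D300"},
    "fr": {"pêche": "D100", "quota": "D200", "quotas": "D200", "union européenne": "D300"},
}

# Assumed language-independent items: numbers and ISO-style currency codes.
NUMBER_RE = re.compile(r"\d+(?:[.,]\d+)?")
CURRENCY_RE = re.compile(r"\b(?:EUR|USD|GBP)\b")


def features(text: str, lang: str) -> Counter:
    """Map a text onto language-independent features."""
    lowered = text.lower()
    feats = Counter()
    # Thesaurus terms become shared descriptor IDs (whole-word matches only).
    for term, descriptor in THESAURUS[lang].items():
        hits = len(re.findall(r"\b" + re.escape(term) + r"\b", lowered))
        if hits:
            feats["desc:" + descriptor] += hits
    # Crude normalisation: a decimal comma is rewritten as a decimal point.
    for num in NUMBER_RE.findall(text):
        feats["num:" + num.replace(",", ".")] += 1
    for cur in CURRENCY_RE.findall(text):
        feats["cur:" + cur] += 1
    return feats


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse feature-count vectors."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


if __name__ == "__main__":
    en = "The European Union set a fishing quota of 12.5 tonnes in 2004, worth 3 million EUR."
    fr = "L'Union européenne a fixé un quota de pêche de 12,5 tonnes en 2004, soit 3 millions EUR."
    other = "Parliament debated the budget."
    print(round(cosine(features(en, "en"), features(fr, "fr")), 3))  # → 1.0
    print(cosine(features(en, "en"), features(other, "en")))  # → 0.0
```

Because both documents are reduced to the same descriptor, number and currency features, no translation step is needed to compare them; adding a language only requires another entry in the (here hypothetical) thesaurus mapping.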
Taxonomy
Topics: Natural Language Processing Techniques · Advanced Text Analysis Techniques · Semantic Web and Ontologies
