Inducing Multilingual Text Analysis Tools Using Bidirectional Recurrent Neural Networks
Othman Zennaki, Nasredine Semmar, Laurent Besacier

TL;DR
This paper presents a novel RNN-based method for developing multilingual linguistic annotation tools for resource-poor languages using only parallel corpora, without requiring word alignment or language knowledge.
Contribution
It introduces a cross-lingual annotation projection approach that does not rely on word alignment or language-specific information, enabling broad applicability to resource-scarce languages.
Findings
Effective cross-lingual POS tagging achieved
Super sense taggers successfully induced across languages
Method works with both manual and automatic translations
Abstract
This work focuses on the rapid development of linguistic annotation tools for resource-poor languages. We experiment with several cross-lingual annotation projection methods using Recurrent Neural Network (RNN) models. The distinctive feature of our approach is that our multilingual word representation requires only a parallel corpus between the source and target language. More precisely, our method has the following characteristics: (a) it does not use word alignment information, (b) it does not assume any knowledge about foreign languages, which makes it applicable to a wide range of resource-poor languages, (c) it provides truly multilingual taggers. We investigate both uni- and bi-directional RNN models and propose a method to include external information (for instance low level information from POS) in the RNN to train higher level taggers (for instance, super sense taggers). We…
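To make the idea of a word representation built from a parallel corpus alone more concrete, here is a minimal sketch in Python. It assumes one possible construction that this summary does not spell out: each word, source or target, is described by a binary vector with one entry per sentence pair, set to 1 when the word occurs in that pair. The function names `sentence_occurrence_vectors` and `cosine`, the `"src"`/`"tgt"` labels, and the toy English-French corpus are all hypothetical and serve only as illustration.

```python
from collections import defaultdict


def sentence_occurrence_vectors(parallel_corpus):
    """Map each (side, word) key to a binary vector over sentence pairs.

    Assumption: a word is represented by which sentence pairs it occurs in.
    No word alignment and no language-specific resources are used.
    """
    n_pairs = len(parallel_corpus)
    vectors = defaultdict(lambda: [0] * n_pairs)
    for i, (src_sentence, tgt_sentence) in enumerate(parallel_corpus):
        for word in src_sentence.lower().split():
            vectors[("src", word)][i] = 1
        for word in tgt_sentence.lower().split():
            vectors[("tgt", word)][i] = 1
    return dict(vectors)


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = sum(a * a for a in u) ** 0.5
    norm_v = sum(b * b for b in v) ** 0.5
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


# Toy parallel corpus (hypothetical data, whitespace-tokenized).
corpus = [
    ("the cat sleeps", "le chat dort"),
    ("the dog sleeps", "le chien dort"),
    ("the cat eats", "le chat mange"),
]

vectors = sentence_occurrence_vectors(corpus)
sim_cat_chat = cosine(vectors[("src", "cat")], vectors[("tgt", "chat")])
sim_cat_chien = cosine(vectors[("src", "cat")], vectors[("tgt", "chien")])
```

In this toy corpus, "cat" and "chat" occur in exactly the same sentence pairs, so their vectors coincide and their similarity is close to 1, while "cat" and "chien" never co-occur and score 0. Because source and target words share one vector space under this assumption, a tagger trained on source-side vectors could in principle be applied to target-side vectors, which is the kind of property an alignment-free projection method needs.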
Taxonomy
Topics: Natural Language Processing Techniques · ICT in Developing Communities · Multilingual Education and Policy
