Inducing Multilingual Text Analysis Tools Using Bidirectional Recurrent Neural Networks

Othman Zennaki; Nasredine Semmar; Laurent Besacier

arXiv:1609.09382·cs.CL·September 30, 2016

Open Access

TL;DR

This paper presents an RNN-based method for building multilingual linguistic annotation tools for resource-poor languages from parallel corpora alone, with no word alignment and no prior knowledge of the target language.

Contribution

It introduces a cross-lingual annotation projection approach that relies on neither word alignment nor language-specific information, which makes it applicable to a wide range of resource-poor languages.

Findings

01

Effective cross-lingual POS tagging achieved

02

Super sense taggers successfully induced across languages

03

Method works with both manual and automatic translations

Abstract

This work focuses on the rapid development of linguistic annotation tools for resource-poor languages. We experiment with several cross-lingual annotation projection methods using Recurrent Neural Network (RNN) models. The distinctive feature of our approach is that our multilingual word representation requires only a parallel corpus between the source and target languages. More precisely, our method has the following characteristics: (a) it does not use word alignment information, (b) it does not assume any knowledge about foreign languages, which makes it applicable to a wide range of resource-poor languages, and (c) it provides truly multilingual taggers. We investigate both uni- and bi-directional RNN models and propose a method to include external information (for instance, low-level information from POS) in the RNN to train higher-level taggers (for instance, super sense taggers). We…
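
To make the idea of a word representation that "requires only a parallel corpus" more concrete, here is a minimal sketch of one simple way such a representation could be built without word alignment. It is an illustrative assumption, not a description of the authors' implementation: each word, in either language, is mapped to a 0/1 vector over sentence-pair indices, so a source word and a target word that occur in the same sentence pairs receive similar vectors. The function names `occurrence_vectors` and `cosine`, the `"src"`/`"tgt"` labels, and the toy English-French corpus are all hypothetical.

```python
from math import sqrt


def occurrence_vectors(parallel_corpus):
    """Map each (language, word) pair to a 0/1 vector over sentence-pair indices.

    Assumption for illustration: the corpus is a list of
    (source_sentence, target_sentence) strings, tokenized on whitespace.
    """
    n = len(parallel_corpus)
    vectors = {}
    for i, (src_sentence, tgt_sentence) in enumerate(parallel_corpus):
        for lang, sentence in (("src", src_sentence), ("tgt", tgt_sentence)):
            for word in sentence.lower().split():
                # Component i is 1 if the word occurs in sentence pair i.
                vectors.setdefault((lang, word), [0] * n)[i] = 1
    return vectors


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = sqrt(sum(a * a for a in u)) * sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


# Toy sentence-aligned corpus (hypothetical data, no word alignment used).
corpus = [
    ("the cat sleeps", "le chat dort"),
    ("the dog sleeps", "le chien dort"),
    ("a cat eats", "un chat mange"),
]
vectors = occurrence_vectors(corpus)

# Words that occur in the same sentence pairs end up close together,
# regardless of which language they belong to.
print(cosine(vectors[("src", "cat")], vectors[("tgt", "chat")]))
print(cosine(vectors[("src", "cat")], vectors[("tgt", "chien")]))
```

Because source and target words share one vector space in this sketch, a tagger trained on annotated source-side vectors could in principle be applied to target-side vectors directly, which is the intuition behind annotation projection without word alignment.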

Taxonomy

Topics: Natural Language Processing Techniques · ICT in Developing Communities · Multilingual Education and Policy