On the Linguistic Representational Power of Neural Machine Translation Models

Yonatan Belinkov; Nadir Durrani; Fahim Dalvi; Hassan Sajjad; James Glass

arXiv:1911.00317 · cs.CL · November 4, 2019

TL;DR

This paper investigates how neural machine translation (NMT) models encode morphology, syntax, and semantics, and how that encoding varies across layers, translation units, and multilingual settings.

Contribution

It provides a comprehensive, data-driven analysis of the linguistic information captured by NMT models, highlighting differences across layers, translation units, and multilingual versus bilingual models.

Findings

01. Lower layers capture morphology and POS information.

02. Higher layers encode semantics and long-range dependencies.

03. Character-based representations better capture morphology.

Abstract

Despite the recent success of deep neural networks in natural language processing (NLP), their interpretability remains a challenge. We analyze the representations learned by neural machine translation models at various levels of granularity and evaluate their quality through relevant extrinsic properties. In particular, we seek answers to the following questions: (i) How accurately is word-structure captured within the learned representations, an important aspect in translating morphologically-rich languages? (ii) Do the representations capture long-range dependencies, and effectively handle syntactically divergent languages? (iii) Do the representations capture lexical semantics? We conduct a thorough investigation along several parameters: (i) Which layers in the architecture capture each of these linguistic phenomena; (ii) How does the choice of translation unit (word, character, or…
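The abstract describes evaluating learned representations through extrinsic properties and asking which layers capture which phenomena. One common way to run this kind of comparison is to train a small classifier (a "probe") on frozen representations and compare held-out accuracy across layers. The excerpt here does not spell out the exact setup, so the following is only a sketch under stated assumptions: synthetic random vectors stand in for NMT activations, the binary label is a hypothetical linguistic tag, and `train_probe`, `probe_accuracy`, and `synthetic_layer` are illustrative names, not anything taken from the paper.

```python
import math
import random


def train_probe(features, labels, epochs=50, lr=0.1):
    """Fit a binary logistic-regression probe with plain SGD.

    The feature vectors are treated as frozen: only the probe's own
    weights and bias are updated.
    """
    dim = len(features[0])
    weights = [0.0] * dim
    bias = 0.0
    for _ in range(epochs):
        for x, y in zip(features, labels):
            z = bias + sum(w * xi for w, xi in zip(weights, x))
            z = max(-30.0, min(30.0, z))  # keep exp() in a safe range
            p = 1.0 / (1.0 + math.exp(-z))
            err = p - y  # gradient of the log loss with respect to z
            for i in range(dim):
                weights[i] -= lr * err * x[i]
            bias -= lr * err
    return weights, bias


def probe_accuracy(weights, bias, features, labels):
    """Fraction of examples where the probe predicts the right label."""
    correct = 0
    for x, y in zip(features, labels):
        z = bias + sum(w * xi for w, xi in zip(weights, x))
        correct += int((z > 0) == (y == 1))
    return correct / len(labels)


def synthetic_layer(labels, signal, dim, rng):
    """Hypothetical stand-in for one layer's activations.

    Each vector is Gaussian noise; the first dimension is shifted by
    +signal or -signal depending on the label, so signal=0 means the
    layer carries no information about the label.
    """
    feats = []
    for y in labels:
        vec = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        vec[0] += signal if y == 1 else -signal
        feats.append(vec)
    return feats


rng = random.Random(0)
labels = [rng.randint(0, 1) for _ in range(400)]  # hypothetical binary tag
layers = {
    "informative_layer": synthetic_layer(labels, 2.0, 8, rng),
    "uninformative_layer": synthetic_layer(labels, 0.0, 8, rng),
}

split = 300  # first 300 examples train the probe, the rest evaluate it
accuracies = {}
for name, feats in layers.items():
    w, b = train_probe(feats[:split], labels[:split])
    accuracies[name] = probe_accuracy(w, b, feats[split:], labels[split:])

print(accuracies)
```

In a real analysis the synthetic vectors would be replaced by activations extracted from a trained translation model, and the label by an annotated property such as a morphological or POS tag. Higher probe accuracy for one layer is then read as evidence that the layer encodes that property more accessibly.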

Taxonomy

Methods: Interpretability