Beyond Weight Tying: Learning Joint Input-Output Embeddings for Neural Machine Translation
Nikolaos Pappas, Lesly Miculicich Werlen, James Henderson

TL;DR
This paper introduces a joint input-output embedding for neural machine translation that generalizes weight tying: it captures the semantic structure of the output vocabulary and shares weights across output classifiers and translation contexts, with the aim of improving translation quality.
Contribution
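The paper proposes a structure-aware output layer that generalizes weight tying. The layer still shares parameters with the input word embeddings, but it learns a more flexible relationship with them and lets the model make better use of prior knowledge about output classifiers and translation contexts.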
Findings
Outperforms strong baselines on English-to-Finnish and English-to-German translation tasks.
Captures the semantic structure of the output space of words.
Allows the effective capacity of the output layer to be controlled.
Abstract
Tying the weights of the target word embeddings with the target word classifiers of neural machine translation models leads to faster training and often to better translation quality. Given the success of this parameter sharing, we investigate other forms of sharing in between no sharing and hard equality of parameters. In particular, we propose a structure-aware output layer which captures the semantic structure of the output space of words within a joint input-output embedding. The model is a generalized form of weight tying which shares parameters but allows learning a more flexible relationship with input word embeddings and allows the effective capacity of the output layer to be controlled. In addition, the model shares weights across output classifiers and translation contexts which allows it to better leverage prior knowledge about them. Our evaluation on English-to-Finnish and…
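The contrast between hard weight tying and a more flexible joint embedding can be sketched in a few lines. This is a minimal illustration and not the authors' implementation: the function names, the toy dimensions, the tanh nonlinearity, and the omission of bias terms are all assumptions made here. Under these assumptions, the joint layer adds two projection matrices whose size depends on the joint dimension d_j rather than on the vocabulary size, which is one way the capacity of the output layer could be controlled.

```python
import math
import random


def matvec(M, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]


def softmax(z):
    """Numerically stable softmax over a list of scores."""
    m = max(z)
    exps = [math.exp(x - m) for x in z]
    s = sum(exps)
    return [e / s for e in exps]


def rand_matrix(rows, cols, rng):
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]


def tied_output(E, h):
    """Weight tying: the output classifier reuses the embedding matrix E."""
    return softmax(matvec(E, h))


def joint_output(E, h, U, V, act=math.tanh):
    """Hypothetical joint-embedding layer (biases omitted for brevity).

    Word embeddings (via U) and the decoder context h (via V) are both
    projected into a shared d_j-dimensional space and scored by dot product.
    """
    h_proj = [act(x) for x in matvec(V, h)]
    E_proj = [[act(x) for x in matvec(U, e)] for e in E]
    return softmax(matvec(E_proj, h_proj))


# Toy sizes, chosen only for illustration.
rng = random.Random(0)
vocab_size, d, d_j = 6, 4, 3

E = rand_matrix(vocab_size, d, rng)         # target word embeddings
h = [rng.uniform(-1, 1) for _ in range(d)]  # decoder context vector
U = rand_matrix(d_j, d, rng)                # embedding -> joint space
V = rand_matrix(d_j, d, rng)                # context   -> joint space

p_tied = tied_output(E, h)
p_joint = joint_output(E, h, U, V)

# Extra parameters of the joint layer do not depend on vocab_size.
extra_params = d_j * d + d_j * d

# With identity projections and no nonlinearity, the sketch reduces to
# plain weight tying.
identity = [[1.0 if i == j else 0.0 for j in range(d)] for i in range(d)]
p_reduced = joint_output(E, h, identity, identity, act=lambda x: x)
```

In this sketch, raising or lowering d_j changes how many parameters the output layer learns, while the embedding matrix E remains shared between input and output.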
Taxonomy
Topics: Natural Language Processing Techniques · Topic Modeling · Text Readability and Simplification
