Von Mises-Fisher Loss for Training Sequence to Sequence Models with Continuous Outputs

Sachin Kumar; Yulia Tsvetkov

arXiv:1812.04616 · cs.CL · March 25, 2019 · 54 cites

TL;DR

This paper introduces a novel continuous-output method for sequence-to-sequence models based on a von Mises-Fisher loss, enabling faster training and support for larger vocabularies without sacrificing translation quality.

Contribution

It proposes a new probabilistic loss, together with training and inference procedures, that replaces the softmax layer of sequence-to-sequence models with a continuous embedding layer.
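The loss is built on the von Mises-Fisher (vMF) distribution over the unit sphere. Below is a minimal pure-Python sketch of a vMF negative log-likelihood in the form NLL = -log C_m(||ê||) - ê · e(w), assuming ê is the model's output vector (its norm acts as the concentration κ) and e(w) is the unit-normalized pre-trained embedding of the target word. It omits any regularization terms or gradient approximations the official implementation may use. The function names are hypothetical, and log I_v is evaluated from its power series, which is only accurate for moderate κ.

```python
import math
from typing import Sequence


def log_bessel_iv(v: float, x: float, terms: int = 200) -> float:
    """log I_v(x) for x > 0, from the power series
    I_v(x) = sum_k (x/2)^(2k+v) / (k! * Gamma(k+v+1)),
    summed in log space (log-sum-exp) to avoid overflow for large v."""
    log_half_x = math.log(x / 2.0)
    log_terms = [
        (2 * k + v) * log_half_x - math.lgamma(k + 1) - math.lgamma(k + v + 1)
        for k in range(terms)
    ]
    peak = max(log_terms)
    return peak + math.log(sum(math.exp(t - peak) for t in log_terms))


def log_vmf_normalizer(m: int, kappa: float) -> float:
    """log C_m(kappa), with C_m(kappa) = kappa^(m/2-1) / ((2*pi)^(m/2) * I_(m/2-1)(kappa))."""
    v = m / 2.0 - 1.0
    return v * math.log(kappa) - (m / 2.0) * math.log(2.0 * math.pi) - log_bessel_iv(v, kappa)


def vmf_nll(e_hat: Sequence[float], target: Sequence[float]) -> float:
    """NLL of the unit-normalized target under a vMF whose mean direction is
    e_hat / ||e_hat|| and whose concentration is kappa = ||e_hat||:
        NLL = -log C_m(||e_hat||) - e_hat . (target / ||target||)"""
    m = len(e_hat)
    kappa = math.sqrt(sum(a * a for a in e_hat))
    if kappa == 0.0:
        raise ValueError("e_hat must be non-zero (kappa = ||e_hat|| > 0)")
    t_norm = math.sqrt(sum(b * b for b in target))
    dot = sum(a * b for a, b in zip(e_hat, target)) / t_norm
    return -log_vmf_normalizer(m, kappa) - dot
```

For m = 3 the normalizer has the closed form C_3(κ) = κ / (4π sinh κ), which gives a quick sanity check: `vmf_nll([2.0, 0.0, 0.0], [1.0, 0.0, 0.0])` should equal `-log(2 / (4π sinh 2)) - 2`. Because the norm of ê doubles as κ, the model can express confidence by scaling its output vector rather than by spreading mass over a vocabulary-sized softmax.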

Findings

1. Up to 2.5x speed-up in training time.
2. Translation quality on par with state-of-the-art models.
3. Handles very large vocabularies effectively.
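Part of why vocabulary size stops mattering is simple parameter arithmetic: a softmax output layer grows linearly with the vocabulary, while a continuous output layer only projects to the embedding dimension. The sketch below counts trainable output-layer parameters for hypothetical sizes (hidden size 512, 300-dimensional embeddings, 50k and 100k vocabularies), assuming the pre-trained target embeddings are kept frozen; the helper names are hypothetical.

```python
def softmax_output_params(hidden: int, vocab: int) -> int:
    # Softmax output layer: a hidden x vocab weight matrix plus one bias per type.
    return hidden * vocab + vocab


def continuous_output_params(hidden: int, emb_dim: int) -> int:
    # Continuous output layer: a hidden x emb_dim projection plus bias.
    # Assumption: the pre-trained target embeddings are frozen, so they add no
    # trainable parameters (they still occupy memory for decoding lookups).
    return hidden * emb_dim + emb_dim


# Hypothetical sizes chosen for illustration only.
HIDDEN, EMB_DIM = 512, 300
for vocab in (50_000, 100_000):
    print(vocab, softmax_output_params(HIDDEN, vocab), continuous_output_params(HIDDEN, EMB_DIM))
# 50000 25650000 153900
# 100000 51300000 153900
```

Doubling the vocabulary doubles the softmax layer but leaves the continuous layer unchanged, which is the sense in which the output layer's cost becomes independent of vocabulary size.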

Abstract

The softmax function is used in the final layer of nearly all existing sequence-to-sequence models for language generation. However, it is usually the slowest layer to compute, which limits the vocabulary size to a subset of the most frequent types, and it has a large memory footprint. We propose a general technique for replacing the softmax layer with a continuous embedding layer. Our primary innovations are a novel probabilistic loss and a training and inference procedure in which we generate a probability distribution over pre-trained word embeddings, instead of a multinomial distribution over the vocabulary obtained via softmax. We evaluate this new class of sequence-to-sequence models with continuous outputs on the task of neural machine translation. We show that our models obtain up to 2.5x speed-up in training time while performing on par with the state-of-the-art models in terms of…
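At inference time the model emits a vector rather than a distribution over types, so a word must be recovered from the pre-trained embedding table. Under a vMF with mean direction ê / ||ê||, a unit candidate e(w) has density proportional to exp(ê · e(w)); since κ is the same for every candidate, ranking words by likelihood is the same as ranking them by cosine similarity. The sketch below assumes that nearest-neighbor rule, uses a brute-force loop, and works over hypothetical toy embeddings; a practical decoder would batch this as a matrix product.

```python
import math
from typing import Dict, List


def _unit(v: List[float]) -> List[float]:
    # Scale a vector to unit length (assumes it is non-zero).
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def nearest_word(e_hat: List[float], embeddings: Dict[str, List[float]]) -> str:
    """Return the vocabulary entry whose embedding has the highest cosine
    similarity with the predicted vector e_hat (brute-force search)."""
    query = _unit(e_hat)
    best_word, best_score = "", -math.inf
    for word, vec in embeddings.items():
        score = sum(a * b for a, b in zip(query, _unit(vec)))
        if score > best_score:
            best_word, best_score = word, score
    return best_word


# Toy 2-d "pre-trained" embeddings (hypothetical).
toy = {"cat": [1.0, 0.0], "dog": [0.0, 1.0], "car": [-1.0, 0.0]}
print(nearest_word([0.9, 0.2], toy))  # cat
print(nearest_word([0.1, 3.0], toy))  # dog
```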

Code & Models

Repositories

Sachin19/seq2seq-con (PyTorch, official)

Taxonomy

Topics: Natural Language Processing Techniques · Topic Modeling · Text Readability and Simplification

Methods: Softmax