Speeding Up Entmax

Maxat Tezekbayev; Vassilina Nikoulina; Matthias Gallé; Zhenisbek Assylbekov

arXiv:2111.06832 · cs.CL · May 20, 2022

TL;DR

This paper introduces a faster alternative to $α$-entmax that keeps its sparsity benefits while running as fast as optimized softmax in neural language models, with machine translation as the evaluation task.

Contribution

The paper proposes an alternative to $α$-entmax that is as fast as optimized softmax while preserving the advantages of sparse normalization of logits.

Findings

01 Achieves comparable or better translation performance

02 Runs as fast as optimized softmax

03 Maintains the sparse probability distribution benefits

Abstract

Softmax is the de facto standard for normalizing logits in modern neural networks for language processing. However, because it produces a dense probability distribution, each token in the vocabulary has a nonzero chance of being selected at each generation step, leading to a variety of reported problems in text generation. The $α$-entmax of Peters et al. (2019, arXiv:1905.05702) solves this problem, but is considerably slower than softmax. In this paper, we propose an alternative to $α$-entmax that keeps its virtuous characteristics but is as fast as optimized softmax and achieves on-par or better performance on the machine translation task.
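The dense-versus-sparse contrast described in the abstract can be illustrated with a minimal NumPy sketch. It compares softmax with sparsemax, which is the $α = 2$ special case of $α$-entmax. This is not the method proposed in the paper, and the function names and example logits are illustrative assumptions. The sort and threshold search inside `sparsemax` also hint at why entmax-style normalizers tend to cost more than softmax.

```python
import numpy as np

def softmax(z):
    """Dense normalizer: every entry of the output is strictly positive."""
    z = np.asarray(z, dtype=float)
    e = np.exp(z - z.max())  # subtract the max for numerical stability
    return e / e.sum()

def sparsemax(z):
    """Sparsemax (alpha-entmax with alpha = 2), via the sort-based algorithm.

    Illustrative only: this is the known alpha = 2 special case,
    not the faster alternative proposed in the paper.
    """
    z = np.asarray(z, dtype=float)
    z_sorted = np.sort(z)[::-1]           # logits in descending order
    cumsum = np.cumsum(z_sorted)
    k = np.arange(1, z.size + 1)
    support = 1 + k * z_sorted > cumsum   # which sorted entries stay nonzero
    k_max = k[support][-1]                # size of the support
    tau = (cumsum[k_max - 1] - 1) / k_max # input-dependent threshold
    return np.maximum(z - tau, 0.0)

logits = np.array([1.0, 0.8, 0.1, -1.0])  # hypothetical example logits
p_soft = softmax(logits)
p_sparse = sparsemax(logits)

print("softmax:  ", np.round(p_soft, 3))    # all four entries are nonzero
print("sparsemax:", np.round(p_sparse, 3))  # [0.6, 0.4, 0.0, 0.0]
```

Both outputs sum to one, but only sparsemax assigns exactly zero probability to the low-scoring tokens. The threshold `tau` depends on the input and requires a sort over the vocabulary, which is the kind of extra work that softmax avoids.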

Code & Models

Repositories

maxattezekbayev/alpha-relu (PyTorch, official)

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Machine Learning and Data Classification

Methods: Softmax