Speeding Up Entmax
Maxat Tezekbayev, Vassilina Nikoulina, Matthias Gallé, Zhenisbek Assylbekov

TL;DR
This paper introduces a faster alternative to the $\text{entmax}$ function that maintains its benefits and improves efficiency in neural network language models, especially for machine translation.
Contribution
The paper proposes a new method that speeds up $\text{entmax}$, making it as fast as optimized softmax while preserving its advantages in neural network normalization.
Findings
Achieves comparable or better translation performance
Runs as fast as optimized softmax
Maintains the sparse probability distribution benefits
Abstract
Softmax is the de facto standard in modern neural networks for language processing when it comes to normalizing logits. However, because it produces a dense probability distribution, each token in the vocabulary has a nonzero chance of being selected at each generation step, leading to a variety of reported problems in text generation. α-entmax of Peters et al. (2019, arXiv:1905.05702) solves this problem, but is considerably slower than softmax. In this paper, we propose an alternative to α-entmax, which keeps its virtuous characteristics, but is as fast as optimized softmax and achieves on-par or better performance on the machine translation task.
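The abstract contrasts softmax, whose output is always dense, with α-entmax, which can assign exactly zero probability to tokens. The following is a minimal NumPy sketch of both. It assumes the α-entmax definition of Peters et al. (2019): $\alpha\text{-entmax}(z)_i = [(\alpha-1) z_i - \tau]_+^{1/(\alpha-1)}$, where the threshold $\tau$ is chosen so that the outputs sum to 1. As $\alpha \to 1$ this recovers softmax, and $\alpha = 2$ gives sparsemax.

For general α the threshold is found numerically. Here it is found by bisection, although exact sort-based algorithms exist for α = 1.5 and α = 2. This extra search, which softmax does not need, hints at why α-entmax is slower.

This illustrates the baseline only, not the faster alternative proposed in this paper. The names `entmax_bisect` and `n_iter` are hypothetical choices made for this sketch.

```python
import numpy as np

def softmax(z):
    """Standard softmax; every entry of the output is strictly positive."""
    z = np.asarray(z, dtype=float)
    e = np.exp(z - z.max())  # shift by the max for numerical stability
    return e / e.sum()

def entmax_bisect(z, alpha=1.5, n_iter=50):
    """Hypothetical sketch of alpha-entmax (alpha > 1) via bisection on tau."""
    assert alpha > 1.0, "alpha must exceed 1 (alpha -> 1 is softmax)"
    zs = (alpha - 1.0) * np.asarray(z, dtype=float)
    power = 1.0 / (alpha - 1.0)
    # The total mass sum_i max(zs_i - tau, 0)**power decreases in tau.
    # At tau = max(zs) - 1 the top entry alone contributes 1, so mass >= 1.
    # At tau = max(zs) every term is 0, so mass < 1. The root lies between.
    lo, hi = zs.max() - 1.0, zs.max()
    for _ in range(n_iter):
        tau = 0.5 * (lo + hi)
        mass = (np.clip(zs - tau, 0.0, None) ** power).sum()
        if mass >= 1.0:
            lo = tau  # tau is still at or below the true threshold
        else:
            hi = tau
    p = np.clip(zs - lo, 0.0, None) ** power  # entries below tau are exactly 0
    return p / p.sum()  # absorb the tiny residual bisection error

z = np.array([2.0, 1.5, 0.0, -1.0])
print(softmax(z))                  # all four entries are nonzero
print(entmax_bisect(z, alpha=1.5)) # about 0.674 and 0.326, then exact zeros
```

With these logits, softmax still gives the two lowest-scoring tokens some probability. α-entmax with α = 1.5 assigns them exactly zero, which is the sparsity property the abstract refers to.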
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Machine Learning and Data Classification
Methods: Softmax
