From Softmax to Sparsemax: A Sparse Model of Attention and Multi-Label Classification

André F. T. Martins; Ramón Fernandez Astudillo

arXiv:1602.02068·cs.CL·February 9, 2016·263 cites

TL;DR

This paper introduces sparsemax, an activation function similar to softmax that can output sparse probability distributions, together with a matching smooth and convex loss function. It reports promising results in multi-label classification and, in attention-based natural language inference, performance similar to softmax with a more selective attention focus.

Contribution

The paper presents sparsemax, an activation function that yields sparse probabilities, shows how its Jacobian can be computed efficiently for training with backpropagation, and proposes a corresponding loss function for use in multi-label classification and attention models.

Findings

01

In attention-based natural language inference, sparsemax performs similarly to softmax while producing a more selective, compact attention focus.

02

The new loss function, a smooth and convex sparsemax analogue of the logistic loss, turns out to be connected to the Huber classification loss.

03

Empirical results are promising in multi-label classification problems and in attention-based neural networks for natural language inference.

Abstract

We propose sparsemax, a new activation function similar to the traditional softmax, but able to output sparse probabilities. After deriving its properties, we show how its Jacobian can be efficiently computed, enabling its use in a network trained with backpropagation. Then, we propose a new smooth and convex loss function which is the sparsemax analogue of the logistic loss. We reveal an unexpected connection between this new loss and the Huber classification loss. We obtain promising empirical results in multi-label classification problems and in attention-based neural networks for natural language inference. For the latter, we achieve a similar performance as the traditional softmax, but with a selective, more compact, attention focus.
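
The abstract describes sparsemax only in words, so the following is a minimal sketch of how the forward pass and the Jacobian-vector product used in backpropagation could look. It assumes the standard formulation of sparsemax as the Euclidean projection of the score vector onto the probability simplex, computed by sorting and thresholding. The function names `sparsemax` and `sparsemax_jvp` are illustrative choices and are not taken from the authors' code.

```python
from typing import List


def sparsemax(z: List[float]) -> List[float]:
    """Sketch of sparsemax: project the scores z onto the probability simplex."""
    z_sorted = sorted(z, reverse=True)
    cumsum = 0.0
    tau = 0.0
    for k, z_k in enumerate(z_sorted, start=1):
        cumsum += z_k
        # The k-th largest score stays in the support while
        # 1 + k * z_(k) exceeds the sum of the k largest scores.
        if 1.0 + k * z_k > cumsum:
            tau = (cumsum - 1.0) / k  # threshold for a support of size k
        else:
            break
    # Scores at or below the threshold are truncated to exactly zero.
    return [max(z_i - tau, 0.0) for z_i in z]


def sparsemax_jvp(p: List[float], v: List[float]) -> List[float]:
    """Sketch of the Jacobian-vector product at output p = sparsemax(z).

    On the support of p, subtract the support-wise mean of v;
    outside the support the result is zero.
    """
    support = [p_i > 0.0 for p_i in p]
    size = sum(support)  # at least 1, because p sums to one
    v_mean = sum(v_i for v_i, s in zip(v, support) if s) / size
    return [(v_i - v_mean) if s else 0.0 for v_i, s in zip(v, support)]


if __name__ == "__main__":
    print(sparsemax([2.0, 1.0, 0.1]))   # [1.0, 0.0, 0.0]
    print(sparsemax([1.0, 0.8, -1.0]))  # roughly [0.6, 0.4, 0.0]
```

Unlike softmax, which assigns a strictly positive probability to every coordinate, this sketch returns exact zeros for low-scoring coordinates. The Jacobian-vector product needs only the support of the output, which is why it is cheap to compute.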

Taxonomy

Topics: Domain Adaptation and Few-Shot Learning · Multimodal Machine Learning Applications · Machine Learning and Algorithms

Methods: Sparsemax