Sparse Continuous Distributions and Fenchel-Young Losses
André F. T. Martins, Marcos Treviso, António Farinhas, Pedro M. Q. Aguiar, Mário A. T. Figueiredo, Mathieu Blondel, Vlad Niculae

TL;DR
This paper introduces sparse continuous distributions and Fenchel-Young losses for arbitrary domains, yielding distributions whose support can vary rather than being fixed, and applies them to attention mechanisms in machine learning.
Contribution
It develops a general framework for sparse distributions over continuous domains using Fenchel-Young losses and regularizers, extending exponential families and enabling new attention models.
Findings
Effective attention over time intervals in audio classification.
Compact region attention in visual question answering.
Closed-form expressions for variances and entropies of new distributions.
Abstract
Exponential families are widely used in machine learning, including many distributions in continuous and discrete domains (e.g., Gaussian, Dirichlet, Poisson, and categorical distributions via the softmax transformation). Distributions in each of these families have fixed support. In contrast, for finite domains, recent work on sparse alternatives to softmax (e.g., sparsemax, α-entmax, and fusedmax) has led to distributions with varying support. This paper develops sparse alternatives to continuous distributions, based on several technical contributions: First, we define Ω-regularized prediction maps and Fenchel-Young losses for arbitrary domains (possibly countably infinite or continuous). For linearly parametrized families, we show that minimization of Fenchel-Young losses is equivalent to moment matching of the statistics, generalizing a fundamental property of…
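To make the contrast between fixed and varying support concrete in the finite case, here is a minimal Python sketch comparing softmax with sparsemax. It is not code from the paper: the function names and the example scores are illustrative assumptions, and sparsemax is implemented with the standard sort-and-threshold projection onto the probability simplex.

```python
import math


def softmax(scores):
    """Dense map: every entry gets strictly positive probability."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def sparsemax(scores):
    """Euclidean projection of the scores onto the probability simplex.

    Illustrative implementation (an assumption, not the authors' code):
    sort the scores, find the largest prefix that stays in the support,
    and subtract the resulting threshold tau.
    """
    z = sorted(scores, reverse=True)
    cumsum = 0.0
    tau = 0.0
    for k, zk in enumerate(z, start=1):
        cumsum += zk
        if 1 + k * zk > cumsum:
            tau = (cumsum - 1) / k  # threshold for a support of size k
        else:
            break  # the support condition fails for all larger k
    return [max(s - tau, 0.0) for s in scores]


scores = [1.0, 0.5, -1.0]  # hypothetical example scores
dense = softmax(scores)
sparse = sparsemax(scores)
print("softmax:  ", dense)
print("sparsemax:", sparse)
```

With these scores, softmax assigns nonzero mass to all three entries, while sparsemax assigns exactly zero to the lowest-scoring one, so its support depends on the input. The paper's contribution is to carry this behaviour over to countably infinite and continuous domains.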
Taxonomy
Topics: Domain Adaptation and Few-Shot Learning · Generative Adversarial Networks and Image Synthesis · Multimodal Machine Learning Applications
Methods: Sparsemax · Softmax
