MultiMax: Sparse and Multi-Modal Attention Learning

Yuxuan Zhou; Mario Fritz; Margret Keuper

arXiv:2406.01189·cs.LG·January 9, 2025

TL;DR

MultiMax introduces a piece-wise differentiable function that improves both sparsity and multi-modality in attention mechanisms, yielding better interpretability and performance in image classification, language modeling, and machine translation.

Contribution

The paper proposes MultiMax, a new function that balances sparsity and multi-modality, overcoming limitations of SoftMax and its variants in neural attention models.

Findings

01. MultiMax effectively suppresses irrelevant entries in the output distribution.

02. It preserves multi-modality better than SoftMax variants.

03. It demonstrates improvements in image classification, language modeling, and machine translation.

Abstract

SoftMax is a ubiquitous ingredient of modern machine learning algorithms. It maps an input vector onto a probability simplex and reweights the input by concentrating the probability mass at large entries. Yet, as a smooth approximation to the Argmax function, a significant amount of probability mass is distributed to other, residual entries, leading to poor interpretability and noise. Although sparsity can be achieved by a family of SoftMax variants, they often require an alternative loss function and do not preserve multi-modality. We show that this trade-off between multi-modality and sparsity limits the expressivity of SoftMax as well as its variants. We provide a solution to this tension between objectives by proposing a piece-wise differentiable function, termed MultiMax, which adaptively modulates the output distribution according to input entry range. Through comprehensive…
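The trade-off described in the abstract can be illustrated with a small sketch. This is not the MultiMax function itself (its definition is not given in this summary); it only shows, using a temperature-scaled SoftMax as an assumed stand-in for "making SoftMax sharper", how reducing the residual mass also flattens the second mode. The input vector `scores`, the helper names, and the temperature values are all hypothetical choices made for illustration.

```python
import math


def softmax(x, temperature=1.0):
    """SoftMax with a temperature; subtracting the max keeps exp() stable."""
    m = max(x)
    exps = [math.exp((v - m) / temperature) for v in x]
    total = sum(exps)
    return [e / total for e in exps]


def residual_mass(p, num_modes):
    """Probability mass that falls outside the num_modes largest entries."""
    top = sorted(p, reverse=True)[:num_modes]
    return sum(p) - sum(top)


def mode_ratio(p):
    """Ratio of the second-largest to the largest probability (1.0 = equal modes)."""
    first, second = sorted(p, reverse=True)[:2]
    return second / first


# Hypothetical attention scores: two relevant entries and six irrelevant ones.
scores = [2.0, 1.8, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]

for t in (1.0, 0.1):
    p = softmax(scores, temperature=t)
    print(f"T={t}: residual mass={residual_mass(p, 2):.4f}, "
          f"second/first mode ratio={mode_ratio(p):.4f}")
```

At temperature 1.0, about 31% of the mass lands on the six irrelevant entries, while the two relevant entries stay comparable in weight. At temperature 0.1 the residual mass is negligible, but the second relevant entry shrinks to a small fraction of the first. A single temperature cannot achieve both properties, which is the tension the abstract says MultiMax is designed to resolve.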

Code & Models

Repositories

zhouyuxuanyx/multimax (PyTorch, official)

Taxonomy

Topics: Domain Adaptation and Few-Shot Learning · Human Pose and Action Recognition

Methods: Softmax