A Regularized Framework for Sparse and Structured Neural Attention
Vlad Niculae, Mathieu Blondel

TL;DR
This paper introduces a regularized framework for sparse and structured neural attention. By combining a smoothed max operator with structured penalties, the resulting attention mechanisms are more interpretable and improve performance on NLP tasks such as textual entailment and summarization.
Contribution
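The authors propose an attention framework built on a smoothed max operator, whose gradient maps real-valued scores to probabilities. Structured penalties can be added to make the attention more interpretable, efficient algorithms compute the forward and backward passes, and experiments show improvements on NLP tasks.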
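For reference, the smoothed max operator and the attention mapping given by its gradient can be written as below. This is our own transcription of the standard formulation, not a quote from the paper. Here γ > 0 is the regularization strength, Δ^d is the probability simplex, and Ω is a strongly convex regularizer. The three choices of Ω shown give softmax, sparsemax with a strength γ (i.e. sparsemax(x/γ)), and a fused, segment-level variant based on a 1-D total-variation term.

```latex
\max_\Omega(x) = \sup_{y \in \Delta^d} \; y^\top x - \gamma\,\Omega(y),
\qquad
\Pi_\Omega(x) = \nabla \max_\Omega(x)
  = \operatorname*{arg\,max}_{y \in \Delta^d} \; y^\top x - \gamma\,\Omega(y)

\Omega(y) = \textstyle\sum_i y_i \log y_i
  \;\Rightarrow\; \max_\Omega(x) = \gamma \log \textstyle\sum_i e^{x_i/\gamma},
  \quad \Pi_\Omega(x) = \mathrm{softmax}(x/\gamma)

\Omega(y) = \tfrac12 \|y\|_2^2
  \;\Rightarrow\; \Pi_\Omega(x) = \mathrm{sparsemax}(x/\gamma)
  = \operatorname*{arg\,min}_{y \in \Delta^d} \|y - x/\gamma\|_2^2

\Omega(y) = \tfrac12 \|y\|_2^2 + \lambda \textstyle\sum_{i=1}^{d-1} |y_{i+1} - y_i|
  \;\Rightarrow\; \text{fused (segment-level) sparse attention}
```

The squared-norm case follows from completing the square: y^⊤x − (γ/2)‖y‖² equals −(γ/2)‖y − x/γ‖² plus a constant. The last penalty rewards adjacent positions for receiving equal weight, which is what lets attention cover contiguous segments.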
Findings
Improved interpretability of attention mechanisms.
Outperforms softmax and sparsemax in textual entailment and summarization.
Efficient algorithms for forward and backward passes.
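To illustrate forward and backward passes for the simplest sparse case, the sketch below handles Ω = ½‖y‖² with strength γ, i.e. sparsemax(x/γ). It uses the well-known O(d log d) sort-based projection onto the simplex and adds the Jacobian-vector product needed for backpropagation. This is a NumPy sketch with hypothetical helper names (`sparsemax`, `sparsemax_jvp`), not the authors' implementation. Because the Jacobian is symmetric, the same product also serves as the backward pass.

```python
import numpy as np

def sparsemax(x, gamma=1.0):
    """Forward pass: argmax_{y in simplex} y.x - (gamma/2)||y||^2 = proj_simplex(x / gamma)."""
    z = np.asarray(x, dtype=float) / gamma
    z_sorted = np.sort(z)[::-1]                 # scores in decreasing order
    cumsum = np.cumsum(z_sorted)
    k = np.arange(1, z.size + 1)
    # Support size: largest k such that 1 + k * z_(k) > z_(1) + ... + z_(k).
    k_max = k[1 + k * z_sorted > cumsum][-1]
    tau = (cumsum[k_max - 1] - 1) / k_max       # threshold subtracted from every score
    return np.maximum(z - tau, 0.0)

def sparsemax_jvp(p, v, gamma=1.0):
    """Backward pass: Jacobian of sparsemax(x / gamma) at output p, times v.

    The Jacobian is (diag(s) - s s^T / |S|) / gamma, where s indicates the
    support S = {i : p_i > 0}. It is symmetric, so this is also the VJP.
    """
    p = np.asarray(p, dtype=float)
    v = np.asarray(v, dtype=float)
    s = p > 0
    out = np.zeros_like(v)
    out[s] = v[s] - v[s].mean()                 # center v on the support, zero elsewhere
    return out / gamma
```

With γ = 1, `sparsemax([0.5, 1.0, 0.1])` returns weights 0.25 and 0.75 on the first two positions and exactly zero on the last. Raising γ to 2 makes the same output dense, which shows what the extra strength parameter adds over plain sparsemax.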
Abstract
Modern neural networks are often augmented with an attention mechanism, which tells the network where to focus within the input. We propose in this paper a new framework for sparse and structured attention, building upon a smoothed max operator. We show that the gradient of this operator defines a mapping from real values to probabilities, suitable as an attention mechanism. Our framework includes softmax and a slight generalization of the recently-proposed sparsemax as special cases. However, we also show how our framework can incorporate modern structured penalties, resulting in more interpretable attention mechanisms, that focus on entire segments or groups of an input. We derive efficient algorithms to compute the forward and backward passes of our attention mechanisms, enabling their use in a neural network trained with backpropagation. To showcase their potential as a drop-in…
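The sketch below makes the segment-level behaviour concrete. For γ = 1 it computes fused attention in two steps: first the proximal operator of λ·Σ|y_{i+1} − y_i| (1-D total variation), then sparsemax. Our understanding is that the paper establishes this two-step reduction. Instead of a dedicated exact total-variation solver, we use plain projected gradient on the dual problem, which is simpler but approximate and slower. The names `prox_tv1d` and `fusedmax`, and the fixed iteration count, are illustrative assumptions.

```python
import numpy as np

def sparsemax(z):
    """Euclidean projection of z onto the probability simplex (sort-based)."""
    z = np.asarray(z, dtype=float)
    z_sorted = np.sort(z)[::-1]
    cumsum = np.cumsum(z_sorted)
    k = np.arange(1, z.size + 1)
    k_max = k[1 + k * z_sorted > cumsum][-1]
    tau = (cumsum[k_max - 1] - 1) / k_max
    return np.maximum(z - tau, 0.0)

def prox_tv1d(x, lam, n_iter=5000):
    """Approximate argmin_y 0.5*||y - x||^2 + lam * sum_i |y[i+1] - y[i]|.

    Projected gradient on the dual: y = x - D^T u with |u_i| <= lam, where
    (D y)_i = y[i+1] - y[i]. Step 1/4 is safe because ||D||^2 <= 4.
    """
    x = np.asarray(x, dtype=float)
    u = np.zeros(x.size - 1)
    for _ in range(n_iter):
        y = x + np.diff(np.concatenate(([0.0], u, [0.0])))  # y = x - D^T u
        # Gradient step on 0.5*||x - D^T u||^2, then projection onto the box.
        u = np.clip(u + 0.25 * np.diff(y), -lam, lam)
    return x + np.diff(np.concatenate(([0.0], u, [0.0])))

def fusedmax(x, lam):
    """Fused sparse attention (gamma = 1): sparsemax applied after the TV prox."""
    return sparsemax(prox_tv1d(x, lam))
```

On `x = [1.0, 1.1, 0.2, 0.0]`, plain sparsemax gives the first two positions unequal weights (0.45 and 0.55). In contrast, `fusedmax(x, 0.1)` fuses them into a single segment with weight 0.5 each. With λ = 0 the mapping falls back to sparsemax. A very large λ flattens the scores completely, so the output becomes uniform.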
Taxonomy
Topics: Topic Modeling · Explainable Artificial Intelligence (XAI) · Domain Adaptation and Few-Shot Learning
Methods: Interpretability · Softmax
