A Regularized Framework for Sparse and Structured Neural Attention

Vlad Niculae; Mathieu Blondel

arXiv:1705.07704 · stat.ML · February 26, 2019 · 46 cites

TL;DR

This paper introduces a new regularized framework for sparse and structured neural attention. It builds on a smoothed max operator and structured penalties to improve interpretability and performance across several NLP tasks.

Contribution

The authors propose a novel attention framework based on a smoothed max operator that can incorporate structured penalties for interpretability. They derive efficient algorithms for its forward and backward passes and demonstrate improvements on NLP tasks.
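In symbols, the smoothed max and its gradient mapping can be written as below. This is a sketch of the framework as we read it, not a quotation from the paper. The symbol names are our choice: Δ^d is the probability simplex, Ω is a strongly convex regularizer, and γ, λ > 0 are scalars. The gradient identity follows from Danskin's theorem, because the objective is linear in x and strong convexity of Ω makes the maximizer unique. To our understanding, the last two penalties are the ones behind the structured fusedmax and oscarmax mappings.

```latex
% Probability simplex (notation assumed)
\Delta^d = \{\, y \in \mathbb{R}^d : y \ge 0,\ \mathbf{1}^\top y = 1 \,\}

% Smoothed max operator and its gradient, used as the attention mapping
\max\nolimits_{\Omega}(x) = \max_{y \in \Delta^d} \; y^\top x - \gamma\,\Omega(y)
\Pi_{\Omega}(x) = \nabla \max\nolimits_{\Omega}(x)
               = \operatorname*{arg\,max}_{y \in \Delta^d} \; y^\top x - \gamma\,\Omega(y)

% Special cases
\Omega(y) = \textstyle\sum_i y_i \log y_i
  \;\Rightarrow\; \Pi_{\Omega}(x) = \mathrm{softmax}(x / \gamma)
\Omega(y) = \tfrac12 \|y\|_2^2
  \;\Rightarrow\; \Pi_{\Omega}(x) = \operatorname*{arg\,min}_{y \in \Delta^d} \|y - x/\gamma\|_2^2
  = \mathrm{sparsemax}(x / \gamma)

% Structured penalties (fused / TV and OSCAR)
\Omega(y) = \tfrac12 \|y\|_2^2 + \lambda \sum_{i=1}^{d-1} |y_{i+1} - y_i|
\Omega(y) = \tfrac12 \|y\|_2^2 + \lambda \sum_{i<j} \max(|y_i|, |y_j|)
```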

Findings

01

Improved interpretability of attention mechanisms.

02

Outperforms softmax and sparsemax in textual entailment and summarization.

03

Efficient algorithms for forward and backward passes.
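The third finding, efficient backward passes, can be illustrated for the two simplest special cases. The following is a minimal sketch under stated assumptions, not the authors' code. The function names are hypothetical, and the sketch assumes the forward output is y = softmax(x/γ) or y = sparsemax(x/γ). Each function computes a Jacobian-vector product in O(d) from y alone, without forming the d × d Jacobian. Both Jacobians are symmetric, so the same functions also give the vector-Jacobian products that backpropagation needs. The structured variants (fusedmax, oscarmax) are not covered here.

```python
"""Backward passes (Jacobian-vector products) for softmax and sparsemax."""
from typing import List


def softmax_backward(y: List[float], v: List[float], gamma: float = 1.0) -> List[float]:
    """J v for y = softmax(x / gamma), where J = (diag(y) - y y^T) / gamma."""
    dot = sum(yi * vi for yi, vi in zip(y, v))
    return [yi * (vi - dot) / gamma for yi, vi in zip(y, v)]


def sparsemax_backward(y: List[float], v: List[float], gamma: float = 1.0) -> List[float]:
    """J v for y = sparsemax(x / gamma).

    J = (diag(s) - s s^T / |S|) / gamma, where s is the 0/1 indicator of the
    support S = {i : y_i > 0}. Only the forward output y is needed, and S is
    never empty because y sums to 1.
    """
    support = [yi > 0 for yi in y]
    size = sum(support)
    # Average of v over the support; entries outside the support get zero gradient.
    mean_v = sum(vi for vi, si in zip(v, support) if si) / size
    return [(vi - mean_v) / gamma if si else 0.0 for vi, si in zip(v, support)]
```

For example, `sparsemax_backward([0.4, 0.6, 0.0, 0.0], [1.0, 2.0, 3.0, 4.0])` returns `[-0.5, 0.5, 0.0, 0.0]`. The incoming gradient is centered over the support and zeroed elsewhere, which is what makes the sparse backward pass cheap.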

Abstract

Modern neural networks are often augmented with an attention mechanism, which tells the network where to focus within the input. We propose in this paper a new framework for sparse and structured attention, building upon a smoothed max operator. We show that the gradient of this operator defines a mapping from real values to probabilities, suitable as an attention mechanism. Our framework includes softmax and a slight generalization of the recently-proposed sparsemax as special cases. However, we also show how our framework can incorporate modern structured penalties, resulting in more interpretable attention mechanisms, that focus on entire segments or groups of an input. We derive efficient algorithms to compute the forward and backward passes of our attention mechanisms, enabling their use in a neural network trained with backpropagation. To showcase their potential as a drop-in…
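The abstract's central idea is that the gradient of the smoothed max is itself an attention mapping, with softmax and sparsemax as special cases. This can be sketched in plain Python. The sketch is an illustration under stated assumptions, not the authors' implementation. The function names are hypothetical. The "slight generalization" of sparsemax is assumed to be sparsemax applied to x/γ. Fusedmax is computed as a simplex projection of a 1-D total-variation (TV) proximal step on x/γ, which is how we understand the paper's reduction. The TV step uses a simple dual projected-gradient loop chosen for brevity. In practice, a direct (non-iterative) 1-D TV solver would be the efficient choice.

```python
"""Forward passes: softmax, generalized sparsemax, and a fusedmax sketch."""
import math
from typing import List


def softmax(x: List[float], gamma: float = 1.0) -> List[float]:
    """Pi_Omega with negative-entropy Omega: softmax(x / gamma)."""
    m = max(x)  # subtract the max for numerical stability
    e = [math.exp((xi - m) / gamma) for xi in x]
    total = sum(e)
    return [ei / total for ei in e]


def project_simplex(z: List[float]) -> List[float]:
    """Euclidean projection of z onto the probability simplex (sort-based)."""
    u = sorted(z, reverse=True)
    cumsum, tau = 0.0, 0.0
    for k, uk in enumerate(u, start=1):
        cumsum += uk
        t = (cumsum - 1.0) / k
        if uk - t > 0:
            tau = t  # the last k satisfying this test gives the threshold
    return [max(zi - tau, 0.0) for zi in z]


def sparsemax(x: List[float], gamma: float = 1.0) -> List[float]:
    """Pi_Omega with Omega = 0.5 * ||y||^2: projection of x / gamma onto the simplex."""
    return project_simplex([xi / gamma for xi in x])


def tv_prox(x: List[float], lam: float, n_iter: int = 5000) -> List[float]:
    """Approximate argmin_z 0.5 * ||z - x||^2 + lam * sum_i |z[i+1] - z[i]|.

    Projected gradient on the dual: z = x - D^T u with |u_i| <= lam, where
    (D z)_i = z[i+1] - z[i]. The step 1/4 is safe because ||D D^T|| <= 4.
    """
    d = len(x)

    def primal(u: List[float]) -> List[float]:
        # (D^T u)_j = u[j-1] - u[j], with out-of-range terms treated as zero.
        return [x[j] - ((u[j - 1] if j > 0 else 0.0) - (u[j] if j < d - 1 else 0.0))
                for j in range(d)]

    u = [0.0] * (d - 1)
    for _ in range(n_iter):
        z = primal(u)
        # Gradient step along D z, then clip each dual variable to [-lam, lam].
        u = [min(max(u[i] + 0.25 * (z[i + 1] - z[i]), -lam), lam) for i in range(d - 1)]
    return primal(u)


def fusedmax(x: List[float], lam: float, gamma: float = 1.0) -> List[float]:
    """Fusedmax sketch: TV prox of x / gamma, then projection onto the simplex."""
    return project_simplex(tv_prox([xi / gamma for xi in x], lam))
```

With x = [2.0, 2.2, 0.0, 0.0], `sparsemax(x)` gives approximately [0.4, 0.6, 0, 0], while `fusedmax(x, lam=0.2)` gives approximately [0.5, 0.5, 0, 0]. The TV penalty fuses the two neighboring high scores into one segment with equal weight. Raising γ makes both mappings denser. For instance, `sparsemax(x, gamma=10.0)` is approximately [0.345, 0.365, 0.145, 0.145].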

Taxonomy

Topics: Topic Modeling · Explainable Artificial Intelligence (XAI) · Domain Adaptation and Few-Shot Learning

Methods: Interpretability · Softmax