Seeing Both the Forest and the Trees: Multi-head Attention for Joint Classification on Different Compositional Levels

Miruna Pislar; Marek Rei

arXiv:2011.00470·cs.CL·November 3, 2020

TL;DR

This paper introduces MHAL, a neural network architecture that uses multi-head attention to jointly classify text at different hierarchical levels, effectively capturing compositional language features and enabling zero-shot word-level tasks.

Contribution

The paper presents a novel multi-head attention-based model that explicitly connects hierarchical linguistic components for improved multi-level text classification.

Findings

01. MHAL outperforms non-hierarchical models on classification tasks.

02. The model enables zero-shot word-level classification without explicit supervision.

03. Information flows naturally from sentence to word representations.

Abstract

In natural languages, words are used in association to construct sentences. It is not words in isolation, but the appropriate combination of hierarchical structures that conveys the meaning of the whole sentence. Neural networks can capture expressive language features; however, insights into the link between words and sentences are difficult to acquire automatically. In this work, we design a deep neural network architecture that explicitly wires lower and higher linguistic components; we then evaluate its ability to perform the same task at different hierarchical levels. Settling on broad text classification tasks, we show that our model, MHAL, learns to simultaneously solve them at different levels of granularity by fluidly transferring knowledge between hierarchies. Using a multi-head attention mechanism to tie the representations between single words and full sentences, MHAL…
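The idea of tying word-level and sentence-level predictions through attention can be illustrated with a minimal sketch. This is not the authors' implementation (the official repository is in TensorFlow). It assumes one attention head per label, a learned query vector per head, and a sentence score formed by comparing each head's query with its attention-weighted sentence vector. The function name `multi_head_label_attention` and all shapes are hypothetical and chosen only for illustration.

```python
import math
import random


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def multi_head_label_attention(tokens, queries):
    """Illustrative sketch only, under the assumptions stated above.

    tokens:  list of n token vectors, each of length d
    queries: list of k query vectors (assumed: one head per label)
    Returns (word_weights, sentence_probs) where word_weights[h][i] is
    the attention that head h places on token i.
    """
    d = len(tokens[0])
    scale = math.sqrt(d)
    word_weights = []
    head_scores = []
    for q in queries:
        # Scaled dot-product score of this head's query against each token.
        scores = [dot(q, t) / scale for t in tokens]
        weights = softmax(scores)
        # Sentence vector for this head: attention-weighted sum of tokens.
        sent_vec = [
            sum(w * t[j] for w, t in zip(weights, tokens)) for j in range(d)
        ]
        word_weights.append(weights)
        # Assumed scoring rule: compare the query with its sentence vector.
        head_scores.append(dot(q, sent_vec))
    # Sentence-level distribution over the k labels.
    sentence_probs = softmax(head_scores)
    return word_weights, sentence_probs


# Toy usage with random vectors: 5 tokens, 3 labels, dimension 4.
random.seed(0)
tokens = [[random.gauss(0, 1) for _ in range(4)] for _ in range(5)]
queries = [[random.gauss(0, 1) for _ in range(4)] for _ in range(3)]
word_weights, sentence_probs = multi_head_label_attention(tokens, queries)

# Word-level labels read off the attention weights, with no token supervision:
# each token is assigned the head (label) that attends to it most strongly.
word_labels = [
    max(range(len(queries)), key=lambda h: word_weights[h][i])
    for i in range(len(tokens))
]
```

In this sketch the same attention weights feed the sentence-level prediction and serve as word-level evidence, which is one way a model trained only on sentence labels could still emit token labels.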

Code & Models

Repositories

MirunaPislar/multi-head-attention-labeller (TensorFlow, official)

Taxonomy

Methods: Attention Is All You Need · Softmax · Linear Layer · Multi-Head Attention