TL;DR
This paper introduces MHAL, a neural network architecture that uses multi-head attention to jointly classify text at different hierarchical levels, effectively capturing compositional language features and enabling zero-shot word-level tasks.
Contribution
The paper presents a novel multi-head attention-based model that explicitly connects hierarchical linguistic components for improved multi-level text classification.
Findings
MHAL outperforms non-hierarchical models on classification tasks.
The model enables zero-shot word-level classification without explicit supervision.
Sentence-level supervision propagates to word representations through the attention mechanism.
Abstract
In natural languages, words are used in association to construct sentences. It is not words in isolation, but the appropriate combination of hierarchical structures that conveys the meaning of the whole sentence. Neural networks can capture expressive language features; however, insights into the link between words and sentences are difficult to acquire automatically. In this work, we design a deep neural network architecture that explicitly wires lower and higher linguistic components; we then evaluate its ability to perform the same task at different hierarchical levels. Settling on broad text classification tasks, we show that our model, MHAL, learns to simultaneously solve them at different levels of granularity by fluidly transferring knowledge between hierarchies. Using a multi-head attention mechanism to tie the representations between single words and full sentences, MHAL…
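The wiring between words and sentences described in the abstract can be illustrated with a small sketch. It assumes one attention head per label, a learned query vector per head, and reuse of each head's pre-softmax scores as word-level label evidence. These choices, and all names (`joint_attention_classifier`, `q`, `Wk`, `Wv`, `Wo`), are illustrative assumptions rather than details confirmed by the abstract. The weights are random, so only the shapes and normalization of the outputs are meaningful.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    z = x - x.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def joint_attention_classifier(H, q, Wk, Wv, Wo):
    """Hypothetical joint sentence/word classifier (not the authors' code).

    H  : (T, d)      word representations for one sentence
    q  : (L, dk)     one query vector per head (assumed: one head per label)
    Wk : (L, d, dk)  per-head key projections
    Wv : (L, d, dv)  per-head value projections
    Wo : (L*dv, L)   linear layer from concatenated heads to label logits
    """
    dk = Wk.shape[2]
    K = np.einsum("td,ldk->ltk", H, Wk)                    # (L, T, dk)
    V = np.einsum("td,ldv->ltv", H, Wv)                    # (L, T, dv)
    scores = np.einsum("ltk,lk->lt", K, q) / np.sqrt(dk)   # (L, T)
    attn = softmax(scores, axis=1)                         # each head attends over words
    heads = np.einsum("lt,ltv->lv", attn, V)               # (L, dv) sentence summaries
    sentence_probs = softmax(heads.reshape(-1) @ Wo)       # (L,) sentence-level labels
    # Assumption: the same per-head scores, normalized across heads,
    # act as word-level label distributions without word-level supervision.
    word_probs = softmax(scores.T, axis=1)                 # (T, L)
    return sentence_probs, word_probs, attn

rng = np.random.default_rng(0)
T, d, L, dk, dv = 5, 8, 3, 4, 4   # toy sizes, chosen arbitrarily
sentence_probs, word_probs, attn = joint_attention_classifier(
    rng.normal(size=(T, d)),
    rng.normal(size=(L, dk)),
    rng.normal(size=(L, d, dk)),
    rng.normal(size=(L, d, dv)),
    rng.normal(size=(L * dv, L)),
)
print(sentence_probs.shape, word_probs.shape, attn.shape)
```

Because the word-level scores and the sentence-level prediction share the same attention heads in this sketch, a loss applied only to `sentence_probs` would still shape `word_probs`, which is one way the zero-shot word-level behavior listed under Findings could arise.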
Taxonomy
Methods: Attention Is All You Need · Softmax · Linear Layer · Multi-Head Attention
