Constituency Parsing with a Self-Attentive Encoder

Nikita Kitaev; Dan Klein

arXiv:1805.01052 · cs.CL · May 4, 2018 · 44 cites

TL;DR

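This paper replaces the LSTM encoder of a constituency parser with a self-attentive encoder. The change improves accuracy, makes the flow of information between sentence positions easier to analyze, and sets new state-of-the-art results for single models on the Penn Treebank and on 8 of the 9 languages in the SPMRL dataset.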

Contribution

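The paper presents a self-attentive encoder for constituency parsing that outperforms an LSTM-based encoder. It uses the explicit attention patterns to analyze how information propagates between sentence positions and to propose improvements, such as separating positional and content information in the encoder.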

Findings

01 Achieved 93.55 F1 on the Penn Treebank without external data

02 Achieved 95.13 F1 on the Penn Treebank with pre-trained word representations

03 Outperformed the previous best published results on 8 of the 9 languages in the SPMRL dataset
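
The F1 figures above are harmonic means of bracket precision and recall. As a rough illustration of the arithmetic only, the sketch below scores a predicted tree against a gold tree, with each constituent written as a (label, start, end) tuple. The function name `bracket_f1` and the toy spans are hypothetical, and the sketch treats brackets as sets, so it ignores the duplicate-bracket and punctuation handling that standard evaluation tools apply.

```python
def bracket_f1(gold, predicted):
    """Labeled bracket F1 over (label, start, end) tuples.

    Simplifying assumption: brackets are compared as sets, so repeated
    identical brackets count once.
    """
    gold_set, pred_set = set(gold), set(predicted)
    matched = len(gold_set & pred_set)
    if matched == 0:
        return 0.0
    precision = matched / len(pred_set)  # share of predicted brackets that are correct
    recall = matched / len(gold_set)     # share of gold brackets that were recovered
    return 2 * precision * recall / (precision + recall)


# Toy example: the prediction mislabels one of four constituents.
gold = [("S", 0, 5), ("NP", 0, 2), ("VP", 2, 5), ("NP", 3, 5)]
predicted = [("S", 0, 5), ("NP", 0, 2), ("VP", 2, 5), ("PP", 3, 5)]
score = bracket_f1(gold, predicted)  # precision = recall = 3/4, so F1 = 0.75
```

Published scores such as 93.55 are this quantity expressed as a percentage and aggregated over the whole test set.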

Abstract

We demonstrate that replacing an LSTM encoder with a self-attentive architecture can lead to improvements to a state-of-the-art discriminative constituency parser. The use of attention makes explicit the manner in which information is propagated between different locations in the sentence, which we use to both analyze our model and propose potential improvements. For example, we find that separating positional and content information in the encoder can lead to improved parsing accuracy. Additionally, we evaluate different approaches for lexical representation. Our parser achieves new state-of-the-art results for single models trained on the Penn Treebank: 93.55 F1 without the use of any external data, and 95.13 F1 when using pre-trained word representations. Our parser also outperforms the previous best-published accuracy figures on 8 of the 9 languages in the SPMRL dataset.
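
One way to picture "separating positional and content information" is an attention score that is a sum of a content-only term and a position-only term, with no cross terms between the two. The sketch below is a minimal pure-Python illustration under that assumption, not the authors' implementation: the function names are hypothetical, the learned query/key projections are replaced by the identity, and only a single attention head is shown.

```python
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def softmax(scores):
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def factored_attention_weights(content, position):
    """Attention weights where content and position never interact.

    content[i] and position[i] are the content and position vectors of
    token i. Assumption: identity projections, so queries and keys are
    the vectors themselves.
    """
    scale = math.sqrt(len(content[0]) + len(position[0]))
    weights = []
    for c_i, p_i in zip(content, position):
        # Score = content-content term + position-position term only.
        row = [(dot(c_i, c_j) + dot(p_i, p_j)) / scale
               for c_j, p_j in zip(content, position)]
        weights.append(softmax(row))
    return weights


# Three tokens with 2-dimensional content and position vectors (toy values).
content = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
position = [[0.0, 1.0], [1.0, 0.0], [0.5, 0.5]]
weights = factored_attention_weights(content, position)
```

Each row of `weights` is a distribution over the positions a token attends to. By contrast, when content and position embeddings are added together before attention, the score also contains content-position cross terms, which is the mixing that a factored design avoids.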

Taxonomy

Topics: Natural Language Processing Techniques · Topic Modeling · Text Readability and Simplification

Methods: Sigmoid Activation · Tanh Activation · Long Short-Term Memory