Parsing as Pretraining
David Vilares, Michalina Strzyz, Anders Søgaard, Carlos Gómez-Rodríguez

TL;DR
This paper shows that pretrained encoders can perform full constituent and dependency parsing in English without a decoding step, by casting parsing as sequence tagging. The approach surpasses existing sequence tagging parsers, and the same setup is used to analyze the syntax-sensitivity of different word vectors.
Contribution
It introduces a novel approach to full parsing using only pretrained encoders and a simple feed-forward layer, bypassing traditional decoding methods.
Findings
Surpasses existing sequence tagging parsers on PTB with 93.5% F1
Achieves 78.8% LAS on end-to-end EN-EWT UD
Analyzes syntax-sensitivity of different word vectors
Abstract
Recent analyses suggest that encoders pretrained for language modeling capture certain morpho-syntactic structure. However, probing frameworks for word vectors still do not report results on standard setups such as constituent and dependency parsing. This paper addresses this problem and does full parsing (on English) relying only on pretraining architectures -- and no decoding. We first cast constituent and dependency parsing as sequence tagging. We then use a single feed-forward layer to directly map word vectors to labels that encode a linearized tree. This is used to: (i) see how far we can reach on syntax modelling with just pretrained encoders, and (ii) shed some light about the syntax-sensitivity of different word vectors (by freezing the weights of the pretraining network during training). For evaluation, we use bracketing F1-score and LAS, and analyze in-depth differences…
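To make "casting dependency parsing as sequence tagging" concrete, here is a minimal sketch in Python. It assumes a simple encoding in which each token's label is the signed offset to its head plus the relation name; this is an illustrative choice and not necessarily the encoding used in the paper. The function names `encode`, `decode`, and `las` are hypothetical. In the setup the abstract describes, a feed-forward layer would predict one such label per word vector; here only the label scheme and the LAS metric are shown.

```python
from typing import List, Tuple

# One arc per token: (head index, relation). Heads are 1-based; 0 denotes the root.
Arc = Tuple[int, str]


def encode(arcs: List[Arc]) -> List[str]:
    """Linearize a dependency tree into one label per token (assumed encoding)."""
    labels = []
    for i, (head, rel) in enumerate(arcs, start=1):
        # The root gets a special marker; other tokens store the signed offset to the head.
        offset = "root" if head == 0 else f"{head - i:+d}"
        labels.append(f"{offset}_{rel}")
    return labels


def decode(labels: List[str]) -> List[Arc]:
    """Recover (head, relation) pairs from per-token labels."""
    arcs = []
    for i, label in enumerate(labels, start=1):
        offset, rel = label.split("_", 1)
        head = 0 if offset == "root" else i + int(offset)
        arcs.append((head, rel))
    return arcs


def las(gold: List[Arc], pred: List[Arc]) -> float:
    """Labeled attachment score: fraction of tokens with correct head and relation."""
    correct = sum(g == p for g, p in zip(gold, pred))
    return correct / len(gold)


# "She reads books": 'reads' is the root; 'She' and 'books' attach to it.
gold_tree = [(2, "nsubj"), (0, "root"), (2, "obj")]
gold_labels = encode(gold_tree)

# A hypothetical tagger output that gets the last relation wrong.
predicted_labels = ["+1_nsubj", "root_root", "-1_nsubj"]
score = las(gold_tree, decode(predicted_labels))
```

Because each token receives exactly one label, the tree is read off the tagger's output directly, which is what allows parsing without a separate decoding algorithm. A real system would also need to handle ill-formed outputs (for example, offsets pointing outside the sentence), which this sketch omits.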
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Speech Recognition and Synthesis
