TL;DR
This paper investigates the impact of different design choices in graph-based dependency parsers built on pre-trained transformer embeddings. It finds that the choice of pre-trained embeddings matters most, that LSTM layers are unnecessary, and that a simple architecture can achieve state-of-the-art results across multiple languages.
Contribution
It introduces STEPS, a modular dependency parser, and provides an analysis of design choices, emphasizing the importance of pre-trained embeddings and simplicity for multilingual parsing.
Findings
The choice of pre-trained embeddings has by far the greatest impact on parser performance; XLM-R is a robust choice across the languages studied.
Adding LSTM layers offers no benefit with transformer embeddings.
A simple architecture achieves state-of-the-art results in most tested languages.
Abstract
The introduction of pre-trained transformer-based contextualized word embeddings has led to considerable improvements in the accuracy of graph-based parsers for frameworks such as Universal Dependencies (UD). However, previous works differ in various dimensions, including their choice of pre-trained language models and whether they use LSTM layers. With the aims of disentangling the effects of these choices and identifying a simple yet widely applicable architecture, we introduce STEPS, a new modular graph-based dependency parser. Using STEPS, we perform a series of analyses on the UD corpora of a diverse set of languages. We find that the choice of pre-trained embeddings has by far the greatest impact on parser performance and identify XLM-R as a robust choice across the languages in our study. Adding LSTM layers provides no benefits when using transformer-based embeddings. A…
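To make "graph-based" concrete, the following is a minimal sketch of how such a parser can score every candidate head for every token and then pick one head per token. It is an illustration under stated assumptions, not the STEPS implementation: the biaffine scoring form, the function names (`biaffine_arc_scores`, `greedy_heads`), the random vectors standing in for XLM-R token embeddings, and the greedy decoding step are all hypothetical choices that the text above does not confirm.

```python
import random


def biaffine_arc_scores(head_vecs, dep_vecs, U, b):
    """Return scores[d][h] = dep_vecs[d]^T U head_vecs[h] + b . head_vecs[h].

    Assumption: a biaffine scorer; the summary above does not specify one.
    """
    n, dim = len(dep_vecs), len(U)
    scores = []
    for d in range(n):
        # Project the dependent vector through U once per dependent.
        dep_proj = [sum(dep_vecs[d][i] * U[i][j] for i in range(dim))
                    for j in range(dim)]
        row = []
        for h in range(n):
            bilinear = sum(dep_proj[j] * head_vecs[h][j] for j in range(dim))
            head_bias = sum(b[j] * head_vecs[h][j] for j in range(dim))
            row.append(bilinear + head_bias)
        scores.append(row)
    return scores


def greedy_heads(scores):
    """Pick the highest-scoring head for each token; index 0 is an artificial ROOT.

    Greedy selection does not guarantee a well-formed tree; a full parser
    would typically run a spanning-tree algorithm over the same scores.
    """
    n = len(scores)
    heads = [-1]  # ROOT has no head
    for d in range(1, n):
        best = max((scores[d][h], h) for h in range(n) if h != d)
        heads.append(best[1])
    return heads


# Hypothetical inputs: random vectors stand in for contextual embeddings.
rng = random.Random(0)
n_tokens, dim = 5, 4  # ROOT plus four words
head_vecs = [[rng.gauss(0, 1) for _ in range(dim)] for _ in range(n_tokens)]
dep_vecs = [[rng.gauss(0, 1) for _ in range(dim)] for _ in range(n_tokens)]
U = [[rng.gauss(0, 1) for _ in range(dim)] for _ in range(dim)]
b = [rng.gauss(0, 1) for _ in range(dim)]

scores = biaffine_arc_scores(head_vecs, dep_vecs, U, b)
predicted = greedy_heads(scores)
print(predicted)
```

In this sketch the embeddings feed the scorer directly, with no recurrent layer in between, which mirrors the simple setup the abstract argues for.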
Taxonomy
Methods: XLM-R · Tanh Activation · Sigmoid Activation · Long Short-Term Memory
