Applying Occam's Razor to Transformer-Based Dependency Parsing: What Works, What Doesn't, and What is Really Necessary

Stefan Grünewald; Annemarie Friedrich; Jonas Kuhn

arXiv:2010.12699 · cs.CL · July 30, 2021

TL;DR

This paper investigates the impact of different design choices in transformer-based dependency parsers, finding that pre-trained embeddings are most influential, LSTMs are unnecessary, and a simple architecture can achieve state-of-the-art results across multiple languages.

Contribution

The paper introduces STEPS, a modular graph-based dependency parser, and uses it to analyze parser design choices, emphasizing the importance of pre-trained embeddings and of architectural simplicity for multilingual parsing.

Findings

01

Pre-trained embeddings, especially XLM-R, greatly improve parser performance.

02

Adding LSTM layers offers no benefit with transformer embeddings.

03

A simple architecture achieves state-of-the-art results in most tested languages.

Abstract

The introduction of pre-trained transformer-based contextualized word embeddings has led to considerable improvements in the accuracy of graph-based parsers for frameworks such as Universal Dependencies (UD). However, previous works differ in various dimensions, including their choice of pre-trained language models and whether they use LSTM layers. With the aims of disentangling the effects of these choices and identifying a simple yet widely applicable architecture, we introduce STEPS, a new modular graph-based dependency parser. Using STEPS, we perform a series of analyses on the UD corpora of a diverse set of languages. We find that the choice of pre-trained embeddings has by far the greatest impact on parser performance and identify XLM-R as a robust choice across the languages in our study. Adding LSTM layers provides no benefits when using transformer-based embeddings. A…
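To make "graph-based" concrete: such a parser scores every candidate head for every token and then picks heads from the resulting score matrix. The abstract does not spell out the scoring function, so the sketch below assumes biaffine arc scoring, a common choice in this family of parsers. It is not taken from the STEPS code base. The function names, the toy vectors, and the greedy decoding step are all hypothetical and chosen for illustration only.

```python
def biaffine_arc_scores(head_vecs, dep_vecs, U, b):
    """Return scores[i][j] = dep_vecs[i]^T U head_vecs[j] + b . head_vecs[j].

    Assumption: biaffine scoring; in a real parser head_vecs and dep_vecs
    would come from learned projections of transformer token embeddings.
    """
    n, d = len(dep_vecs), len(U)
    scores = []
    for i in range(n):
        # Project the dependent vector through U once per dependent.
        proj = [sum(dep_vecs[i][k] * U[k][l] for k in range(d)) for l in range(d)]
        row = []
        for j in range(n):
            bilinear = sum(proj[l] * head_vecs[j][l] for l in range(d))
            head_bias = sum(b[l] * head_vecs[j][l] for l in range(d))
            row.append(bilinear + head_bias)
        scores.append(row)
    return scores


def greedy_heads(scores):
    """Pick the best-scoring head per token; index 0 is an artificial ROOT.

    Greedy selection does not guarantee a well-formed tree; real parsers
    often use a spanning-tree algorithm instead.
    """
    n = len(scores)
    heads = [-1] * n  # ROOT itself has no head
    for i in range(1, n):
        candidates = [j for j in range(n) if j != i]  # no self-attachment
        heads[i] = max(candidates, key=lambda j: scores[i][j])
    return heads


# Toy input (hypothetical values): ROOT, "She", "sleeps"
head_vecs = [[1, 0], [0, 0], [0, 1]]
dep_vecs = [[0, 0], [0, 1], [1, 0]]
U = [[1, 0], [0, 1]]  # identity weight matrix, for readability
b = [0, 0]

scores = biaffine_arc_scores(head_vecs, dep_vecs, U, b)
predicted = greedy_heads(scores)
print(predicted)  # [-1, 2, 0]: "She" attaches to "sleeps", "sleeps" to ROOT
```

Only the scoring and decoding steps are shown. Under the paper's findings, the input vectors would be derived directly from pre-trained transformer embeddings such as XLM-R, without intermediate LSTM layers.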

Taxonomy

Methods: XLM-R · Tanh Activation · Sigmoid Activation · Long Short-Term Memory