Dependency Parsing is More Parameter-Efficient with Normalization

Paolo Gajo; Domenic Rosati; Hassan Sajjad; Alberto Barrón-Cedeño

arXiv:2505.20215 · cs.CL · October 27, 2025

TL;DR

This paper shows that normalizing the biaffine scores in dependency parsing models reduces overparameterization, improves parameter efficiency, and achieves state-of-the-art results across multiple languages and tasks.

Contribution

The paper provides theoretical and empirical evidence that score normalization in biaffine parsing models enhances parameter efficiency and performance.

Findings

01

Normalization reduces model overparameterization.

02

Normalized models achieve state-of-the-art accuracy.

03

Normalization improves training efficiency across tasks.

Abstract

Dependency parsing is the task of inferring natural language structure, often approached by modeling word interactions via attention through biaffine scoring. This mechanism works like self-attention in Transformers, where scores are calculated for every pair of words in a sentence. However, unlike Transformer attention, biaffine scoring does not use normalization prior to taking the softmax of the scores. In this paper, we provide theoretical evidence and empirical results revealing that a lack of normalization necessarily results in overparameterized parser models, where the extra parameters compensate for the sharp softmax outputs produced by high variance inputs to the biaffine scoring function. We argue that biaffine scoring can be made substantially more efficient by performing score normalization. We conduct experiments on semantic and syntactic dependency parsing in multiple…
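To make the mechanism concrete, here is a minimal Python sketch of biaffine arc scoring with and without score normalization. It assumes, hypothetically, that normalization means dividing every score by sqrt(d), as in the scaled dot-product attention of Transformers (for zero-mean, unit-variance entries, a dot product of two d-dimensional vectors has variance d); the paper's exact normalization may differ. The scorer form s_ij = dep_i^T U head_j + w^T head_j, the helper names (biaffine_scores, softmax, entropy, random_matrix), and the random inputs are illustrative assumptions, not the authors' implementation.

```python
import math
import random


def biaffine_scores(deps, heads, U, w, normalize=False):
    """Score every (dependent i, head j) pair: s_ij = dep_i^T U head_j + w^T head_j.

    If normalize is True, divide each score by sqrt(d). This is a hypothetical
    choice mirroring scaled dot-product attention, not necessarily the paper's.
    """
    d = len(w)
    scale = 1.0 / math.sqrt(d) if normalize else 1.0
    scores = []
    for x in deps:
        # Precompute x^T U once per dependent (a length-d row vector).
        xU = [sum(x[k] * U[k][m] for k in range(d)) for m in range(d)]
        row = []
        for y in heads:
            bilinear = sum(xU[m] * y[m] for m in range(d))
            bias = sum(w[m] * y[m] for m in range(d))  # head-only bias term
            row.append(scale * (bilinear + bias))
        scores.append(row)
    return scores


def softmax(row):
    # Subtract the max for numerical stability before exponentiating.
    mx = max(row)
    exps = [math.exp(v - mx) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def entropy(p):
    # Shannon entropy in nats; zero-probability entries contribute nothing.
    return -sum(q * math.log(q) for q in p if q > 0)


def random_matrix(rows, cols, rng):
    # Entries drawn i.i.d. from a standard normal distribution.
    return [[rng.gauss(0.0, 1.0) for _ in range(cols)] for _ in range(rows)]


if __name__ == "__main__":
    rng = random.Random(0)
    n, d = 6, 64  # sentence length and hidden size (illustrative values)
    deps = random_matrix(n, d, rng)
    heads = random_matrix(n, d, rng)
    U = random_matrix(d, d, rng)
    w = [rng.gauss(0.0, 1.0) for _ in range(d)]

    for normalize in (False, True):
        probs = [softmax(r) for r in biaffine_scores(deps, heads, U, w, normalize)]
        mean_h = sum(entropy(p) for p in probs) / n
        print(f"normalize={normalize}: mean head-distribution entropy = {mean_h:.3f}")
```

Dividing logits by a constant greater than 1 raises the softmax temperature, so the normalized run should report an equal or higher mean entropy, i.e. flatter distributions over candidate heads. This only illustrates the "sharp softmax" effect described in the abstract; it does not reproduce the paper's experiments.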

Videos

Dependency Parsing is More Parameter-Efficient with Normalization (SlidesLive)

Taxonomy

Topics: Natural Language Processing Techniques

Methods: Attention Is All You Need · Linear Layer · Layer Normalization · Byte Pair Encoding · Residual Connection · Dense Connections · Softmax · Position-Wise Feed-Forward Layer · Absolute Position Encodings · Label Smoothing