Tree Transformers are an Ineffective Model of Syntactic Constituency

Michael Ginn

arXiv:2411.16993·cs.CL·November 27, 2024

Open Access

TL;DR

This paper critically evaluates Tree Transformers and finds that they neither learn meaningful syntactic structures nor significantly improve error detection, which calls into question their effectiveness as models of constituency.

Contribution

The study provides empirical evidence that Tree Transformers do not inherently learn or utilize meaningful syntactic constituency structures.

Findings

01. Tree Transformers show little evidence of meaningful constituent structures.

02. The slight performance advantage on error detection tasks is not statistically significant.

03. Overall, Tree Transformers are ineffective as models of syntactic constituency.
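
To make the "not statistically significant" finding concrete, here is a minimal sketch of one way two sets of model scores could be compared: an exact two-sided permutation test on the difference in means. This summary does not say which test the paper used, so the choice of test, the function name `exact_permutation_test`, and the scores in the usage lines are all assumptions for illustration only, not results from the paper.

```python
from itertools import combinations


def exact_permutation_test(scores_a, scores_b):
    """Two-sided exact permutation test on the difference in means.

    Returns the fraction of all ways to relabel the pooled scores whose
    absolute mean difference is at least as large as the observed one.
    Only practical for small samples, since every split is enumerated.
    """
    pooled = list(scores_a) + list(scores_b)
    n_a = len(scores_a)
    n_b = len(pooled) - n_a
    total = sum(pooled)

    def abs_mean_diff(sum_a):
        return abs(sum_a / n_a - (total - sum_a) / n_b)

    observed = abs_mean_diff(sum(scores_a))
    extreme = 0
    count = 0
    for idx in combinations(range(len(pooled)), n_a):
        count += 1
        # Small tolerance so floating-point noise does not hide ties.
        if abs_mean_diff(sum(pooled[i] for i in idx)) >= observed - 1e-12:
            extreme += 1
    return extreme / count


# Hypothetical accuracies over five random seeds (made-up numbers).
tree_scores = [0.81, 0.79, 0.83, 0.80, 0.82]
baseline_scores = [0.80, 0.81, 0.79, 0.82, 0.80]
p_value = exact_permutation_test(tree_scores, baseline_scores)
print(f"p = {p_value:.3f}")
```

A small average gain that sits inside the seed-to-seed spread, as in the made-up scores above, yields a large p-value, which is the situation the second finding describes.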

Abstract

Linguists have long held that a key aspect of natural language syntax is the recursive organization of language units into constituent structures, and research has suggested that current state-of-the-art language models lack an inherent bias towards this feature. A number of alternative models have been proposed to provide inductive biases towards constituency, including the Tree Transformer, which utilizes a modified attention mechanism to organize tokens into constituents. We investigate Tree Transformers to study whether they utilize meaningful and/or useful constituent structures. We pretrain a large Tree Transformer on language modeling in order to investigate the learned constituent tree representations of sentences, finding little evidence for meaningful structures. Next, we evaluate Tree Transformers with similar transformer models on error detection tasks requiring…
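
The abstract only says that the Tree Transformer uses a modified attention mechanism to organize tokens into constituents. As a rough sketch of that idea, based on the original Tree Transformer formulation rather than on anything stated in this summary, each pair of adjacent tokens gets a link probability, and the prior that token i and token j share a constituent is the product of the link probabilities between them. This prior then scales the ordinary softmax attention weights. The function names, the toy link probabilities, and the lack of renormalization are assumptions of this sketch.

```python
import math


def constituent_prior(link_probs):
    """Build the prior matrix C from adjacent-token link probabilities.

    link_probs[k] is the assumed probability that tokens k and k+1 belong
    to the same constituent. C[i][j] is the product of the link
    probabilities between positions i and j, and C[i][i] is 1.
    """
    n = len(link_probs) + 1
    prior = [[1.0] * n for _ in range(n)]
    for i in range(n):
        running = 1.0
        for j in range(i + 1, n):
            running *= link_probs[j - 1]
            prior[i][j] = running
            prior[j][i] = running  # the prior is symmetric
    return prior


def softmax(scores):
    """Numerically stable softmax over one row of attention scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def constrained_attention(scores, link_probs):
    """Scale each row of softmax attention by the constituent prior.

    Rows are not renormalized in this sketch, so they may sum to less
    than 1 when the prior suppresses attention across weak links.
    """
    prior = constituent_prior(link_probs)
    return [
        [p * c for p, c in zip(softmax(row), prior[i])]
        for i, row in enumerate(scores)
    ]


# Toy example: four tokens with a weak link between tokens 1 and 2.
links = [0.9, 0.1, 0.8]
uniform_scores = [[0.0] * 4 for _ in range(4)]
weights = constrained_attention(uniform_scores, links)
```

With the weak middle link, attention from token 0 to tokens 2 and 3 is heavily suppressed, so the four tokens behave roughly like two separate two-token groups.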

Taxonomy

Topics: Natural Language Processing Techniques

Methods: Attention Is All You Need · Dense Connections · Label Smoothing · Dropout · Linear Layer · Layer Normalization · Byte Pair Encoding · Adam · Residual Connection · Softmax