Tree Transformers are an Ineffective Model of Syntactic Constituency
Michael Ginn

TL;DR
This paper critically evaluates Tree Transformers, finding that they neither learn meaningful syntactic structures nor significantly improve error detection, which calls into question their effectiveness as models of constituency.
Contribution
The study provides empirical evidence that Tree Transformers do not inherently learn or utilize meaningful syntactic constituency structures.
Findings
Tree Transformers show little evidence of meaningful constituent structures.
Their slight performance advantage on error detection tasks is not statistically significant.
Overall, Tree Transformers are ineffective as models of syntactic constituency.
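To make "not statistically significant" concrete, here is a minimal sketch of one common way to compare two models on matched runs: a paired sign-flip permutation test. This summary does not say which test the paper used, so the choice of test, the function name `paired_permutation_test`, and its parameters are all assumptions made for illustration.

```python
import random


def paired_permutation_test(scores_a, scores_b, n_resamples=10000, seed=0):
    """Two-sided paired sign-flip permutation test (illustrative, hypothetical helper).

    scores_a and scores_b are matched scores, e.g. one per random seed
    or per evaluation split, for two models. Returns an approximate
    p-value for the null hypothesis that the mean difference is zero.
    """
    if len(scores_a) != len(scores_b):
        raise ValueError("scores must be paired")
    rng = random.Random(seed)
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    n = len(diffs)
    observed = abs(sum(diffs) / n)

    extreme = 0
    for _ in range(n_resamples):
        # Under the null, the sign of each paired difference is arbitrary.
        flipped = sum(d if rng.random() < 0.5 else -d for d in diffs)
        if abs(flipped / n) >= observed:
            extreme += 1

    # Add-one smoothing keeps the estimate strictly above zero.
    return (extreme + 1) / (n_resamples + 1)
```

A small advantage with high run-to-run variance yields a large p-value under this test, which is the situation the finding above describes.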
Abstract
Linguists have long held that a key aspect of natural language syntax is the recursive organization of language units into constituent structures, and research has suggested that current state-of-the-art language models lack an inherent bias towards this feature. A number of alternative models have been proposed to provide inductive biases towards constituency, including the Tree Transformer, which utilizes a modified attention mechanism to organize tokens into constituents. We investigate Tree Transformers to study whether they utilize meaningful and/or useful constituent structures. We pretrain a large Tree Transformer on language modeling in order to investigate the learned constituent tree representations of sentences, finding little evidence for meaningful structures. Next, we evaluate Tree Transformers with similar transformer models on error detection tasks requiring…
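The abstract describes the modified attention mechanism only at a high level. As a rough sketch, and assuming the formulation from the original Tree Transformer proposal as we understand it (it is not spelled out on this page), each pair of adjacent tokens gets a link probability, and attention between two positions is scaled by the product of the links between them. The names `constituent_prior` and `constrained_attention`, and the link probabilities being supplied as input rather than learned, are simplifications made for illustration.

```python
import math


def constituent_prior(link_probs):
    """Build an n x n prior from n-1 adjacent-link probabilities.

    link_probs[k] is the assumed probability that tokens k and k+1
    belong to the same constituent. prior[i][j] is the product of all
    links between i and j, so a single weak link separates two spans.
    """
    n = len(link_probs) + 1
    prior = [[1.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            p = prior[i][j - 1] * link_probs[j - 1]
            prior[i][j] = p
            prior[j][i] = p  # the prior is symmetric
    return prior


def softmax(row):
    """Numerically stable softmax over one row of scores."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def constrained_attention(scores, link_probs):
    """Scale ordinary softmax attention elementwise by the constituent prior.

    Rows are not renormalized in this sketch, so they may sum to less than 1.
    """
    prior = constituent_prior(link_probs)
    return [
        [c * a for c, a in zip(prior_row, softmax(score_row))]
        for prior_row, score_row in zip(prior, scores)
    ]


# Four tokens with a weak link between tokens 1 and 2:
# attention across that boundary is strongly suppressed.
attention = constrained_attention(
    scores=[[0.0] * 4 for _ in range(4)],
    link_probs=[0.9, 0.1, 0.8],
)
```

Reading the link probabilities layer by layer is what would let one extract a tree from such a model; whether those trees correspond to linguistically meaningful constituents is the question the paper investigates.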
Taxonomy
Topics: Natural Language Processing Techniques
Methods: Attention Is All You Need · Dense Connections · Label Smoothing · Dropout · Linear Layer · Layer Normalization · Byte Pair Encoding · Adam · Residual Connection · Softmax
