TL;DR
This paper evaluates whether incorporating dependency tree syntax improves pre-trained transformer models like BERT on information extraction tasks, finding benefits but also highlighting limitations related to dependency parse availability.
Contribution
It introduces and compares two strategies for integrating dependency syntax into pre-trained transformers (late fusion and joint fusion) and demonstrates their effectiveness on key information extraction tasks.
Findings
Syntax-infused transformers achieve state-of-the-art results on semantic role labeling (SRL) and relation extraction.
Performance gains depend heavily on access to human-annotated dependency parses.
Models show promise but face practical limitations due to reliance on external syntax annotations.
Abstract
Much recent work suggests that incorporating syntax information from dependency trees can improve task-specific transformer models. However, the effect of incorporating dependency tree information into pre-trained transformer models (e.g., BERT) remains unclear, especially given recent studies highlighting how these models implicitly encode syntax. In this work, we systematically study the utility of incorporating dependency trees into pre-trained transformers on three representative information extraction tasks: semantic role labeling (SRL), named entity recognition, and relation extraction. We propose and investigate two distinct strategies for incorporating dependency structure: a late fusion approach, which applies a graph neural network on the output of a transformer, and a joint fusion approach, which infuses syntax structure into the transformer attention layers. These…
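The code below is a minimal NumPy sketch of the two integration strategies named in the abstract. It is not the paper's implementation and makes simplifying assumptions. A single GCN layer with row-normalized adjacency stands in for the late-fusion graph neural network. Hard-masking self-attention to dependency neighbours stands in for joint fusion. The function names, the head-index encoding of the parse, and the random arrays standing in for BERT token outputs are all hypothetical.

```python
import numpy as np

def dependency_adjacency(heads):
    """Symmetric adjacency matrix with self-loops built from a dependency parse.
    Hypothetical encoding: heads[i] is the index of token i's head, -1 for the root."""
    n = len(heads)
    adj = np.eye(n)
    for child, head in enumerate(heads):
        if head >= 0:
            adj[child, head] = 1.0
            adj[head, child] = 1.0
    return adj

def late_fusion_gcn(hidden, adj, weight):
    """Late fusion (sketch): one GCN layer, ReLU(D^-1 A H W), applied on top of
    the transformer's final token representations `hidden`."""
    deg = adj.sum(axis=1, keepdims=True)
    return np.maximum(0.0, (adj / deg) @ hidden @ weight)

def syntax_masked_attention(q, k, v, adj):
    """Joint fusion (sketch): scaled dot-product attention in which each token may
    attend only to itself and its dependency neighbours."""
    scores = q @ k.T / np.sqrt(q.shape[-1])
    scores = np.where(adj > 0, scores, -np.inf)    # block non-neighbours
    scores -= scores.max(axis=-1, keepdims=True)   # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v, weights

# Toy parse of "She ate apples": "ate" is the root; "She" and "apples" depend on it.
rng = np.random.default_rng(0)
heads = [1, -1, 1]
hidden = rng.normal(size=(3, 4))          # stand-in for BERT token outputs
adj = dependency_adjacency(heads)
gcn_out = late_fusion_gcn(hidden, adj, rng.normal(size=(4, 4)))
ctx, attn = syntax_masked_attention(hidden, hidden, hidden, adj)
```

The two functions differ in where the parse enters the model. In late fusion the transformer runs unchanged and the graph layer post-processes its outputs. In joint fusion the parse reshapes attention inside the transformer itself. Either way, both need a dependency parse at inference time, which ties into the finding that gains hinge on parse availability and quality.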
Taxonomy
Methods: Graph Neural Network
