On the Evolution of Syntactic Information Encoded by BERT's Contextualized Representations
Laura Pérez-Mayos, Roberto Carlini, Miguel Ballesteros, Leo Wanner

TL;DR
This paper investigates how the syntactic information encoded in BERT's representations, in particular the implicitly embedded syntax trees, evolves during fine-tuning on six NLP tasks, showing that this knowledge can be forgotten, reinforced, or preserved depending on the task.
Contribution
It provides a detailed analysis of the evolution of syntactic information in BERT during fine-tuning across multiple tasks, highlighting task-dependent changes.
Findings
Syntactic information is forgotten in PoS tagging tasks.
Syntactic information is reinforced in dependency and constituency parsing.
Semantic tasks tend to preserve syntactic information.
Abstract
The adaptation of pretrained language models to solve supervised tasks has become a baseline in NLP, and many recent works have focused on studying how linguistic information is encoded in the pretrained sentence representations. Among other information, it has been shown that entire syntax trees are implicitly embedded in the geometry of such models. As these models are often fine-tuned, it becomes increasingly important to understand how the encoded knowledge evolves along the fine-tuning. In this paper, we analyze the evolution of the embedded syntax trees along the fine-tuning process of BERT for six different tasks, covering all levels of the linguistic structure. Experimental results show that the encoded syntactic information is forgotten (PoS tagging), reinforced (dependency and constituency parsing) or preserved (semantics-related tasks) in different ways along the fine-tuning…
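The claim that syntax trees are "embedded in the geometry" of a model can be made concrete with a distance probe: a linear map under which squared distances between token vectors approximate path lengths in the syntax tree. The sketch below is an illustration of that general idea only, not necessarily the authors' setup; the function names, the toy vectors, and the error measure are all assumptions introduced here. Evaluating such an error on successive fine-tuning checkpoints is one way to see whether tree information is being lost or strengthened.

```python
from itertools import combinations


def tree_distances(heads):
    """Path length between every pair of tokens in a dependency tree.

    heads[i] is the index of token i's head, or -1 for the root.
    """
    n = len(heads)
    # Undirected adjacency list built from the head pointers.
    adj = [[] for _ in range(n)]
    for child, head in enumerate(heads):
        if head >= 0:
            adj[child].append(head)
            adj[head].append(child)

    dist = [[0] * n for _ in range(n)]
    for start in range(n):
        # Breadth-first search from each token.
        seen = {start: 0}
        frontier = [start]
        while frontier:
            next_frontier = []
            for u in frontier:
                for v in adj[u]:
                    if v not in seen:
                        seen[v] = seen[u] + 1
                        next_frontier.append(v)
            frontier = next_frontier
        for v, d in seen.items():
            dist[start][v] = d
    return dist


def probe_distance(B, h_i, h_j):
    """Squared L2 distance between two vectors after the linear map B."""
    diff = [a - b for a, b in zip(h_i, h_j)]
    projected = [sum(w * x for w, x in zip(row, diff)) for row in B]
    return sum(p * p for p in projected)


def probe_error(B, vectors, heads):
    """Mean absolute gap between probe distances and gold tree distances."""
    gold = tree_distances(heads)
    pairs = list(combinations(range(len(vectors)), 2))
    total = sum(
        abs(probe_distance(B, vectors[i], vectors[j]) - gold[i][j])
        for i, j in pairs
    )
    return total / len(pairs)


# Toy example (hypothetical data): a three-token chain 0 -> 1 -> 2 (root).
heads = [1, 2, -1]
vectors = [[0, 0], [1, 0], [1, 1]]  # stand-ins for contextual embeddings
B = [[1, 0], [0, 1]]                # identity map as a stand-in for a trained probe
print(probe_error(B, vectors, heads))
```

In this toy case the vectors were chosen so that squared distances match the tree exactly, so the error is zero; with real embeddings, B would be trained on a treebank and the error compared across checkpoints.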
Taxonomy
Topics: Natural Language Processing Techniques · Topic Modeling · Speech and dialogue systems
Methods: Linear Layer · Layer Normalization · Attention Dropout · WordPiece · Attention Is All You Need · Residual Connection · Dense Connections · Adam · Linear Warmup With Linear Decay
