A Systematic Analysis of Morphological Content in BERT Models for Multiple Languages
Daniel Edmiston

TL;DR
This paper systematically investigates how BERT models encode morphological features across five European languages, finding that their representations often reflect linguistic structure and that specific attention heads focus on grammatical agreement.
Contribution
It provides a detailed analysis of morphological content in BERT-style models, showing how they encode morphological features and identifying specific attention head/layer combinations that track subject-verb agreement.
Findings
Transformer embeddings partition into morphological feature-based regions
Models distinguish ambiguous morphological forms in many cases
Certain attention heads focus on subject-verb agreement
Abstract
This work describes experiments which probe the hidden representations of several BERT-style models for morphological content. The goal is to examine the extent to which discrete linguistic structure, in the form of morphological features and feature values, presents itself in the vector representations and attention distributions of pre-trained language models for five European languages. The experiments contained herein show that (i) Transformer architectures largely partition their embedding space into convex sub-regions highly correlated with morphological feature value, (ii) the contextualized nature of Transformer embeddings allows models to distinguish ambiguous morphological forms in many, but not all, cases, and (iii) very specific attention head/layer combinations appear to home in on subject-verb agreement.
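To make finding (i) concrete, the following is a minimal sketch of a nearest-centroid probe. It is an illustration only, not the paper's method: the page does not describe the actual probing setup. A nearest-centroid classifier is used because its decision regions are convex, so high held-out accuracy would be consistent with feature values occupying convex sub-regions. The embeddings here are synthetic Gaussian vectors rather than real BERT outputs, and the labels `Number=Sing` and `Number=Plur` as well as all function names are assumptions made for the example.

```python
import random


def centroid(vectors):
    """Component-wise mean of a list of equal-length vectors."""
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def sq_dist(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def fit_centroids(embeddings, labels):
    """Compute one centroid per morphological feature value."""
    grouped = {}
    for vec, lab in zip(embeddings, labels):
        grouped.setdefault(lab, []).append(vec)
    return {lab: centroid(vecs) for lab, vecs in grouped.items()}


def predict(centroids, vec):
    """Assign the label of the nearest centroid (a convex decision region)."""
    return min(centroids, key=lambda lab: sq_dist(centroids[lab], vec))


def accuracy(centroids, embeddings, labels):
    """Fraction of vectors whose nearest centroid matches the gold label."""
    hits = sum(predict(centroids, v) == l for v, l in zip(embeddings, labels))
    return hits / len(labels)


def make_toy_data(n_per_class, dim, seed):
    """Synthetic stand-in for contextual embeddings (assumption, not real data)."""
    rng = random.Random(seed)
    means = {"Number=Sing": -2.0, "Number=Plur": 2.0}
    X, y = [], []
    for label, mu in means.items():
        for _ in range(n_per_class):
            X.append([rng.gauss(mu, 1.0) for _ in range(dim)])
            y.append(label)
    return X, y


train_X, train_y = make_toy_data(100, 8, seed=0)
test_X, test_y = make_toy_data(50, 8, seed=1)
centroids = fit_centroids(train_X, train_y)
probe_accuracy = accuracy(centroids, test_X, test_y)
print(f"held-out probe accuracy: {probe_accuracy:.2f}")
```

With real models, the toy data would be replaced by per-token hidden states paired with gold morphological annotations; the probe itself would stay the same.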
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Speech Recognition and Synthesis
Methods: Linear Layer · Absolute Position Encodings · Position-Wise Feed-Forward Layer · Residual Connection · Byte Pair Encoding · Dense Connections · Label Smoothing · Adam · Softmax
