A Systematic Analysis of Morphological Content in BERT Models for Multiple Languages

Daniel Edmiston

arXiv:2004.03032·cs.CL·April 8, 2020·30 cites

TL;DR

This paper systematically probes how BERT models encode morphological features in five European languages. It finds that their representations often reflect discrete linguistic structure and that specific attention heads appear to track subject-verb agreement.

Contribution

A detailed analysis of morphological content in BERT models for five European languages, showing how embeddings encode morphological feature values and identifying specific attention head/layer combinations associated with subject-verb agreement.

Findings

01

Transformer embeddings partition into morphological feature-based regions

02

Models distinguish ambiguous morphological forms in many cases

03

Certain attention heads focus on subject-verb agreement
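
One way to look for heads of the kind described in finding 03 is to read off, for every layer and head, how much attention the verb token places on its subject. The sketch below is only an illustration under stated assumptions: the attention tensor is synthetic (random weights with one planted head), and the function names, tensor shape, and token positions are hypothetical rather than taken from the paper or its repository.

```python
import numpy as np

def verb_to_subject_attention(attn, verb_idx, subj_idx):
    """Return a (layers, heads) array of attention paid by the verb to the subject.

    attn has shape (layers, heads, seq_len, seq_len); each row along the last
    axis is one query token's attention distribution over the sentence.
    """
    return attn[:, :, verb_idx, subj_idx]

def top_head(scores):
    """Return the (layer, head) pair with the largest score."""
    layer, head = np.unravel_index(np.argmax(scores), scores.shape)
    return int(layer), int(head)

# Synthetic attention weights (assumption: stand-in for a real model's output).
rng = np.random.default_rng(0)
num_layers, num_heads, seq_len = 4, 4, 6
attn = rng.random((num_layers, num_heads, seq_len, seq_len))
attn /= attn.sum(axis=-1, keepdims=True)  # normalize each row to sum to 1

# Plant one head (layer 2, head 1) in which the verb at position 3
# attends almost exclusively to the subject at position 1.
attn[2, 1, 3] = 0.02
attn[2, 1, 3, 1] = 0.9

scores = verb_to_subject_attention(attn, verb_idx=3, subj_idx=1)
best = top_head(scores)
print("head with most verb-to-subject attention (layer, head):", best)
```

With a real model, the same indexing would be averaged over many sentences with annotated subject and verb positions before ranking heads; a single sentence is too noisy to support a claim about any one head.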

Abstract

This work describes experiments which probe the hidden representations of several BERT-style models for morphological content. The goal is to examine the extent to which discrete linguistic structure, in the form of morphological features and feature values, presents itself in the vector representations and attention distributions of pre-trained language models for five European languages. The experiments contained herein show that (i) Transformer architectures largely partition their embedding space into convex sub-regions highly correlated with morphological feature value, (ii) the contextualized nature of Transformer embeddings allows models to distinguish ambiguous morphological forms in many, but not all cases, and (iii) very specific attention head/layer combinations appear to home in on subject-verb agreement.
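
Claim (i) can be made concrete with a linear probe: a softmax classifier assigns each class a region that is an intersection of half-spaces, hence convex, so high probe accuracy is consistent with feature values occupying convex sub-regions. The following is a minimal sketch, not the author's implementation. It assumes synthetic vectors in place of real BERT embeddings and a hypothetical three-valued feature (for example, a Case feature); all names and hyperparameters are illustrative.

```python
import numpy as np

def train_linear_probe(X, y, num_classes, lr=0.5, epochs=200):
    """Fit softmax regression by full-batch gradient descent."""
    n, d = X.shape
    W = np.zeros((d, num_classes))
    b = np.zeros(num_classes)
    Y = np.eye(num_classes)[y]  # one-hot targets
    for _ in range(epochs):
        logits = X @ W + b
        logits -= logits.max(axis=1, keepdims=True)  # numerical stability
        probs = np.exp(logits)
        probs /= probs.sum(axis=1, keepdims=True)
        grad = (probs - Y) / n  # gradient of mean cross-entropy w.r.t. logits
        W -= lr * (X.T @ grad)
        b -= lr * grad.sum(axis=0)
    return W, b

def predict(X, W, b):
    """Assign each vector to the class with the highest linear score."""
    return np.argmax(X @ W + b, axis=1)

# Synthetic "embeddings" (assumption): three feature values, each clustered
# around its own center in an 8-dimensional space.
rng = np.random.default_rng(0)
num_classes, dim, n = 3, 8, 300
centers = 3.0 * np.eye(num_classes, dim)
y = rng.integers(0, num_classes, size=n)
X = centers[y] + 0.5 * rng.normal(size=(n, dim))

W, b = train_linear_probe(X, y, num_classes)
accuracy = float((predict(X, W, b) == y).mean())
print(f"probe accuracy on synthetic data: {accuracy:.2f}")
```

On real embeddings the probe would be evaluated on held-out tokens, and a control such as shuffled labels helps separate structure in the representation from the capacity of the probe itself.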

Code & Models

Repositories

danedmiston/morphology_classifiers (PyTorch, official)

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Speech Recognition and Synthesis

Methods: Linear Layer · Absolute Position Encodings · Position-Wise Feed-Forward Layer · Residual Connection · Byte Pair Encoding · Dense Connections · Label Smoothing · Adam · Softmax