Sentence Ambiguity, Grammaticality and Complexity Probes

Sunit Bhattacharya; Vilém Zouhar; Ondřej Bojar

arXiv:2210.06928·cs.CL·October 18, 2022

TL;DR

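This paper investigates how large pre-trained language models encode linguistic traits such as ambiguity, grammaticality, and complexity. It flags methodological pitfalls in probing (template-based datasets with surface-level artifacts, missing baseline comparisons, reliance on t-SNE plots) and shows that such features can be localized in specific model layers.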

Contribution

It provides a systematic analysis of probing methods for linguistic traits in language models, emphasizing the importance of proper dataset design and interpretation of representations.

Findings

01. Template-based datasets with surface artifacts are unreliable for probing.

02. Careful baseline comparisons are essential for valid conclusions.

03. Features can be highly localized in specific layers and may be lost in upper layers.

Abstract

It is unclear whether, how, and where large pre-trained language models capture subtle linguistic traits like ambiguity, grammaticality, and sentence complexity. We present results of automatic classification of these traits and compare their viability and patterns across representation types. We demonstrate that template-based datasets with surface-level artifacts should not be used for probing, that careful comparisons with baselines are needed, and that t-SNE plots should not be used to determine the presence of a feature in dense vector representations. We also show how features might be highly localized in the layers of these models and get lost in the upper layers.
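As a rough illustration of the kind of layer-wise probing with a baseline comparison that the abstract describes, the following is a minimal sketch and not the authors' setup. It trains a small logistic-regression probe on synthetic stand-in vectors for each "layer" and compares held-out accuracy against a majority-class baseline. All names (`probe_layers`, `make_layer`, `informative_layer`) and the synthetic data are assumptions made for illustration; real probing would use representations extracted from an actual model.

```python
import math
import random


def train_probe(X, y, epochs=100, lr=0.1):
    """Fit a logistic-regression probe with plain batch gradient descent."""
    dim, n = len(X[0]), len(X)
    w, b = [0.0] * dim, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * dim, 0.0
        for x, label in zip(X, y):
            z = sum(wi * xi for wi, xi in zip(w, x)) + b
            z = max(-30.0, min(30.0, z))  # clip to keep exp() well-behaved
            err = 1.0 / (1.0 + math.exp(-z)) - label
            for j in range(dim):
                grad_w[j] += err * x[j]
            grad_b += err
        w = [wi - lr * g / n for wi, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def accuracy(w, b, X, y):
    """Fraction of examples whose predicted class matches the label."""
    correct = 0
    for x, label in zip(X, y):
        z = sum(wi * xi for wi, xi in zip(w, x)) + b
        correct += int((z > 0) == (label == 1))
    return correct / len(y)


def majority_baseline(y_train, y_test):
    """Accuracy of always predicting the most frequent training label."""
    majority = int(sum(y_train) * 2 >= len(y_train))
    return sum(int(label == majority) for label in y_test) / len(y_test)


def make_layer(labels, informative, rng, dim=8):
    """Synthetic stand-in for one layer's sentence vectors (assumption)."""
    rows = []
    for label in labels:
        row = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        if informative:
            # Only the informative layer carries the label signal.
            row[0] += 3.0 if label == 1 else -3.0
        rows.append(row)
    return rows


def probe_layers(num_layers=4, informative_layer=1, n=400, seed=0):
    """Probe every layer separately and report the shared baseline."""
    rng = random.Random(seed)
    labels = [rng.randint(0, 1) for _ in range(n)]
    split = n // 2
    results = {}
    for layer in range(num_layers):
        X = make_layer(labels, layer == informative_layer, rng)
        w, b = train_probe(X[:split], labels[:split])
        results[layer] = accuracy(w, b, X[split:], labels[split:])
    return results, majority_baseline(labels[:split], labels[split:])


if __name__ == "__main__":
    results, baseline = probe_layers()
    print(f"majority baseline: {baseline:.2f}")
    for layer, acc in results.items():
        print(f"layer {layer}: probe accuracy {acc:.2f}")
```

In this toy setting only one layer should clearly beat the baseline, which mirrors the point that a probe's accuracy is meaningful only relative to a baseline and that a feature may be recoverable from some layers but not from others.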

Code & Models

Repositories

ufal/ambiguity-grammaticality-complexity (official repository; framework tag: tf)

Taxonomy

Topics: Natural Language Processing Techniques · Topic Modeling · Language and cultural evolution