TL;DR
This paper investigates whether linguistic dependencies coincide with statistical dependence between words, using large pretrained language models to estimate pointwise mutual information. It finds a moderate correspondence that varies across models.
Contribution
It introduces contextualized PMI (CPMI), estimated with large pretrained language models, extracts the dependency trees that maximize CPMI, and compares them to gold-standard linguistic dependencies across multiple models and languages.
Findings
CPMI dependencies achieve an unlabelled undirected attachment score of at most about 0.5.
CPMI dependencies score far above chance and consistently above a non-contextualized PMI baseline.
Different pretrained models capture different types of linguistic dependencies.
Abstract
Are pairs of words that tend to occur together also likely to stand in a linguistic dependency? This empirical question is motivated by a long history of literature in cognitive science, psycholinguistics, and NLP. In this work we contribute an extensive analysis of the relationship between linguistic dependencies and statistical dependence between words. Improving on previous work, we introduce the use of large pretrained language models to compute contextualized estimates of the pointwise mutual information between words (CPMI). For multiple models and languages, we extract dependency trees which maximize CPMI, and compare to gold standard linguistic dependencies. Overall, we find that CPMI dependencies achieve an unlabelled undirected attachment score of at most approximately 0.5. While far above chance, and consistently above a non-contextualized PMI baseline, this score is generally…
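As an illustration of the tree-extraction and scoring step the abstract describes, here is a minimal Python sketch. It assumes a symmetric matrix of pairwise scores is already available. The numbers below are made up for the example and are not CPMI estimates from any model. The function names max_spanning_tree and uuas are hypothetical, and the sketch uses a plain maximum spanning tree (Prim's algorithm) with no projectivity constraint, which may differ from the paper's actual extraction procedure.

```python
def max_spanning_tree(scores):
    """Return the undirected edges (i, j), i < j, of a maximum spanning tree.

    `scores` is a symmetric n x n list of lists; scores[i][j] is the
    pairwise score between word i and word j (diagonal is ignored).
    Uses Prim's algorithm, starting from word 0.
    """
    n = len(scores)
    if n == 0:
        return set()
    in_tree = {0}
    edges = set()
    while len(in_tree) < n:
        best = None  # (score, node already in tree, node to add)
        for i in in_tree:
            for j in range(n):
                if j in in_tree:
                    continue
                if best is None or scores[i][j] > best[0]:
                    best = (scores[i][j], i, j)
        _, i, j = best
        in_tree.add(j)
        edges.add((min(i, j), max(i, j)))
    return edges


def uuas(predicted, gold):
    """Unlabelled undirected attachment score: the fraction of gold edges
    that also appear in the predicted tree, ignoring edge direction."""
    norm = lambda es: {(min(a, b), max(a, b)) for a, b in es}
    gold_edges = norm(gold)
    return len(norm(predicted) & gold_edges) / len(gold_edges)


# Toy sentence with invented scores (NOT real CPMI values).
words = ["the", "dog", "barked", "loudly"]
scores = [
    [0.0, 2.0, 0.3, 0.1],
    [2.0, 0.0, 1.5, 0.6],
    [0.3, 1.5, 0.0, 0.4],
    [0.1, 0.6, 0.4, 0.0],
]
# Assumed gold dependencies for the toy sentence, as undirected index pairs:
# the-dog, dog-barked, barked-loudly.
gold = {(0, 1), (1, 2), (2, 3)}

tree = max_spanning_tree(scores)
score = uuas(tree, gold)
print(sorted(tree), round(score, 3))
```

In this toy case the highest-scoring tree attaches "loudly" to "dog" instead of "barked", so it recovers two of the three gold edges. This is the kind of mismatch between statistical and linguistic dependencies that the attachment score measures.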
