Linguistic Dependencies and Statistical Dependence

Jacob Louis Hoover; Alessandro Sordoni; Wenyu Du; Timothy J. O'Donnell

arXiv:2104.08685 · cs.CL · May 2, 2022

TL;DR

This paper investigates the relationship between linguistic dependencies and statistical dependence using large pretrained language models to estimate pointwise mutual information, revealing moderate correlation and differences across models.

Contribution

It introduces contextualized PMI (CPMI), estimated with large pretrained language models, as a way to extract dependency trees, and compares these trees to gold-standard linguistic dependencies across languages.
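For reference, the conditional PMI between two words $w_i$ and $w_j$ in a sentence, given the remaining context $c$, is the standard quantity on the first line below. The second line sketches how it could be estimated with a masked language model $p_\theta$, assuming $s_{-i}$ denotes the sentence with $w_i$ masked and $s_{-i,-j}$ the sentence with both $w_i$ and $w_j$ masked. This notation is illustrative, and the paper's exact estimator (including how the two directions $i \to j$ and $j \to i$ are combined) may differ.

$$
\operatorname{pmi}(w_i; w_j \mid c) = \log \frac{p(w_i \mid w_j, c)}{p(w_i \mid c)}
$$

$$
\widehat{\operatorname{cpmi}}(w_i; w_j) = \log p_\theta(w_i \mid s_{-i}) - \log p_\theta(w_i \mid s_{-i,-j})
$$

Masking $w_j$ in the second term removes exactly the information that the first term conditions on, so a large positive value indicates that the model's prediction of $w_i$ depends strongly on $w_j$.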

Findings

01

CPMI-based dependencies achieve an unlabelled undirected attachment score of at most about 0.5.

02

CPMI dependencies consistently outperform a non-contextualized PMI baseline.

03

Different pretrained models capture different types of linguistic dependencies.
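The attachment score in the first finding compares extracted edges to gold edges while ignoring both labels and direction. A minimal Python sketch of an unlabelled undirected attachment score, assuming both trees are given as lists of word-index pairs (the function name `uuas` and the input format are hypothetical, and preprocessing choices such as punctuation filtering are not shown):

```python
def uuas(predicted, gold):
    """Unlabelled undirected attachment score (illustrative sketch).

    predicted, gold: iterables of (i, j) word-index pairs.
    Direction is ignored, so (2, 5) and (5, 2) count as the same edge.
    Returns the fraction of gold edges that also appear in the prediction.
    """
    def undirected(edges):
        # frozenset makes each edge order-insensitive
        return {frozenset(e) for e in edges}

    gold_set = undirected(gold)
    if not gold_set:
        raise ValueError("gold tree has no edges")
    return len(undirected(predicted) & gold_set) / len(gold_set)


# Toy example: gold is a chain 0-1-2-3; the prediction recovers two of three edges.
gold = [(1, 0), (1, 2), (3, 2)]
pred = [(0, 1), (1, 3), (2, 3)]
print(uuas(pred, gold))  # 0.6666666666666666
```

When neither tree includes an attachment to an artificial root, both spanning trees over the same n words have n-1 edges, so this fraction is simultaneously a recall and a precision.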

Abstract

Are pairs of words that tend to occur together also likely to stand in a linguistic dependency? This empirical question is motivated by a long history of literature in cognitive science, psycholinguistics, and NLP. In this work we contribute an extensive analysis of the relationship between linguistic dependencies and statistical dependence between words. Improving on previous work, we introduce the use of large pretrained language models to compute contextualized estimates of the pointwise mutual information between words (CPMI). For multiple models and languages, we extract dependency trees which maximize CPMI, and compare to gold standard linguistic dependencies. Overall, we find that CPMI dependencies achieve an unlabelled undirected attachment score of at most $\approx 0.5$. While far above chance, and consistently above a non-contextualized PMI baseline, this score is generally…
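Extracting "dependency trees which maximize CPMI" amounts to finding a maximum spanning tree over the words of a sentence, with pairwise CPMI as edge weights. The sketch below uses Prim's algorithm on a symmetric score matrix. It assumes the CPMI matrix has already been computed and symmetrized (for example by adding the two directional estimates); the function name and toy scores are hypothetical rather than taken from the paper or its repository.

```python
import math


def max_spanning_tree(scores):
    """Return undirected edges (i, j) with i < j of a maximum spanning tree.

    scores: symmetric n x n list of lists of floats (e.g. symmetrized CPMI).
    Uses the dense O(n^2) form of Prim's algorithm.
    """
    n = len(scores)
    if n == 0:
        return set()
    in_tree = [False] * n
    best = [-math.inf] * n  # best score linking each node to the current tree
    parent = [-1] * n       # tree node that achieves that best score
    best[0] = 0.0           # start growing the tree from word 0
    edges = set()
    for _ in range(n):
        # add the outside node with the strongest link to the tree
        u = max((v for v in range(n) if not in_tree[v]), key=lambda v: best[v])
        in_tree[u] = True
        if parent[u] != -1:
            edges.add((min(u, parent[u]), max(u, parent[u])))
        # relax candidate links for nodes still outside the tree
        for v in range(n):
            if not in_tree[v] and scores[u][v] > best[v]:
                best[v] = scores[u][v]
                parent[v] = u
    return edges


# Toy 4-word sentence with made-up symmetric scores.
scores = [
    [0.0, 2.0, 0.1, 0.3],
    [2.0, 0.0, 1.5, 0.2],
    [0.1, 1.5, 0.0, 0.9],
    [0.3, 0.2, 0.9, 0.0],
]
print(sorted(max_spanning_tree(scores)))  # [(0, 1), (1, 2), (2, 3)]
```

The simple quadratic version of Prim's algorithm is a reasonable fit here because per-sentence score matrices are small and dense, so a priority queue would add little.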

Code & Models

Repositories

mcqll/cpmi-dependencies
PyTorch · Official