Finding Variants for Construction-Based Dialectometry: A Corpus-Based Approach to Regional CxGs
Jonathan Dunn

TL;DR
This paper introduces a construction-based dialectometry method that uses grammar induction to identify regional variations in constructions, enabling accurate measurement of dialectal differences in English.
Contribution
It presents a novel corpus-based approach that learns a grammar of constructions for dialectometry, capturing regional variation without predefined construction sets.
Findings
The method accurately distinguishes regional varieties of English.
The learned grammar maintains stable quality across datasets.
It effectively measures the degree of regional variation for constructions.
Abstract
This paper develops a construction-based dialectometry capable of identifying previously unknown constructions and measuring the degree to which a given construction is subject to regional variation. The central idea is to learn a grammar of constructions (a CxG) using construction grammar induction and then to use these constructions as features for dialectometry. This offers a method for measuring the aggregate similarity between regional CxGs without limiting in advance the set of constructions subject to variation. The learned CxG is evaluated on how well it describes held-out test corpora, while dialectometry is evaluated on how well it can model regional varieties of English. The method is tested using two distinct datasets: first, the International Corpus of English, representing eight outer circle varieties; second, a web-crawled corpus representing five inner circle varieties.…
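To make the idea of using constructions as dialectometry features concrete, here is a minimal Python sketch. It is not the paper's method: the abstract does not specify the induction algorithm or the dialectometry model, so cosine similarity over relative construction frequencies stands in as one simple way to measure aggregate similarity, and the per-construction frequency range stands in for the degree of regional variation. The region names, construction labels (`cx_1` and so on), and counts are invented toy values, and all function names are hypothetical.

```python
import math


def relative_frequencies(counts):
    """Convert raw construction counts for one region into relative frequencies."""
    total = sum(counts.values())
    return {cx: n / total for cx, n in counts.items()} if total else {}


def cosine_similarity(a, b):
    """Aggregate similarity between two regional construction profiles."""
    keys = set(a) | set(b)
    dot = sum(a.get(k, 0.0) * b.get(k, 0.0) for k in keys)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def frequency_spread(profiles):
    """Per-construction range of relative frequency across regions.

    A crude proxy (an assumption of this sketch) for how strongly a
    construction is subject to regional variation.
    """
    constructions = set().union(*profiles.values())
    spread = {}
    for cx in constructions:
        values = [p.get(cx, 0.0) for p in profiles.values()]
        spread[cx] = max(values) - min(values)
    return spread


# Invented toy counts: construction identifiers are placeholders for
# whatever constructions a learned CxG would contain.
toy_counts = {
    "region_a": {"cx_1": 40, "cx_2": 10, "cx_3": 50},
    "region_b": {"cx_1": 38, "cx_2": 12, "cx_3": 50},
    "region_c": {"cx_1": 10, "cx_2": 60, "cx_3": 30},
}

profiles = {region: relative_frequencies(c) for region, c in toy_counts.items()}
sim_ab = cosine_similarity(profiles["region_a"], profiles["region_b"])
sim_ac = cosine_similarity(profiles["region_a"], profiles["region_c"])
spread = frequency_spread(profiles)
most_variable = max(spread, key=spread.get)
```

In this toy setup, `region_a` and `region_b` come out more similar to each other than `region_a` and `region_c`, and `cx_2` shows the widest frequency range. Because the feature set is whatever the grammar contains, nothing in the sketch restricts in advance which constructions may vary.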
