Modeling Global Syntactic Variation in English Using Dialect Classification

Jonathan Dunn

arXiv:1904.05527 · cs.CL · April 12, 2019 · 1 citation

Open Access

TL;DR

This paper uses dialect classification over 14 national varieties of English to study syntactic variation, selecting the varieties through data-driven language mapping, drawing features from grammar induction, and comparing web and social media corpora.

Contribution

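It introduces data-driven language mapping for choosing which national varieties to include, uses grammar induction to produce a large and dynamic set of syntactic features instead of a few hand-selected ones such as function words, and compares models across web and social media corpora to measure how robust syntactic variation is across registers.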

Findings

01 Dialect classification accuracy varies across registers.

02 Grammar induction yields a large set of syntactic features.

03 Models show consistent syntactic variation across corpora.

Abstract

This paper evaluates global-scale dialect identification for 14 national varieties of English as a means for studying syntactic variation. The paper makes three main contributions: (i) introducing data-driven language mapping as a method for selecting the inventory of national varieties to include in the task; (ii) producing a large and dynamic set of syntactic features using grammar induction rather than focusing on a few hand-selected features such as function words; and (iii) comparing models across both web corpora and social media corpora in order to measure the robustness of syntactic variation across registers.
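To make the task setup concrete, here is a minimal sketch of dialect identification as text classification over feature-frequency vectors. It is not the paper's pipeline: word bigrams stand in for the constructions that grammar induction would produce, a nearest-centroid rule with cosine similarity stands in for whatever classifier the paper uses, and the labels `"A"` and `"B"`, the function names, and the toy sentences are all hypothetical.

```python
import math
from collections import Counter


def featurize(tokens, n=2):
    """Count word n-grams (a crude stand-in for induced syntactic features)."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def normalize(vec):
    """Scale a sparse count vector to unit length (empty if all zeros)."""
    norm = math.sqrt(sum(v * v for v in vec.values()))
    return {k: v / norm for k, v in vec.items()} if norm else {}


def train_centroids(samples):
    """Build one unit-length feature centroid per variety label.

    `samples` is an iterable of (label, tokens) pairs.
    """
    sums = {}
    for label, tokens in samples:
        sums.setdefault(label, Counter()).update(featurize(tokens))
    return {label: normalize(vec) for label, vec in sums.items()}


def classify(tokens, centroids):
    """Return the label whose centroid has the highest cosine similarity."""
    vec = normalize(featurize(tokens))

    def cosine(centroid):
        return sum(w * centroid.get(f, 0.0) for f, w in vec.items())

    return max(centroids, key=lambda label: cosine(centroids[label]))


# Toy training data, invented for illustration only.
train = [
    ("A", "i have got a car".split()),
    ("A", "she has got the keys".split()),
    ("B", "i have gotten a car".split()),
    ("B", "she has gotten the keys".split()),
]
centroids = train_centroids(train)
prediction = classify("he has got a ticket".split(), centroids)
```

Under the same assumptions, the cross-register comparison in contribution (iii) would amount to building the centroids from one corpus type (for example web text) and classifying samples from the other (social media), then comparing accuracy against the within-register case.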

Taxonomy

Topics: Natural Language Processing Techniques · Authorship Attribution and Profiling · Linguistic Variation and Morphology