Y'all should read this! Identifying Plurality in Second-Person Personal Pronouns in English Texts
Gabriel Stanovsky, Ronen Tamari

TL;DR
This paper addresses the challenge of distinguishing singular from plural 'you' in English texts. It uses dialectal and multilingual cues to obtain training labels, with potential benefits for downstream tasks such as machine translation and coreference resolution.
Contribution
It introduces a large-scale, distantly supervised dataset for plural vs. singular 'you' detection, and trains a model that achieves reasonable in-domain accuracy but struggles with domain transfer.
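A minimal sketch of how a dialectal cue can produce distant labels. It assumes, hypothetically, that any sentence containing a spelling of "y'all" is labeled plural, and that the cue is rewritten to standard "you" so a model cannot simply detect the dialect form. `YALL_PATTERN` and `distant_label_dialect` are illustrative names, not the authors' code, and how singular examples are sampled is deliberately left out.

```python
import re
from typing import Optional, Tuple

# Hypothetical cue pattern: "y'all", "y’all" (curly apostrophe) or "yall", any case.
YALL_PATTERN = re.compile(r"\by['’]?all\b", re.IGNORECASE)


def distant_label_dialect(sentence: str) -> Optional[Tuple[str, str]]:
    """Return (masked_sentence, "plural") if the sentence contains a y'all cue, else None.

    The cue is replaced by "you" (capitalized if the cue was) so that the
    resulting example looks like standard written English.
    """
    if not YALL_PATTERN.search(sentence):
        return None  # no dialectal cue: this sketch assigns no label
    masked = YALL_PATTERN.sub(
        lambda m: "You" if m.group(0)[0].isupper() else "you", sentence
    )
    return masked, "plural"
```

For example, `distant_label_dialect("Y'all should read this!")` yields `("You should read this!", "plural")`, while a sentence that already uses plain "you" gets no label from this cue alone.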
Findings
In-domain accuracy exceeds 77%
Domain transfer remains highly challenging
Publicly available code and data
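The in-domain vs. domain-transfer comparison amounts to training on one domain and testing on every domain. The sketch below shows only that bookkeeping: `transfer_grid`, `train_fn` and `majority_baseline` are hypothetical names, and the toy majority-class learner stands in for a real model. Diagonal entries (source equals target) are in-domain accuracies; off-diagonal entries measure transfer.

```python
from collections import Counter
from typing import Callable, Dict, List, Tuple

Example = Tuple[str, str]  # (sentence, label), label in {"singular", "plural"}
Predictor = Callable[[str], str]


def accuracy(predict: Predictor, data: List[Example]) -> float:
    """Fraction of examples whose predicted label matches the gold label."""
    if not data:
        return 0.0
    correct = sum(predict(sentence) == label for sentence, label in data)
    return correct / len(data)


def transfer_grid(
    train_fn: Callable[[List[Example]], Predictor],
    domains: Dict[str, Tuple[List[Example], List[Example]]],
) -> Dict[Tuple[str, str], float]:
    """Train on each domain's train split and evaluate on every domain's test split.

    Keys are (source, target); source == target is the in-domain setting.
    """
    grid: Dict[Tuple[str, str], float] = {}
    for source, (train_split, _) in domains.items():
        predict = train_fn(train_split)
        for target, (_, test_split) in domains.items():
            grid[(source, target)] = accuracy(predict, test_split)
    return grid


def majority_baseline(train_split: List[Example]) -> Predictor:
    """Toy learner: always predicts the most frequent training label."""
    most_common = Counter(label for _, label in train_split).most_common(1)[0][0]
    return lambda sentence: most_common
```

Any real classifier can replace `majority_baseline` as long as it maps a training split to a sentence-to-label function.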
Abstract
Distinguishing between singular and plural "you" in English is a challenging task which has potential for downstream applications, such as machine translation or coreference resolution. While formal written English does not distinguish between these cases, other languages (such as Spanish), as well as other dialects of English (via phrases such as "y'all"), do make this distinction. We make use of this to obtain distantly-supervised labels for the task at large scale in two domains. We then train a model to distinguish between singular and plural "you", finding that although in-domain training achieves reasonable accuracy (over 77%), there is still a lot of room for improvement, especially in the domain-transfer scenario, which proves extremely challenging. Our code and data are publicly available.
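The multilingual cue can be sketched in the same spirit. This minimal version assumes a sentence-aligned English-Spanish pair and inspects only explicit Spanish second-person pronouns: "ustedes", "vosotros" and "vosotras" signal plural; "tú", "usted" and "vos" signal singular. The function name and pronoun lists are illustrative assumptions, not the paper's procedure. Because Spanish often drops subject pronouns, a pronoun-only heuristic like this one leaves many pairs unlabeled; a fuller approach would also need verb morphology.

```python
import re
from typing import Optional

# Illustrative (non-exhaustive) Spanish second-person subject pronouns.
PLURAL_ES = {"ustedes", "vosotros", "vosotras"}
SINGULAR_ES = {"tú", "usted", "vos"}


def label_from_spanish(english: str, spanish: str) -> Optional[str]:
    """Label the English 'you' as singular/plural from its aligned Spanish sentence.

    Returns None when the English side has no 'you', or when the Spanish side
    has no explicit pronoun or mixes singular and plural ones (ambiguous).
    """
    if not re.search(r"\byou\b", english, re.IGNORECASE):
        return None
    # \w is Unicode-aware for str patterns, so accented tokens like "tú" survive.
    tokens = set(re.findall(r"\w+", spanish.lower()))
    has_plural = bool(tokens & PLURAL_ES)
    has_singular = bool(tokens & SINGULAR_ES)
    if has_plural == has_singular:  # neither cue, or conflicting cues
        return None
    return "plural" if has_plural else "singular"
```

For instance, "Are you coming?" aligned with "¿Ustedes vienen?" is labeled plural, while the pro-drop translation "¿Vienes?" gets no label.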
