Can I guess where you are from? Modeling dialectal morphosyntactic similarities in Brazilian Portuguese
Manoel Siqueira, Raquel Freitag

TL;DR
This study models morphosyntactic similarities across dialects of Brazilian Portuguese to test whether a speaker's dialectal origin can be inferred from the combined behavior of four pronoun-related variables, using correlation and clustering methods.
Contribution
The paper shows that clustering reveals regional dialectal patterns that pairwise correlation captures only in part, and it argues for interdisciplinary work between sociolinguistics and computational approaches in language technology.
Findings
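Clustering reveals speaker groupings that reflect regional dialectal patterns.
Correlation captures only limited pairwise associations between variables.
Interdisciplinary research is needed to build fair and inclusive language technologies that respect dialectal diversity.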
Abstract
This paper investigates morphosyntactic covariation in Brazilian Portuguese (BP) to assess whether dialectal origin can be inferred from the combined behavior of linguistic variables. Focusing on four grammatical phenomena related to pronouns, the study applies correlation and clustering methods to model covariation and dialectal distribution. The results indicate that correlation captures only limited pairwise associations, whereas clustering reveals speaker groupings that reflect regional dialectal patterns. Despite the methodological constraints imposed by differences in sample size requirements between sociolinguistics and computational approaches, the study highlights the importance of interdisciplinary research. Developing fair and inclusive language technologies that respect dialectal diversity outweighs the challenges of integrating these fields.
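
The abstract contrasts two ways of looking at the same speaker-by-variable data: correlation, which compares variables two at a time, and clustering, which groups speakers using all variables jointly. The sketch below illustrates that contrast on invented data only. Everything in it is an assumption made for illustration: the two regions "A" and "B", the four usage rates per speaker, the choice of Pearson correlation, and the choice of a two-cluster k-means. The summary above does not say which correlation measure or clustering algorithm the authors used, and the numbers here say nothing about Brazilian Portuguese.

```python
import math
import random

# Hypothetical setup: two invented regions, each with invented mean usage
# rates for four unnamed pronoun-related variables. Not the paper's data.
REGION_MEANS = {
    "A": [0.8, 0.7, 0.2, 0.3],
    "B": [0.2, 0.3, 0.8, 0.7],
}


def make_speakers(n_per_region=20, noise=0.05, seed=0):
    """Generate synthetic speakers as lists of four usage rates in [0, 1]."""
    rng = random.Random(seed)
    speakers, regions = [], []
    for region, means in REGION_MEANS.items():
        for _ in range(n_per_region):
            rates = [min(1.0, max(0.0, rng.gauss(m, noise))) for m in means]
            speakers.append(rates)
            regions.append(region)
    return speakers, regions


def pearson(xs, ys):
    """Pearson correlation between two equally long sequences."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def two_means(points, iterations=20):
    """Two-cluster k-means, seeded with the first and last point so the
    result is reproducible (an assumption of this sketch)."""
    centers = [list(points[0]), list(points[-1])]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assign each speaker to the nearest center (Euclidean distance).
        labels = [min((0, 1), key=lambda c: math.dist(p, centers[c]))
                  for p in points]
        # Move each center to the mean of its assigned speakers.
        for c in (0, 1):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def purity(labels, regions):
    """Share of speakers whose cluster's majority region matches their own."""
    total = 0
    for c in set(labels):
        members = [r for lab, r in zip(labels, regions) if lab == c]
        total += max(members.count(r) for r in set(members))
    return total / len(labels)


speakers, regions = make_speakers()

# Correlation view: one coefficient per pair of variables (6 pairs for 4 variables).
columns = list(zip(*speakers))
correlations = {(i, j): pearson(columns[i], columns[j])
                for i in range(4) for j in range(i + 1, 4)}

# Clustering view: one label per speaker, computed from all four variables at once.
labels = two_means(speakers)

for pair, r in correlations.items():
    print(f"variables {pair}: r = {r:.2f}")
print(f"cluster purity against region: {purity(labels, regions):.2f}")
```

The two outputs have different shapes, which is the point of the contrast: correlation yields one number per pair of variables and never refers to individual speakers, while clustering yields one group label per speaker that can then be compared with where the speaker comes from.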
