Dialectograms: Machine Learning Differences between Discursive Communities
Thyge Enggaard (1), August Lohse (1), Morten Axel Pedersen (1, 2), Sune Lehmann (1, 3). Affiliations: (1) Copenhagen Center for Social Data Science, University of Copenhagen, Denmark; (2) Department of Anthropology, University of Copenhagen, Denmark; (3) DTU Compute

TL;DR
This paper introduces dialectograms, an unsupervised visualization method that maps how different communities use the same word in different ways, going beyond merely flagging which words differ.
Contribution
It presents a novel visualization technique and a new measure for analyzing differences in word usage across communities using full embedding spaces.
Findings
Revealed affective polarization in political discourse
Identified differences in perceptions of political actions
Mapped community-specific word usage patterns
Abstract
Word embeddings provide an unsupervised way to understand differences in word usage between discursive communities. A number of recent papers have focused on identifying words that are used differently by two or more communities. But word embeddings are complex, high-dimensional spaces, and a focus on identifying differences captures only a fraction of their richness. Here, we take a step towards leveraging the richness of the full embedding space by using word embeddings to map out how words are used differently. Specifically, we describe the construction of dialectograms, an unsupervised way to visually explore the characteristic ways in which each community uses a focal word. Based on these dialectograms, we provide a new measure of the degree to which words are used differently that overcomes the tendency for existing measures to pick out low-frequency or polysemous words. We apply…
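For orientation, the kind of baseline the abstract refers to (identifying which words two communities use differently) can be sketched as follows. This is a minimal illustration, assuming orthogonal Procrustes alignment and cosine distance, a common choice in the literature but not necessarily the procedure used in this paper, and it does not construct a dialectogram. The embeddings are synthetic, and every name here (`procrustes_align`, `cosine_distances`, the `word_i` vocabulary) is hypothetical.

```python
import numpy as np


def procrustes_align(source, target):
    """Rotate `source` onto `target` with the orthogonal matrix R
    that minimises ||source @ R - target|| (Frobenius norm)."""
    u, _, vt = np.linalg.svd(source.T @ target)
    return source @ (u @ vt)


def cosine_distances(a, b):
    """Row-wise cosine distance between two equally shaped matrices."""
    a = a / np.linalg.norm(a, axis=1, keepdims=True)
    b = b / np.linalg.norm(b, axis=1, keepdims=True)
    return 1.0 - np.sum(a * b, axis=1)


# Synthetic stand-ins for two community-specific embedding matrices
# over a shared vocabulary (assumption: 200 words, 8 dimensions).
rng = np.random.default_rng(0)
vocab = [f"word_{i}" for i in range(200)]
emb_a = rng.normal(size=(len(vocab), 8))

# Community B lives in an arbitrarily rotated copy of the same space,
# mimicking two independently trained models.
rotation, _ = np.linalg.qr(rng.normal(size=(8, 8)))
emb_b = emb_a @ rotation

# Simulate one focal word whose usage differs by flipping its vector.
focal = vocab.index("word_7")
emb_b[focal] = -emb_b[focal]

# Align A to B, then rank words by how far apart they remain.
aligned_a = procrustes_align(emb_a, emb_b)
dist = cosine_distances(aligned_a, emb_b)
ranking = [vocab[i] for i in np.argsort(-dist)]
print(ranking[:3])  # the flipped word should come first
```

A ranking like this only says that a word differs, not how. That gap is what the abstract says dialectograms are meant to address.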
Taxonomy
Topics: Social Media and Politics · Misinformation and Its Impacts · Hate Speech and Cyberbullying Detection
