CommunityFish: A Poisson-based Document Scaling With Hierarchical Clustering
Sami Diaf

TL;DR
CommunityFish enhances document scaling by integrating hierarchical clustering of word communities into the Wordfish model, improving interpretability and performance in political text analysis.
Contribution
The paper introduces a method that combines hierarchical clustering with Poisson-based scaling, identifying communities of semantically related words to improve text analysis.
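For reference, the standard Wordfish specification (Slapin and Proksch, 2008), which CommunityFish builds on, models the count $y_{ij}$ of word $j$ in document $i$ as a Poisson variable. This summary does not say how CommunityFish modifies this specification, so the formula below is the classic model only:

```latex
y_{ij} \sim \operatorname{Poisson}(\lambda_{ij}), \qquad
\log \lambda_{ij} = \alpha_i + \psi_j + \beta_j \, \omega_i
```

Here $\alpha_i$ is a document fixed effect (capturing document length), $\psi_j$ is a word fixed effect (capturing overall word frequency), $\beta_j$ is a word-specific weight measuring how strongly word $j$ discriminates between positions, and $\omega_i$ is the estimated position of document $i$ on the latent scale. Because the $\omega_i$ are identified only up to location and scale, they are conventionally normalized to mean 0 and standard deviation 1.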
Findings
Outperforms classic Wordfish in political text analysis
Reveals historical developments in US State of the Union addresses
Replicates political stances in German legislative manifestos
Abstract
Document scaling has been a key component of text-as-data applications for social scientists and a major field of interest for political researchers, who aim to uncover differences between speakers or parties using various probabilistic and non-probabilistic approaches. Yet most of these techniques either rest on the agnostic bag-of-words hypothesis or use prior information borrowed from external sources, which may introduce significant bias into the results. While a corpus has long been considered a collection of documents, it can also be seen as a dense network of words connected by their co-occurrences in documents, whose structure can be clustered into distinct groups of words known as communities. This paper introduces CommunityFish as an augmented version of Wordfish based on a hierarchical clustering, namely the Louvain…
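The abstract describes the corpus as a network of words linked by co-occurrence and partitioned into communities with the Louvain method. The sketch below illustrates that idea under stated assumptions: an edge weight is the number of documents in which two words both appear, tokenization is naive whitespace splitting, and `cooccurrence_graph` and `louvain` are hypothetical helper names. It is a plain-Python version of Louvain's two phases (greedy local moving that raises modularity, then collapsing each community into a single node), not the paper's code, and it stops before any Wordfish step because the truncated abstract does not describe how the communities enter the scaling model.

```python
from collections import defaultdict
from itertools import combinations


def cooccurrence_graph(docs):
    """Hypothetical helper: undirected word graph whose edge weight counts the
    documents in which both words appear (binary, document-level co-occurrence)."""
    graph = defaultdict(lambda: defaultdict(float))
    for doc in docs:
        vocab = sorted(set(doc.lower().split()))  # naive whitespace tokenizer
        for u, v in combinations(vocab, 2):
            graph[u][v] += 1.0
            graph[v][u] += 1.0
    return {u: dict(nbrs) for u, nbrs in graph.items()}


def one_level(graph):
    """Louvain phase 1: move single nodes between communities while modularity rises."""
    degree = {u: sum(nbrs.values()) for u, nbrs in graph.items()}
    two_m = sum(degree.values())
    comm = {u: u for u in graph}  # every node starts in its own community
    tot = dict(degree)            # total degree of each community
    improved = True
    while improved:
        improved = False
        for u in sorted(graph):
            old = comm[u]
            tot[old] -= degree[u]  # take u out of its community
            links = defaultdict(float)  # weight from u into each neighbouring community
            for v, w in graph[u].items():
                if v != u:  # a self-loop moves with u, so it never changes the gain
                    links[comm[v]] += w
            # Modularity gain (scaled by m) of inserting u into community c:
            # k_{u,in}(c) - tot(c) * k_u / 2m
            best, best_gain = old, links[old] - tot[old] * degree[u] / two_m
            for c, w in links.items():
                gain = w - tot[c] * degree[u] / two_m
                if gain > best_gain:
                    best, best_gain = c, gain
            comm[u] = best
            tot[best] += degree[u]
            if best != old:
                improved = True
    return comm


def aggregate(graph, comm):
    """Louvain phase 2: collapse each community into one node, summing edge weights."""
    new = defaultdict(lambda: defaultdict(float))
    for u, nbrs in graph.items():
        for v, w in nbrs.items():
            new[comm[u]][comm[v]] += w  # internal edges become a self-loop
    return {c: dict(nbrs) for c, nbrs in new.items()}


def louvain(graph):
    """Alternate both phases until no node changes community; return word groups."""
    members = {u: {u} for u in graph}  # aggregated node -> original words
    while True:
        comm = one_level(graph)
        if all(comm[u] == u for u in graph):
            break  # partition is all singletons: nothing left to merge
        merged = defaultdict(set)
        for u, c in comm.items():
            merged[c] |= members[u]
        members = dict(merged)
        graph = aggregate(graph, comm)
    return sorted(sorted(group) for group in members.values())


# Toy corpus with two vocabularies that never share a document.
docs = [
    "tax budget deficit",
    "tax budget spending",
    "budget deficit spending",
    "war army troops",
    "army troops border",
    "war border troops",
]
print(louvain(cooccurrence_graph(docs)))
# → [['army', 'border', 'troops', 'war'], ['budget', 'deficit', 'spending', 'tax']]
```

The two toy vocabularies form disconnected components, which keeps the result easy to check; real corpora are far denser, and the partition then depends on the modularity trade-off. For actual use, a maintained implementation such as NetworkX's `louvain_communities` (available from NetworkX 2.8) would normally be preferred. The hand-written version only makes the modularity-gain rule explicit.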
Taxonomy
Topics: Opinion Dynamics and Social Influence · Complex Network Analysis Techniques · Social Media and Politics
