TL;DR
Generalized word shift graphs are interpretable visualizations that show how individual words contribute to the difference between two texts under a given measure, rather than reducing the comparison to a single aggregate score.
Contribution
The paper introduces generalized word shift graphs, a versatile visualization framework that captures fine-grained textual differences for any measure expressed as a weighted average.
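As a rough illustration of the weighted-average idea, the sketch below computes per-word contributions to the difference in an average dictionary score between two texts. It is a simplified example and not the authors' implementation: the function name `word_shift_contributions`, the toy scores, and the assumption that both texts share one fixed score per word are all hypothetical choices made here.

```python
from collections import Counter


def word_shift_contributions(tokens_1, tokens_2, scores):
    """Per-word contributions to the change in a frequency-weighted average score.

    Assumption (hypothetical simplification): each word has a single score
    that is the same in both texts, so a word contributes
    (p2 - p1) * score, where p1 and p2 are its relative frequencies.
    Words without a score are ignored.
    """
    counts_1 = Counter(t for t in tokens_1 if t in scores)
    counts_2 = Counter(t for t in tokens_2 if t in scores)
    n1 = sum(counts_1.values())
    n2 = sum(counts_2.values())
    if n1 == 0 or n2 == 0:
        raise ValueError("each text needs at least one scored word")

    contributions = {}
    for word in set(counts_1) | set(counts_2):
        p1 = counts_1[word] / n1  # relative frequency in text 1
        p2 = counts_2[word] / n2  # relative frequency in text 2
        contributions[word] = (p2 - p1) * scores[word]
    return contributions


# Toy example with made-up scores (not taken from any real lexicon).
toy_scores = {"happy": 8.0, "sad": 2.0}
text_1 = ["happy", "happy", "sad"]
text_2 = ["happy", "sad", "sad"]

shifts = word_shift_contributions(text_1, text_2, toy_scores)
# Rank words by the magnitude of their contribution, largest first.
for word, delta in sorted(shifts.items(), key=lambda kv: -abs(kv[1])):
    print(f"{word}: {delta:+.3f}")
```

Under these assumptions the contributions sum to the difference between the two texts' average scores, so ranking them by magnitude shows which words drive the change.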
Findings
Encompasses common text comparison methods such as relative frequencies, dictionary scores, and entropy-based measures.
Demonstrates application across multiple domains through case studies.
Facilitates diagnostic, hypothesis-driven, and interpretative analysis.
Abstract
A common task in computational text analyses is to quantify how two corpora differ according to a measurement like word frequency, sentiment, or information content. However, collapsing the texts' rich stories into a single number is often conceptually perilous, and it is difficult to confidently interpret interesting or unexpected textual patterns without looming concerns about data artifacts or measurement validity. To better capture fine-grained differences between texts, we introduce generalized word shift graphs, visualizations which yield a meaningful and interpretable summary of how individual words contribute to the variation between two texts for any measure that can be formulated as a weighted average. We show that this framework naturally encompasses many of the most commonly used approaches for comparing texts, including relative frequencies, dictionary scores, and…
