Generalized Word Shift Graphs: A Method for Visualizing and Explaining Pairwise Comparisons Between Texts

Ryan J. Gallagher; Morgan R. Frank; Lewis Mitchell; Aaron J. Schwartz; Andrew J. Reagan; Christopher M. Danforth; Peter Sheridan Dodds

arXiv:2008.02250·cs.CL·February 5, 2021

TL;DR

Generalized word shift graphs visualize how individual words contribute to the difference between two texts under any measure that can be written as a weighted average, so the comparison is not reduced to a single aggregate score.

Contribution

The paper introduces generalized word shift graphs, a versatile visualization framework that captures fine-grained textual differences for any measure expressed as a weighted average.

Findings

01

Encompasses common text comparison methods, including relative frequencies, dictionary scores, and entropy-based measures.

02

Demonstrates application across multiple domains through case studies.

03

Facilitates diagnostic, hypothesis-driven, and interpretative analysis.
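
To make the "weighted average" framing in the first finding concrete, the following is a sketch of how such measures decompose word by word. The symbols are notation assumed here for illustration, not taken from this page: p_τ^(i) is the relative frequency of word τ in text i, and φ_τ^(i) is the weight the measure assigns to that word.

```latex
% A measure written as a weighted average over words \tau in text i
\Phi^{(i)} = \sum_{\tau} p_\tau^{(i)} \, \phi_\tau^{(i)}

% The difference between two texts splits into one term per word
\delta\Phi = \Phi^{(2)} - \Phi^{(1)}
           = \sum_{\tau} \left[ p_\tau^{(2)} \phi_\tau^{(2)} - p_\tau^{(1)} \phi_\tau^{(1)} \right]

% Example: Shannon entropy is a weighted average of surprisal,
% obtained by choosing \phi_\tau^{(i)} = -\log_2 p_\tau^{(i)}
H^{(i)} = -\sum_{\tau} p_\tau^{(i)} \log_2 p_\tau^{(i)}
```

Each bracketed term is the contribution of a single word, which is the kind of quantity a word shift graph can rank and plot.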

Abstract

A common task in computational text analyses is to quantify how two corpora differ according to a measurement like word frequency, sentiment, or information content. However, collapsing the texts' rich stories into a single number is often conceptually perilous, and it is difficult to confidently interpret interesting or unexpected textual patterns without looming concerns about data artifacts or measurement validity. To better capture fine-grained differences between texts, we introduce generalized word shift graphs, visualizations which yield a meaningful and interpretable summary of how individual words contribute to the variation between two texts for any measure that can be formulated as a weighted average. We show that this framework naturally encompasses many of the most commonly used approaches for comparing texts, including relative frequencies, dictionary scores, and…
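
As a rough illustration of the idea in the abstract, the sketch below computes per-word contributions to the difference between two texts for a dictionary score, the simplest weighted-average case, where each word has the same score in both texts. The function names and the toy score dictionary are hypothetical, and the code does not reproduce the authors' software or any refinements in the full paper.

```python
from collections import Counter


def relative_freqs(tokens):
    """Map each word to its relative frequency in the token list."""
    counts = Counter(tokens)
    total = sum(counts.values())
    return {word: count / total for word, count in counts.items()}


def weighted_average(tokens, scores):
    """Average dictionary score of the text, using only scored words."""
    p = relative_freqs([t for t in tokens if t in scores])
    return sum(p[word] * scores[word] for word in p)


def word_contributions(tokens_1, tokens_2, scores):
    """Per-word contribution to (score of text 2) - (score of text 1).

    Assumes each word's score is identical in both texts, so the
    contribution of a word is score * (p2 - p1).
    """
    p1 = relative_freqs([t for t in tokens_1 if t in scores])
    p2 = relative_freqs([t for t in tokens_2 if t in scores])
    vocab = set(p1) | set(p2)
    contrib = {
        word: scores[word] * (p2.get(word, 0.0) - p1.get(word, 0.0))
        for word in vocab
    }
    # Rank by absolute contribution, largest first, as a shift graph would.
    return dict(sorted(contrib.items(), key=lambda kv: abs(kv[1]), reverse=True))


# Toy data, invented purely for illustration.
toy_scores = {"happy": 8.0, "sad": 2.0, "rain": 4.0, "sun": 7.0}
text_1 = "sad rain rain happy".split()
text_2 = "happy sun sun rain".split()

contrib = word_contributions(text_1, text_2, toy_scores)
total_shift = weighted_average(text_2, toy_scores) - weighted_average(text_1, toy_scores)

for word, value in contrib.items():
    print(f"{word:>6}: {value:+.2f}")
print(f" total: {total_shift:+.2f}")
```

The per-word values sum to the overall difference in average score, so the ranked list shows which words drive the aggregate number and in which direction.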
