# Scattertext: a Browser-Based Tool for Visualizing how Corpora Differ

**Authors:** Jason S. Kessler

arXiv: 1703.00565 · 2017-04-24

## TL;DR

Scattertext is an open-source, browser-based tool that visualizes linguistic differences between document categories as a scatterplot, positioning each term by its rank-frequency in each category. It scales to thousands of terms and also supports query-based analyses.

## Contribution

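The paper introduces a language-independent scatterplot visualization that displays thousands of term points, legibly labels hundreds of them, and supports several comparison modes: category-versus-category term frequency, query-based comparison of terms with similar embeddings, and feature importance scores versus univariate metrics.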

## Key findings

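- Displays thousands of term points in one scatterplot and legibly labels hundreds of them, using a tie-breaking strategy
- Supports query-based comparison of how terms with similar embeddings are used across document categories
- Enables comparison of bag-of-words feature importance scores against univariate metrics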

## Abstract

Scattertext is an open source tool for visualizing linguistic variation between document categories in a language-independent way. The tool presents a scatterplot, where each axis corresponds to the rank-frequency a term occurs in a category of documents. Through a tie-breaking strategy, the tool is able to display thousands of visible term-representing points and find space to legibly label hundreds of them. Scattertext also lends itself to a query-based visualization of how the use of terms with similar embeddings differs between document categories, as well as a visualization for comparing the importance scores of bag-of-words features to univariate metrics.
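
To make the axis definition concrete, here is a minimal sketch of how rank-frequency coordinates could be computed for two document categories. It is an illustration under stated assumptions, not the paper's implementation: the function names `dense_rank_scores` and `rank_frequency_coordinates` are hypothetical, tokenization is plain whitespace splitting, ranks are scaled to the range 0 to 1, and terms with equal counts simply share a rank, so the tie-breaking strategy mentioned in the abstract is not reproduced.

```python
from collections import Counter


def dense_rank_scores(counts, vocabulary):
    """Map each term to a score in [0, 1] by dense-ranking its frequency.

    Terms with the same count share a rank (assumption: the paper's
    tie-breaking strategy is not modeled here).
    """
    distinct = sorted({counts.get(term, 0) for term in vocabulary})
    rank_of = {freq: rank for rank, freq in enumerate(distinct)}
    top = max(len(distinct) - 1, 1)  # avoid dividing by zero for a single rank
    return {term: rank_of[counts.get(term, 0)] / top for term in vocabulary}


def rank_frequency_coordinates(docs_a, docs_b):
    """Return {term: (x, y)}: x is the scaled frequency rank in category A,
    y the scaled frequency rank in category B (axis assignment is arbitrary)."""
    counts_a = Counter(tok for doc in docs_a for tok in doc.lower().split())
    counts_b = Counter(tok for doc in docs_b for tok in doc.lower().split())
    vocabulary = sorted(set(counts_a) | set(counts_b))
    x = dense_rank_scores(counts_a, vocabulary)
    y = dense_rank_scores(counts_b, vocabulary)
    return {term: (x[term], y[term]) for term in vocabulary}


# Toy corpora, invented for illustration only.
coords = rank_frequency_coordinates(
    ["tax cuts create jobs", "cut tax now"],
    ["health care for all", "care creates jobs"],
)
for term, (x, y) in sorted(coords.items()):
    print(f"{term:8s} x={x:.2f} y={y:.2f}")
```

In this sketch, a term used heavily in only one category lands near a corner of the plot, while a term used about equally in both lands near the diagonal.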

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/1703.00565/full.md

## Figures

7 figures with captions in the complete paper: https://tomesphere.com/paper/1703.00565/full.md

## References

15 references — full list in the complete paper: https://tomesphere.com/paper/1703.00565/full.md

---
Source: https://tomesphere.com/paper/1703.00565