Scholastic: Graphical Human-AI Collaboration for Inductive and Interpretive Text Analysis
Matt-Heun Hong, Lauren A. Marsh, Jessica L. Feuston, Janet Ruppert, Jed R. Brubaker, Danielle Albers Szafir

TL;DR
Scholastic is a human-centered, interactive system that combines machine learning and visualization to support interpretive text analysis while addressing scholars' concerns that algorithms could disrupt or drive interpretive scholarship.
Contribution
It presents a novel machine-in-the-loop clustering approach, paired with interactive visualizations, to support inductive and interpretive research on large text corpora.
Findings
Supports iterative coding and refinement by scholars
Enhances document sampling through interactive visualizations
Facilitates meaningful theme discovery in large datasets
Abstract
Interpretive scholars generate knowledge from text corpora by manually sampling documents, applying codes, and refining and collating codes into categories until meaningful themes emerge. Given a large corpus, machine learning could help scale this data sampling and analysis, but prior research shows that experts are generally concerned about algorithms potentially disrupting or driving interpretive scholarship. We take a human-centered design approach to address concerns around machine-assisted interpretive research and build Scholastic, which incorporates a machine-in-the-loop clustering algorithm to scaffold interpretive text analysis. As a scholar applies codes to documents and refines them, the resulting coding schema serves as structured metadata that constrains hierarchical document and word clusters inferred from the corpus. Interactive visualizations of these clusters can…
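To make the idea of a coding schema "constraining" hierarchical clusters concrete, here is a minimal sketch. It is not Scholastic's actual algorithm. It assumes a simple must-link interpretation: documents that share a code are forced into the same starting cluster, and an agglomerative procedure then builds the hierarchy above those groups using bag-of-words cosine similarity. The function names (`constrained_agglomerative`, `bag_of_words`, `cosine`) and the must-link rule are hypothetical, and word clustering is not covered.

```python
from collections import Counter
import math


def bag_of_words(text):
    # Hypothetical featurizer: lowercase whitespace tokens with counts.
    return Counter(text.lower().split())


def cosine(a, b):
    # Cosine similarity between two sparse count vectors (0.0 if either is empty).
    dot = sum(a[w] * b[w] for w in a if w in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def constrained_agglomerative(docs, codes):
    """docs: list of strings; codes: dict of doc index -> set of code labels.

    Returns the sequence of merged clusters (frozensets of doc indices),
    ending with the cluster that contains every document.
    """
    # Must-link constraint (assumed): union documents that share any code.
    parent = list(range(len(docs)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    first_doc_for_code = {}
    for i in range(len(docs)):
        for code in sorted(codes.get(i, ())):
            if code in first_doc_for_code:
                parent[find(i)] = find(first_doc_for_code[code])
            else:
                first_doc_for_code[code] = i

    groups = {}
    for i in range(len(docs)):
        groups.setdefault(find(i), []).append(i)
    clusters = [frozenset(g) for g in groups.values()]

    vectors = [bag_of_words(d) for d in docs]

    def summed_vector(cluster):
        # Summed counts point the same way as the mean, so cosine is unaffected.
        total = Counter()
        for i in cluster:
            total.update(vectors[i])
        return total

    merges = []
    while len(clusters) > 1:
        # Greedily merge the most similar pair of current clusters.
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                s = cosine(summed_vector(clusters[a]), summed_vector(clusters[b]))
                if best is None or s > best[0]:
                    best = (s, a, b)
        _, a, b = best
        merged = clusters[a] | clusters[b]
        clusters = [c for k, c in enumerate(clusters) if k not in (a, b)] + [merged]
        merges.append(merged)
    return merges


if __name__ == "__main__":
    docs = ["cats purr softly", "cats meow loudly", "stock market falls",
            "market prices rise", "dogs bark"]
    codes = {0: {"pets"}, 1: {"pets"}, 2: {"finance"}}
    for step, cluster in enumerate(constrained_agglomerative(docs, codes), 1):
        print(step, sorted(cluster))
```

In this toy run, documents 0 and 1 share the code "pets", so they start in one cluster and are never separated anywhere in the hierarchy. Documents 2 and 3 merge first because they share the word "market". A real system would use richer text representations and would let the scholar revise codes interactively; this sketch only shows how codes can act as hard constraints on the hierarchy.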
