Visual Exploration of Stopword Probabilities in Topic Models

Shuangjiang Xue, Pierre Le Bras, David A. Robb, Mike J. Chantler, and Stefano Padilla

arXiv:2501.10137·cs.HC·January 20, 2025

Open Access

TL;DR

This paper introduces a probabilistic method and interactive visualization for analyzing stopword likelihood in topic models, enhancing model credibility and user confidence through tailored stopword management.

Contribution

It presents a novel corpus-specific stopword probability estimation method and an interactive visualization system to improve stopword analysis in topic modeling.

Findings

01 Increases user confidence in topic models

02 Provides a more representative stopword list extension

03 Allows adjustable thresholds for stopword analysis
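
To make the idea of a corpus-specific stopword score with an adjustable threshold more concrete, the following is a minimal sketch. It is not the estimation method proposed in the paper, which this summary does not describe; it substitutes plain document frequency (the fraction of documents containing a word) as a stand-in score. The function names `stopword_scores` and `candidate_stopwords` and the toy corpus are hypothetical and exist only for illustration.

```python
from collections import Counter


def stopword_scores(docs):
    """Return {word: fraction of documents that contain the word}.

    Assumption: document frequency is used here as a simple proxy for
    stopword likelihood; the paper's own estimator may differ.
    """
    doc_freq = Counter()
    for doc in docs:
        # Count each word at most once per document.
        doc_freq.update(set(doc.lower().split()))
    n_docs = len(docs)
    return {word: count / n_docs for word, count in doc_freq.items()}


def candidate_stopwords(scores, threshold):
    """Return, in sorted order, the words whose score meets the threshold."""
    return sorted(word for word, score in scores.items() if score >= threshold)


# Toy corpus, invented for illustration only.
docs = [
    "the model learns the topics",
    "the topics describe the corpus",
    "a corpus of the documents",
]

scores = stopword_scores(docs)
strict = candidate_stopwords(scores, threshold=0.9)   # only near-universal words
lenient = candidate_stopwords(scores, threshold=0.6)  # also admits frequent domain words
print(strict)
print(lenient)
```

Lowering the threshold in this sketch pulls in words such as "topics" and "corpus", which are frequent in this toy corpus but may carry meaning. That trade-off is the kind of decision an adjustable threshold leaves to the analyst.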

Abstract

Stopword removal is a critical stage in many Machine Learning methods but often receives little consideration; when handled poorly, it interferes with model visualizations and disrupts user confidence. Inappropriately chosen or hastily omitted stopwords not only lead to suboptimal performance but also significantly affect the quality of models, thus reducing the willingness of practitioners and stakeholders to rely on the output visualizations. This paper proposes a novel extraction method that provides a corpus-specific probabilistic estimation of stopword likelihood, together with an interactive visualization system to support the analysis of candidate stopwords. We evaluated our approach and interface using real-world data, a commonly used Machine Learning method (Topic Modelling), and a comprehensive qualitative experiment probing user confidence. The results of our work show that our system increases user confidence in the…

Taxonomy

Topics: Data Visualization and Analytics · Advanced Text Analysis Techniques · Complex Network Analysis Techniques