Visual Exploration of Stopword Probabilities in Topic Models
Shuangjiang Xue, Pierre Le Bras, David A. Robb, Mike J. Chantler, and Stefano Padilla

TL;DR
This paper introduces a probabilistic method and interactive visualization for analyzing stopword likelihood in topic models, enhancing model credibility and user confidence through tailored stopword management.
Contribution
It presents a novel corpus-specific stopword probability estimation method and an interactive visualization system to improve stopword analysis in topic modeling.
Findings
Increases user confidence in topic models
Provides a more representative stopword list extension
Allows adjustable thresholds for stopword analysis
Abstract
Stopword removal is a critical stage in many Machine Learning methods but often receives little consideration; poorly handled stopwords interfere with the model visualizations and disrupt user confidence. Inappropriately chosen or hastily omitted stopwords not only lead to suboptimal performance but also significantly affect the quality of models, thus reducing the willingness of practitioners and stakeholders to rely on the output visualizations. This paper proposes a novel extraction method that provides a corpus-specific probabilistic estimation of stopword likelihood and an interactive visualization system to support their analysis. We evaluated our approach and interface using real-world data, a commonly used Machine Learning method (Topic Modelling), and a comprehensive qualitative experiment probing user confidence. The results of our work show that our system increases user confidence in the…
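The abstract describes the estimation method only at a high level. As a hedged illustration of what a corpus-specific stopword score combined with an adjustable threshold can look like, the sketch below uses the normalized entropy of each term's spread across documents. This is a common heuristic substituted here purely for illustration; it is not the authors' probabilistic model, and the names `stopword_scores` and `candidate_stopwords`, the toy corpus, and the default threshold of 0.9 are all hypothetical.

```python
import math
from collections import Counter


def stopword_scores(docs: list[list[str]]) -> dict[str, float]:
    """Score each term by the normalized entropy of its spread across documents.

    Illustrative heuristic only, NOT the method proposed in the paper.
    A score near 1.0 means the term is spread evenly over the corpus
    (stopword-like); a score near 0.0 means it is concentrated in a few
    documents (topic-like).
    """
    n_docs = len(docs)
    if n_docs < 2:
        raise ValueError("need at least two documents to measure spread")
    # Term counts within each document.
    per_doc = [Counter(doc) for doc in docs]
    # Total count of each term across the whole corpus.
    totals = Counter()
    for counts in per_doc:
        totals.update(counts)
    # Entropy is maximal (log N) when a term is spread uniformly over N documents.
    max_entropy = math.log(n_docs)
    scores = {}
    for term, total in totals.items():
        entropy = 0.0
        for counts in per_doc:
            c = counts.get(term, 0)
            if c:
                p = c / total
                entropy -= p * math.log(p)
        scores[term] = entropy / max_entropy
    return scores


def candidate_stopwords(scores: dict[str, float], threshold: float = 0.9) -> list[str]:
    """Return terms whose score meets an adjustable threshold, most stopword-like first."""
    picked = [t for t, s in scores.items() if s >= threshold]
    return sorted(picked, key=lambda t: (-scores[t], t))


# Hypothetical toy corpus, already tokenized.
docs = [
    "the model learns the topic".split(),
    "the data shows a topic".split(),
    "the user trusts the model".split(),
]
scores = stopword_scores(docs)
print(candidate_stopwords(scores, threshold=0.9))  # ['the']
print(candidate_stopwords(scores, threshold=0.6))  # ['the', 'model', 'topic']
```

Lowering the threshold admits progressively more topic-bearing terms, which is the kind of trade-off an adjustable threshold in an interactive interface lets a user inspect rather than fix in advance.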
Taxonomy
Topics: Data Visualization and Analytics · Advanced Text Analysis Techniques · Complex Network Analysis Techniques
