Unsupervised Topic Discovery in User Comments

Christoph Stanik; Tim Pietz; Walid Maalej

arXiv:2108.08543 · cs.SE · August 20, 2021

TL;DR

This paper presents an unsupervised deep learning approach for automatically discovering semantically coherent topics in user comments, aiding stakeholders in extracting valuable insights without manual effort.

Contribution

The paper introduces a novel deep NLP-based method for unsupervised topic discovery in user comments. The method requires no parameter tuning and produces clusters with high cohesion and meaningfulness.

Findings

01 High inter-coder agreement (up to 98%) in the evaluation

02 Effective thematic analysis of tweets in the telecommunication domain

03 Robustness of the approach without parameter configuration

Abstract

On social media platforms like Twitter, users regularly share their opinions and comments with software vendors and service providers. Popular software products may receive thousands of user comments per day. Research has shown that such comments contain valuable information for stakeholders, such as feature ideas, problem reports, or support inquiries. However, it is hard to manually manage and grasp a large number of user comments, which can be redundant and of varying quality. Consequently, researchers have suggested automated approaches to extract valuable comments, e.g., through problem report classifiers. However, these approaches do not aggregate semantically similar comments into specific aspects to provide insights such as how often users reported a certain problem. We introduce an approach for automatically discovering topics composed of semantically similar user comments based on…
