TL;DR
This paper compares three topic modeling approaches for analyzing large sets of crowd-generated user stories and finds that combining word embeddings with Word Mover's Distance gives promising results for clustering the stories and gaining insight into their topics.
Contribution
It introduces and evaluates a novel approach combining word embeddings with Word Mover's Distance for topic modeling of user stories, outperforming traditional methods.
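For reference, the standard definition of Word Mover's Distance treats each user story as a normalized bag of words and measures the cheapest way to move one story's word weights onto another's in embedding space. Whether the paper uses exactly this cost (Euclidean distance between embeddings) and this weighting is an assumption here.

```latex
\begin{aligned}
\mathrm{WMD}(\mathbf{d}, \mathbf{d}') &= \min_{T \ge 0} \sum_{i=1}^{n} \sum_{j=1}^{n} T_{ij}\, c(i, j)
  \quad \text{s.t.} \quad \sum_{j=1}^{n} T_{ij} = d_i, \;\; \sum_{i=1}^{n} T_{ij} = d'_j, \\
d_i &= \frac{\mathrm{count}_i}{\sum_{k=1}^{n} \mathrm{count}_k}, \qquad
c(i, j) = \lVert \mathbf{x}_i - \mathbf{x}_j \rVert_2 .
\end{aligned}
```

Here n is the vocabulary size, x_i is the embedding of word i, and T_ij is how much of word i's weight in d is moved to word j in d'. Smaller values mean more similar stories, which is what lets WMD serve as a distance for clustering.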
Findings
Word embeddings with Word Mover's Distance effectively cluster user stories.
The approach uncovers potential new categories in user feedback.
No objective measure currently exists to evaluate clustering quality.
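The clustering finding can be illustrated with a toy sketch, assuming stories are reduced to the words that have embeddings and grouped by single linkage whenever their WMD falls below a threshold. The embedding table EMB, the helpers tokens, wmd and cluster, and the 0.5 threshold are all hypothetical; this summary does not say which clustering algorithm the paper uses. The exact WMD is computed by brute force, which only works for tiny inputs.

```python
import itertools
import math

# Hypothetical 2-D word embeddings, for illustration only. A real pipeline
# would load pre-trained vectors with hundreds of dimensions.
EMB = {
    "login": (1.0, 0.0),
    "signin": (0.9, 0.1),
    "password": (0.8, 0.3),
    "pay": (0.0, 1.0),
    "checkout": (0.1, 0.9),
    "card": (0.2, 0.8),
}


def tokens(story):
    """Keep only words that have an embedding (a crude stand-in for stop-word removal)."""
    return [w for w in story.lower().split() if w in EMB]


def wmd(a, b):
    """Exact Word Mover's Distance between two token lists with nBOW weights.

    Every token of `a` is repeated len(b) times and every token of `b` len(a)
    times, so both sides become uniform distributions over len(a) * len(b)
    atoms. For uniform marginals an optimal transport plan is a permutation,
    found here by brute force: exponential, so only for tiny toy inputs.
    """
    if not a or not b:
        raise ValueError("both stories need at least one embedded word")
    left = [w for w in a for _ in b]
    right = [w for w in b for _ in a]
    n = len(left)
    best = min(
        sum(math.dist(EMB[left[k]], EMB[right[p[k]]]) for k in range(n))
        for p in itertools.permutations(range(n))
    )
    return best / n


def cluster(stories, threshold=0.5):
    """Single-linkage grouping: stories closer than `threshold` in WMD share a cluster."""
    toks = [tokens(s) for s in stories]
    parent = list(range(len(stories)))

    def find(i):
        # Union-find root lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i, j in itertools.combinations(range(len(stories)), 2):
        if wmd(toks[i], toks[j]) < threshold:
            parent[find(i)] = find(j)

    groups = {}
    for i, story in enumerate(stories):
        groups.setdefault(find(i), []).append(story)
    return list(groups.values())


stories = [
    "As a user I want to login with my password",
    "As a user I want to signin quickly",
    "As a customer I want to pay by card",
    "As a customer I want a fast checkout",
]
for group in cluster(stories):
    print(group)
```

With these toy vectors the two account-access stories and the two payment stories form separate groups. A realistic setup would replace the brute-force search with an optimal-transport solver and choose the threshold (or number of clusters) empirically, which is exactly where the missing objective quality measure noted above becomes a problem.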
Abstract
Requirements elicitation has recently been complemented with crowd-based techniques, which continuously involve large, heterogeneous groups of users who express their feedback through a variety of media. Crowd-based elicitation has great potential for engaging with (potential) users early on but also results in large sets of raw and unstructured feedback. Consolidating and analyzing this feedback is a key challenge for turning it into sensible user requirements. In this paper, we focus on topic modeling as a means to identify topics within a large set of crowd-generated user stories and compare three approaches: (1) a traditional approach based on Latent Dirichlet Allocation, (2) a combination of word embeddings and principal component analysis, and (3) a combination of word embeddings and Word Mover's Distance. We evaluate the approaches on a publicly available set of 2,966 user…
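Approach (2) can be sketched under the assumption that each story is represented by the average of its word embeddings and that PCA is used to find the dominant direction along which the stories spread. The summary does not say how the paper combines embeddings and PCA, so the averaging step, the toy table EMB, and the helper names story_vector, first_principal_component and project are hypothetical.

```python
import math

# Hypothetical 2-D word embeddings, for illustration only.
EMB = {
    "login": (1.0, 0.0),
    "signin": (0.9, 0.1),
    "password": (0.8, 0.3),
    "pay": (0.0, 1.0),
    "checkout": (0.1, 0.9),
    "card": (0.2, 0.8),
}


def story_vector(story):
    """Average the embeddings of the story's known words (a simple sentence embedding)."""
    vecs = [EMB[w] for w in story.lower().split() if w in EMB]
    if not vecs:
        raise ValueError("story has no embedded words")
    return [sum(v[k] for v in vecs) / len(vecs) for k in range(len(vecs[0]))]


def first_principal_component(rows, iters=200):
    """Return (mean, unit vector) of the top principal component via power iteration."""
    n, dim = len(rows), len(rows[0])
    mean = [sum(r[k] for r in rows) / n for k in range(dim)]
    centered = [[r[k] - mean[k] for k in range(dim)] for r in rows]
    # Sample covariance matrix (dim x dim).
    cov = [[sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(dim)]
           for i in range(dim)]
    v = [1.0 / (k + 1) for k in range(dim)]  # arbitrary non-zero start vector
    for _ in range(iters):
        w = [sum(cov[i][j] * v[j] for j in range(dim)) for i in range(dim)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return mean, v


def project(rows):
    """Score each row by its coordinate along the first principal component."""
    mean, v = first_principal_component(rows)
    return [sum((r[k] - mean[k]) * v[k] for k in range(len(v))) for r in rows]


stories = [
    "As a user I want to login with my password",
    "As a user I want to signin quickly",
    "As a customer I want to pay by card",
    "As a customer I want a fast checkout",
]
scores = project([story_vector(s) for s in stories])
for story, score in zip(stories, scores):
    print(f"{score:+.3f}  {story}")
```

Stories whose scores share a sign fall on the same side of the main axis; the sign itself is arbitrary, because an eigenvector is only defined up to sign. In practice a linear-algebra library (for example NumPy's SVD) would replace the hand-written power iteration.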
