Topic Modeling on User Stories using Word Mover's Distance

Kim Julian Gülle; Nicholas Ford; Patrick Ebel; Florian Brokhausen; Andreas Vogelsang

arXiv:2007.05302 · cs.CL · July 14, 2020

TL;DR

This paper compares three topic modeling approaches for analyzing a large set of crowd-generated user stories and finds that combining word embeddings with Word Mover's Distance yields promising results, both for clustering the stories and for surfacing insights such as potential new categories.

Contribution

It evaluates a combination of word embeddings and Word Mover's Distance for topic modeling of user stories, comparing it against Latent Dirichlet Allocation and a combination of word embeddings and principal component analysis.

Findings

01

Word embeddings with Word Mover's Distance effectively cluster user stories.

02

The approach uncovers potential new categories in user feedback.

03

No objective measure currently exists to evaluate clustering quality.

Abstract

Requirements elicitation has recently been complemented with crowd-based techniques, which continuously involve large, heterogeneous groups of users who express their feedback through a variety of media. Crowd-based elicitation has great potential for engaging with (potential) users early on but also results in large sets of raw and unstructured feedback. Consolidating and analyzing this feedback is a key challenge for turning it into sensible user requirements. In this paper, we focus on topic modeling as a means to identify topics within a large set of crowd-generated user stories and compare three approaches: (1) a traditional approach based on Latent Dirichlet Allocation, (2) a combination of word embeddings and principal component analysis, and (3) a combination of word embeddings and Word Mover's Distance. We evaluate the approaches on a publicly available set of 2,966 user…
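
Word Mover's Distance, used in the third approach, measures how far the words of one text have to "travel" in embedding space to reach the words of another. The sketch below is not the authors' implementation. It computes the relaxed variant of the distance (a lower bound on the exact value, in which each word simply moves to its nearest counterpart) so that it runs with the standard library alone. The 2-D vectors in `TOY_EMBEDDINGS`, the example stories, and all function names are hypothetical and chosen only for illustration.

```python
import math
from collections import Counter

# Hypothetical 2-D "embeddings" for illustration only; real word
# embeddings typically have hundreds of dimensions.
TOY_EMBEDDINGS = {
    "user": (0.0, 0.0),
    "customer": (0.0, 1.0),
    "login": (3.0, 0.0),
    "signin": (3.0, 1.0),
    "report": (10.0, 10.0),
    "export": (10.0, 11.0),
}


def nbow(tokens):
    """Normalized bag of words: the share of a text's mass held by each known word."""
    counts = Counter(t for t in tokens if t in TOY_EMBEDDINGS)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no known words in document")
    return {word: count / total for word, count in counts.items()}


def one_sided_cost(src, dst):
    """Cost of moving every source word's mass to its nearest destination word."""
    return sum(
        weight * min(
            math.dist(TOY_EMBEDDINGS[word], TOY_EMBEDDINGS[other]) for other in dst
        )
        for word, weight in src.items()
    )


def relaxed_wmd(tokens_a, tokens_b):
    """Relaxed Word Mover's Distance: the larger of the two one-sided costs."""
    a, b = nbow(tokens_a), nbow(tokens_b)
    return max(one_sided_cost(a, b), one_sided_cost(b, a))


# Hypothetical tokenized user stories.
story_a = ["user", "login"]
story_b = ["customer", "signin"]
story_c = ["report", "export"]

near = relaxed_wmd(story_a, story_b)  # different words, similar meaning
far = relaxed_wmd(story_a, story_c)   # unrelated topic
print(near, far)
```

Although `story_a` and `story_b` share no words, their distance is small because their words lie close together in the toy embedding space, which is the property that makes this kind of distance attractive for short texts such as user stories. A pairwise distance matrix built this way could then be passed to any distance-based clustering algorithm.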

Code & Models

Repositories

firstdayofjune/aire-20
none · Official
