Evaluating Software User Feedback Classifiers on Unseen Apps, Datasets, and Metadata
Peter Devine, Yun Sing Koh, Kelly Blincoe

TL;DR
This study evaluates the generalization of machine learning classifiers for software user feedback across unseen apps, datasets, and metadata, revealing limited transferability and the potential of multi-dataset training.
Contribution
It provides an empirical assessment of classifier performance on unseen data and explores the impact of metadata and multi-dataset training on classification accuracy.
Findings
Classifiers perform about as well on unseen apps as on seen apps.
Classifiers struggle with unseen datasets from different platforms or labels.
Multi-dataset training and zero-shot approaches can improve generalization.
Abstract
Listening to users' requirements is crucial to building and maintaining high-quality software. Online software user feedback has been shown to contain large amounts of information useful to requirements engineering (RE). Previous studies have created machine learning classifiers for parsing this feedback for development insight. While these classifiers report generally good performance when evaluated on a test set, questions remain as to how well they extend to unseen data in various forms. This study evaluates machine learning classifiers' performance on feedback for two common classification tasks (classifying bug reports and feature requests). Using seven datasets from prior research studies, we investigate the performance of classifiers when evaluated on feedback from different apps than those contained in the training set and when evaluated on completely different datasets (coming…
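The unseen-app evaluation described in the abstract depends on a split in which no app contributes feedback to both the training set and the test set. The following is a minimal sketch of such a split, not the authors' implementation: the function name `split_by_app`, the record fields `app`, `text`, and `label`, and the toy records are all hypothetical and chosen only for illustration.

```python
import random
from collections import defaultdict


def split_by_app(records, test_fraction=0.25, seed=0):
    """Split feedback records so that no app appears in both train and test.

    `records` is a list of dicts with a hypothetical "app" key; the field
    names are illustrative and not taken from the paper.
    """
    # Group the records by the app they belong to.
    by_app = defaultdict(list)
    for rec in records:
        by_app[rec["app"]].append(rec)

    # Shuffle the app names (not the records) with a fixed seed.
    apps = sorted(by_app)
    random.Random(seed).shuffle(apps)

    # Hold out whole apps for testing, at least one.
    n_test = max(1, round(len(apps) * test_fraction))
    test_apps = set(apps[:n_test])

    train = [r for a in apps if a not in test_apps for r in by_app[a]]
    test = [r for a in apps if a in test_apps for r in by_app[a]]
    return train, test


# Toy data, invented for this example only.
records = [
    {"app": "app_a", "text": "Crashes on login", "label": "bug"},
    {"app": "app_a", "text": "Please add dark mode", "label": "feature"},
    {"app": "app_b", "text": "Sync fails every time", "label": "bug"},
    {"app": "app_c", "text": "Would love an export option", "label": "feature"},
    {"app": "app_d", "text": "Great app", "label": "other"},
]

train, test = split_by_app(records)
train_apps = {r["app"] for r in train}
test_apps = {r["app"] for r in test}
```

Splitting on app names rather than on individual records is what keeps the test apps unseen: a random record-level split would let feedback from the same app land on both sides and could overstate how well a classifier transfers.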
Taxonomy
Topics: Software Engineering Research · Software Engineering Techniques and Practices · Software System Performance and Reliability
