The Impact of Annotation Guidelines and Annotated Data on Extracting App Features from App Reviews
Faiz Ali Shah, Kairit Sirts, Dietmar Pfahl

TL;DR
This paper investigates how annotation guidelines and data sources affect the quality of app feature extraction from reviews, proposing guideline improvements and analyzing training data requirements for better model performance.
Contribution
It introduces revised annotation guidelines that produce more informative features and evaluates the impact of different training data sources, including cross-category reviews and product reviews.
Findings
Simulated guideline changes yield less noisy, more useful features.
Training on reviews from the test app improves recall.
Augmenting training data with product reviews increases recall but reduces precision.
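To make the recall/precision trade-off in the last finding concrete, here is a minimal sketch that scores extracted features against gold annotations. It assumes exact string matching over sets of features, which may differ from the matching scheme used in the paper. The function name `precision_recall` and all feature strings are hypothetical, invented for illustration, and are not data or results from the study.

```python
def precision_recall(predicted, gold):
    """Exact-match precision and recall over sets of feature strings.

    Assumption: a predicted feature counts as correct only if it is
    identical to a gold feature; the paper may use a different scheme.
    """
    predicted, gold = set(predicted), set(gold)
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    return precision, recall


# Hypothetical toy data, not taken from the paper.
gold = {"photo upload", "dark mode", "push notifications", "offline maps"}

# A model that extracts few features, all of them correct.
baseline = {"photo upload", "dark mode"}

# A model that extracts more features: it finds one more gold feature
# but also emits product-style phrases that are not app features.
augmented = {
    "photo upload", "dark mode", "push notifications",
    "battery life", "screen size", "price",
}

baseline_pr = precision_recall(baseline, gold)    # (1.0, 0.5)
augmented_pr = precision_recall(augmented, gold)  # (0.5, 0.75)
```

In this toy case the second model recovers more gold features (higher recall) while a larger share of its output is wrong (lower precision), which is the shape of trade-off the finding describes.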
Abstract
Annotation guidelines used to guide the annotation of training and evaluation datasets can have a considerable impact on the quality of machine learning models. In this study, we explore the effects of annotation guidelines on the quality of app feature extraction models. As a main result, we propose several changes to the existing annotation guidelines with a goal of making the extracted app features more useful and informative to the app developers. We test the proposed changes via simulating the application of the new annotation guidelines and then evaluating the performance of the supervised machine learning models trained on datasets annotated with initial and simulated guidelines. While the overall performance of automatic app feature extraction remains the same as compared to the model trained on the dataset with initial annotations, the features extracted by the model trained on…
Taxonomy
Topics: Software Engineering Research · Mobile and Web Applications · Web Data Mining and Analysis
