How Effectively Do LLMs Extract Feature-Sentiment Pairs from App Reviews?
Faiz Ali Shah, Ahmed Sabir, Rajesh Sharma, and Dietmar Pfahl

TL;DR
This paper evaluates how effectively Large Language Models extract feature-sentiment pairs from app reviews, comparing their performance in zero-shot, 1-shot, and 5-shot scenarios.
Contribution
It compares LLMs such as GPT-4, ChatGPT, and Llama-2 chat variants against previous approaches for feature-sentiment extraction from app reviews, highlighting their strengths and limitations.
Findings
In the zero-shot setting, GPT-4 outperforms rule-based methods in feature extraction by 17%.
Fine-tuned RE-BERT surpasses GPT-4 by 6% in feature extraction.
In the zero-shot setting, GPT-4 achieves F1-scores of 76% for positive and 45% for neutral sentiment prediction.
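The sentiment results above are reported as F1-scores. As a reminder of what that metric combines, here is a minimal sketch in Python. It assumes exact string matching after lowercasing; the matching criterion the paper actually uses is not stated in this summary, and the function name and sample features are hypothetical.

```python
def precision_recall_f1(predicted, gold):
    """Exact-match precision, recall, and F1 between two collections of strings.

    Assumption: a predicted feature counts as correct only if it equals a gold
    feature after trimming whitespace and lowercasing.
    """
    pred = {p.strip().lower() for p in predicted}
    ref = {g.strip().lower() for g in gold}
    true_positives = len(pred & ref)
    precision = true_positives / len(pred) if pred else 0.0
    recall = true_positives / len(ref) if ref else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical example: 2 of 3 predictions are correct, 2 of 4 gold features are found.
predicted = ["send message", "video call", "dark mode"]
gold = ["video call", "dark mode", "group chat", "file sharing"]
precision, recall, f1 = precision_recall_f1(predicted, gold)
```

Because F1 is the harmonic mean of precision and recall, a model that extracts many spurious features or misses many real ones is penalized either way.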
Abstract
Automatic analysis of user reviews to understand user sentiments toward app functionality (i.e., app features) helps align development efforts with user expectations and needs. Recent advances in Large Language Models (LLMs) such as ChatGPT have shown impressive performance on several new tasks without updating the model's parameters, i.e., using zero or a few labeled examples, but the capabilities of LLMs remain unexplored for feature-specific sentiment analysis. The goal of our study is to explore the capabilities of LLMs to perform feature-specific sentiment analysis of user reviews. This study compares the performance of state-of-the-art LLMs, including GPT-4, ChatGPT, and different variants of Llama-2 chat, against previous approaches for extracting app features and associated sentiments in zero-shot, 1-shot, and 5-shot scenarios. The results indicate that GPT-4 outperforms the…
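The zero-shot, 1-shot, and 5-shot scenarios named in the abstract differ only in how many labeled examples are placed in the prompt; the model's parameters stay fixed. The Python sketch below illustrates that idea. The instruction wording, the `feature | sentiment` output format, and all function names are assumptions made for illustration, not the prompts used in the paper.

```python
from typing import List, Sequence, Tuple

# A labeled example: (review text, list of (feature, sentiment) pairs).
Example = Tuple[str, List[Tuple[str, str]]]

# Hypothetical instruction text; the paper's actual prompt is not shown here.
INSTRUCTION = (
    "Extract each app feature mentioned in the review and label the sentiment "
    "expressed toward it as positive, negative, or neutral. "
    "Answer with one 'feature | sentiment' pair per line."
)
SENTIMENTS = {"positive", "negative", "neutral"}


def format_pairs(pairs: List[Tuple[str, str]]) -> str:
    return "\n".join(f"{feature} | {sentiment}" for feature, sentiment in pairs)


def build_prompt(review: str, examples: Sequence[Example] = ()) -> str:
    """Build a k-shot prompt: no examples gives zero-shot, five gives 5-shot."""
    parts = [INSTRUCTION]
    for ex_review, ex_pairs in examples:
        parts.append(f"Review: {ex_review}\nPairs:\n{format_pairs(ex_pairs)}")
    parts.append(f"Review: {review}\nPairs:")
    return "\n\n".join(parts)


def parse_pairs(response: str) -> List[Tuple[str, str]]:
    """Parse 'feature | sentiment' lines, skipping anything malformed."""
    pairs = []
    for line in response.splitlines():
        if "|" not in line:
            continue
        feature, sentiment = (part.strip() for part in line.rsplit("|", 1))
        if feature and sentiment.lower() in SENTIMENTS:
            pairs.append((feature, sentiment.lower()))
    return pairs


# Hypothetical 1-shot usage.
shots = [("Love the dark mode but sync is slow.",
          [("dark mode", "positive"), ("sync", "negative")])]
prompt = build_prompt("Video call keeps dropping.", shots)
```

The resulting `prompt` string would be sent to whichever LLM is being evaluated, and `parse_pairs` turns the reply into pairs that can be scored against gold annotations.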
Taxonomy
Topics: Sentiment Analysis and Opinion Mining
Methods: ALIGN · Byte Pair Encoding · Absolute Position Encodings · Softmax · Label Smoothing · Layer Normalization · Dropout · Attention Is All You Need · Position-Wise Feed-Forward Layer · Residual Connection
