User Intent Recognition and Satisfaction with Large Language Models: A User Study with ChatGPT
Anna Bodonhelyi, Efe Bozkir, Shuo Yang, Enkelejda Kasneci, Gjergji Kasneci

TL;DR
This user study evaluates intent recognition and user satisfaction in large language models by comparing GPT-3.5 Turbo and GPT-4 Turbo. It identifies each model's strengths and weaknesses in detecting user intents and in satisfying users with answers to intent-based prompt reformulations.
Contribution
The paper introduces a fine-grained intent taxonomy and analyzes how well GPT-3.5 Turbo and GPT-4 Turbo recognize user intents and how this affects user satisfaction, highlighting areas for improvement.
Findings
GPT-4 outperforms GPT-3.5 in recognizing common intents.
GPT-3.5 better recognizes less frequent intents.
Users are more satisfied with GPT-4's reformulations when intents are correctly recognized.
Abstract
The rapid evolution of LLMs represents an impactful paradigm shift in digital interaction and content engagement. While they encode vast amounts of human-generated knowledge and excel in processing diverse data types, they often face the challenge of accurately responding to specific user intents, leading to user dissatisfaction. Based on a fine-grained intent taxonomy and intent-based prompt reformulations, we analyze the quality of intent recognition and user satisfaction with answers from intent-based prompt reformulations of GPT-3.5 Turbo and GPT-4 Turbo models. Our study highlights the importance of human-AI interaction and underscores the need for interdisciplinary approaches to improve conversational AI systems. We show that GPT-4 outperforms GPT-3.5 in recognizing common intents but is often outperformed by GPT-3.5 in recognizing less frequent intents. Moreover, whenever the…
Taxonomy
Topics: Topic Modeling
