TL;DR
This paper introduces CLUE, an LLM-powered interviewer that questions users right after they chat with a large language model. It captures their in-the-moment opinions of mainstream LLMs and reveals user preferences and perceptions at scale.
Contribution
The paper presents CLUE, an LLM-powered interview system that automatically gathers user opinions immediately after users interact with an LLM, enabling user experience analysis across thousands of interviews.
Findings
CLUE captures a diverse range of user opinions.
Users hold polarized views on the reasoning process displayed by DeepSeek-R1.
Users clearly want more up-to-date information and multi-modal support.
Abstract
Which large language model (LLM) is better? Every evaluation tells a story, but what do users really think about current LLMs? This paper presents CLUE, an LLM-powered interviewer that conducts in-the-moment user experience interviews, right after users interact with LLMs, and automatically gathers insights about user opinions from massive interview logs. We conduct a study with thousands of users to understand user opinions on mainstream LLMs, recruiting users to first chat with a target LLM and then be interviewed by CLUE. Our experiments demonstrate that CLUE captures interesting user opinions, e.g., the bipolar views on the displayed reasoning process of DeepSeek-R1 and demands for information freshness and multi-modality. Our code and data are at https://github.com/cxcscmu/LLM-Interviewer.