Language Models Don't Know What You Want: Evaluating Personalization in Deep Research Needs Real Users
Nishant Balepur, Malachi Hamada, Varsha Kishore, Sergey Feldman, Amanpreet Singh, Pao Siangliulue, Joseph Chee Chang, Eunsol Choi, Jordan Lee Boyd-Graber, Aakanksha Naik

TL;DR
This paper introduces MyScholarQA, a personalized deep research system that infers user interests, proposes tailored actions, and generates multi-section reports, and argues that evaluating personalization requires feedback from real users.
Contribution
The paper presents MyScholarQA, a personalized research assistant, and evaluates it with both a synthetic benchmark and real user studies, which uncover nuanced personalization errors.
Findings
MyScholarQA (MySQA) outperforms baselines in citation metrics and personalized action-following on a benchmark with synthetic users and LLM judges.
Interviews with real users reveal nine nuanced personalization errors that the paper's LLM judges do not detect.
Qualitative feedback provides lessons for future personalized research system design.
Abstract
Deep Research (DR) systems help researchers cope with ballooning publishing counts. Such tools synthesize scientific papers to answer research queries, but lack understanding of their users. We address this with MyScholarQA (MySQA), a personalized DR agent that: 1) infers a profile with a user's research interests; 2) proposes personalized actions for a user's input query; and 3) writes a multi-section report for the query that follows user-approved actions. We first test MySQA with NLP's standard protocol: we build a benchmark with synthetic users and LLM judges, where MySQA beats baselines in citation metrics and personalized action-following. However, we suspect this process does not cover all aspects of personalized DR users value, so we interview users in an online version of MySQA to unmask them. We reveal nine nuanced errors of personalized DR undetectable by our LLM judges, and…
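The three stages named in the abstract (infer a profile, propose personalized actions, write a report that follows the user-approved actions) can be pictured as a small pipeline. The sketch below is only an illustration of that control flow, not the authors' implementation: every name in it (`Profile`, `Action`, `infer_profile`, `propose_actions`, `write_report`) is hypothetical, and simple string handling stands in for whatever models MySQA actually uses.

```python
from dataclasses import dataclass


@dataclass
class Profile:
    """Hypothetical container for a user's inferred research interests."""
    interests: list[str]


@dataclass
class Action:
    """Hypothetical personalized action; the user decides whether to approve it."""
    description: str
    approved: bool = False


def infer_profile(paper_titles: list[str]) -> Profile:
    # Toy stand-in for stage 1: treat each distinct (lowercased) title as one interest.
    return Profile(interests=sorted({title.lower() for title in paper_titles}))


def propose_actions(query: str, profile: Profile) -> list[Action]:
    # Toy stand-in for stage 2: one candidate action per inferred interest.
    return [
        Action(f"Relate '{query}' to the user's work on {interest}")
        for interest in profile.interests
    ]


def write_report(query: str, actions: list[Action]) -> list[str]:
    # Toy stand-in for stage 3: one section per action the user approved.
    approved = [action for action in actions if action.approved]
    sections = [
        f"Section {number}: {action.description}"
        for number, action in enumerate(approved, start=1)
    ]
    return [f"Report for: {query}"] + sections


if __name__ == "__main__":
    profile = infer_profile(["Question Answering", "Summarization"])
    actions = propose_actions("LLM evaluation", profile)
    actions[0].approved = True  # the user approves only the first proposal
    for line in write_report("LLM evaluation", actions):
        print(line)
```

The point of the sketch is the approval gate between stages 2 and 3: only actions the user has approved shape the report, which matches the abstract's description of a report that "follows user-approved actions".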
