DRACULA: Hunting for the Actions Users Want Deep Research Agents to Execute
Nishant Balepur, Malachi Hamada, Varsha Kishore, Sergey Feldman, Amanpreet Singh, Pao Siangliulue, Joseph Chee Chang, Rachel Rudinger, Eunsol Choi, Jordan Lee Boyd-Graber, Doug Downey, Aakanksha Naik

TL;DR
DRACULA introduces a dataset capturing user feedback on intermediate actions in scientific research agents, enabling analysis and prediction of user-preferred actions to improve report generation.
Contribution
It provides the first dataset with user feedback on intermediate actions, and studies how well LLMs can predict user preferences to guide action selection.
Findings
LLMs predict the actions users select more accurately when given the full selection history.
User preferences vary based on unstated goals, affecting action prediction.
Simulation-based interventions can generate actions users prefer in follow-up tasks.
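As a rough illustration of the first finding, the sketch below scores a predictor on per-action selection accuracy, with and without access to a user's earlier selections. It is a toy under stated assumptions, not the paper's protocol: the `Round` structure, the `history_baseline` predictor (a frequency rule standing in for an LLM), the action labels, and the accuracy metric are all hypothetical.

```python
from collections import Counter
from typing import Callable, List, Set, Tuple

# One round: the actions the system proposed and the subset the user selected.
# (Hypothetical structure; the real dataset's schema may differ.)
Round = Tuple[List[str], Set[str]]
Predictor = Callable[[List[Round], List[str]], Set[str]]


def history_baseline(history: List[Round], candidates: List[str]) -> Set[str]:
    """Toy stand-in for an LLM: predict an action if the user selected it
    in more than half of the past rounds where it was offered."""
    offered: Counter = Counter()
    picked: Counter = Counter()
    for proposed, selected in history:
        offered.update(proposed)
        picked.update(selected)
    return {a for a in candidates if offered[a] and picked[a] / offered[a] > 0.5}


def simulate(rounds: List[Round], predictor: Predictor, use_history: bool) -> float:
    """Per-action accuracy: fraction of proposed actions whose
    selected/not-selected status the predictor gets right."""
    correct = total = 0
    for i, (proposed, selected) in enumerate(rounds):
        history = rounds[:i] if use_history else []
        predicted = predictor(history, proposed)
        for action in proposed:
            correct += (action in predicted) == (action in selected)
            total += 1
    return correct / total if total else 0.0


# Made-up rounds for one user; action labels are illustrative only.
rounds: List[Round] = [
    (["add_section", "shorten"], {"add_section"}),
    (["add_section", "expand"], {"add_section"}),
    (["add_section", "shorten"], {"add_section"}),
]

with_history = simulate(rounds, history_baseline, use_history=True)
without_history = simulate(rounds, history_baseline, use_history=False)
```

On this made-up data the predictor does better with history than without, simply because it has nothing to go on in the no-history condition. Real actions are free text (e.g., "Add a section on datasets"), so an exact-match frequency rule would not transfer; the point is only the shape of the with/without-history comparison.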
Abstract
Scientific Deep Research (DR) agents answer user queries by synthesizing research papers into multi-section reports. User feedback can improve their utility, but existing protocols only score the final report, making it hard to study and learn which intermediate actions DR agents should take to improve reports. We collect DRACULA, the first dataset with user feedback on intermediate actions for DR. Over five weeks, nineteen expert CS researchers ask queries to a DR system that proposes actions (e.g., "Add a section on datasets"). Our users select actions they prefer, then judge whether an output report applied their selections successfully, yielding 8,103 action preferences and 5,230 execution judgments. After confirming a DR agent can execute DRACULA's actions, we study the predictability of user-preferred actions via simulation (how well LLMs predict the actions users select), a step…
