Seeking Information with RAG-Assistants: Does Model Size Matter in Human-AI Collaborations?
Lennard C. Froma, Tom Kouwenhoven, Maaike H.T. de Boer, Catholijn M. Jonker, and Max J. van Duijn

TL;DR
This study evaluates RAG-based chatbots of different sizes in realistic human-AI collaboration scenarios, emphasizing usability and satisfaction alongside performance, and highlights the importance of real-world testing beyond benchmarks.
Contribution
It provides empirical insights into how model size affects human-AI collaboration, usability, and satisfaction in practical multi-turn information-seeking tasks.
Findings
Human-AI collaboration improves performance regardless of model size.
Perceived usability and satisfaction show little variation across different model sizes.
Hybrid RAG systems are beneficial in real-world information-seeking scenarios.
Abstract
Much research on LLMs has focused on increasing benchmark performance. However, the evaluation of such models in real-world collaborative human-AI workflows has lagged behind. This work evaluates a chatbot-style assistant based on Retrieval-Augmented Generation (RAG) in a realistic multi-turn information-seeking scenario inspired by workplace settings where compliance with local legislation and secure handling of sensitive data are often key. Specifically, we examine the performance of humans (N=112) assisted by RAG-assistants compared to LLM-only or LLM+RAG baselines. In this setting, we investigate how underlying model size (3B, 8B, and 70B) shapes the human-AI collaborative dynamic and how it influences perceived usability and satisfaction. Results show that the performance gain of human-AI collaboration over the model-only baselines is significant, irrespective of model size,…
