Seeking Information with RAG-Assistants: Does Model Size Matter in Human-AI Collaborations?

Lennard C. Froma, Tom Kouwenhoven, Maaike H.T. de Boer, Catholijn M. Jonker, and Max J. van Duijn

arXiv:2605.00964 · cs.IR · May 5, 2026

TL;DR

This study evaluates RAG-based chatbots of different sizes in realistic human-AI collaboration scenarios, emphasizing usability and satisfaction alongside performance, and highlights the importance of real-world testing beyond benchmarks.

Contribution

The paper provides empirical evidence on how model size affects human-AI collaboration, perceived usability, and satisfaction in practical multi-turn information-seeking tasks.

Findings

01 Human-AI collaboration significantly outperforms the model-only baselines, irrespective of model size (3B, 8B, or 70B).

02 Perceived usability and satisfaction show little variation across model sizes.

03 Hybrid RAG systems are beneficial in real-world information-seeking scenarios.

Abstract

Much research on LLMs has focused on increasing benchmark performance. However, the evaluation of such models in real-world collaborative human-AI workflows has lagged behind. This work evaluates a chatbot-style assistant based on Retrieval-Augmented Generation (RAG) in a realistic multi-turn information-seeking scenario inspired by workplace settings where compliance with local legislation and secure handling of sensitive data are often key. Specifically, we examine the performance of humans (N=112) assisted by RAG-assistants compared to LLM-only or LLM+RAG baselines. In this setting, we investigate how underlying model size (3B, 8B, and 70B) shapes the human-AI collaborative dynamic and how it influences perceived usability and satisfaction. Results show that the performance gain of human-AI collaboration over the model-only baselines is significant, irrespective of model size,…
