TL;DR
This empirical study examines how LLM agents that plan and then execute tasks as daily assistants affect user trust and human-agent team performance across a range of tasks, and identifies the conditions under which the collaboration works well.
Contribution
It provides the first comprehensive empirical analysis of user trust and team performance with LLM agents in daily tasks using a plan-then-execute approach.
Findings
High-quality planning and user involvement improve performance.
Users tend to misplace their trust in plans that are plausible but incorrect.
Effective calibration of trust enhances collaboration outcomes.
Abstract
Since the explosion in popularity of ChatGPT, large language models (LLMs) have continued to impact our everyday lives. Equipped with external tools that are designed for a specific purpose (e.g., for flight booking or an alarm clock), LLM agents exercise an increasing capability to assist humans in their daily work. Although LLM agents have shown a promising blueprint as daily assistants, there is a limited understanding of how they can provide daily assistance based on planning and sequential decision making capabilities. We draw inspiration from recent work that has highlighted the value of 'LLM-modulo' setups in conjunction with humans-in-the-loop for planning tasks. We conducted an empirical study (N = 248) of LLM agents as daily assistants in six commonly occurring tasks with different levels of risk typically associated with them (e.g., flight ticket booking and credit card…
