TL;DR
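This paper introduces "overhearing agents": LLM agents that listen in on human-to-human conversations and assist in the background rather than taking part. It demonstrates the idea in a Dungeons & Dragons case study that uses large multimodal audio-language models to assist a Dungeon Master, and it measures the agents' helpfulness through a human evaluation.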
Contribution
It introduces the overhearing agents paradigm, explores it through Dungeons & Dragons gameplay, and releases Python libraries and project code for future research in this area.
Findings
Some large audio-language models show an emergent ability to perform overhearing agent tasks using implicit audio cues
Overhearing agents can assist a Dungeon Master without actively participating in the conversation
The study provides human evaluation data on agent helpfulness
Abstract
Much work has been done on conversational LLM agents which directly assist human users with tasks. We present an alternative paradigm for interacting with LLM agents, which we call "overhearing agents". These overhearing agents do not actively participate in conversation -- instead, they "listen in" on human-to-human conversations and perform background tasks or provide suggestions to assist the user. In this work, we explore the overhearing agents paradigm through the lens of Dungeons & Dragons gameplay. We present an in-depth study using large multimodal audio-language models as overhearing agents to assist a Dungeon Master. We perform a human evaluation to examine the helpfulness of such agents and find that some large audio-language models have the emergent ability to perform overhearing agent tasks using implicit audio cues. Finally, we release Python libraries and our project code…
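As a rough illustration of the paradigm described in the abstract, the sketch below shows an agent that observes a conversation and queues suggestions without ever replying to the speakers. It is not the authors' implementation or their released library: the names `Utterance`, `OverhearingAgent`, and `keyword_policy` are hypothetical, the input is plain text rather than audio, and a simple keyword rule stands in for the audio-language model.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class Utterance:
    """One turn of the overheard human-to-human conversation (text stand-in for audio)."""
    speaker: str
    text: str


@dataclass
class OverhearingAgent:
    """Hypothetical overhearing agent: it observes turns and never speaks in the conversation."""
    # Assumption: the policy stands in for an audio-language model. It sees the
    # conversation so far and returns a suggestion, or None to stay silent.
    policy: Callable[[List[Utterance]], Optional[str]]
    history: List[Utterance] = field(default_factory=list)
    suggestions: List[str] = field(default_factory=list)

    def overhear(self, utterance: Utterance) -> None:
        """Record the turn and, if the policy proposes one, queue a background suggestion."""
        self.history.append(utterance)
        suggestion = self.policy(self.history)
        if suggestion is not None:
            self.suggestions.append(suggestion)


def keyword_policy(history: List[Utterance]) -> Optional[str]:
    """Toy policy (assumption): suggest a lookup when the latest turn mentions a goblin."""
    if "goblin" in history[-1].text.lower():
        return "Look up stat block: goblin"
    return None


agent = OverhearingAgent(policy=keyword_policy)
agent.overhear(Utterance("DM", "You enter a damp cave."))
agent.overhear(Utterance("DM", "A goblin leaps out from behind a rock!"))
print(agent.suggestions)
```

The point of the design is that `overhear` returns nothing to the speakers: suggestions accumulate on the side for the user (here, the Dungeon Master) to consult or ignore.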
