TeamLLM: Exploring the Capabilities of LLMs for Multimodal Group Interaction Prediction

Diana Romero; Xin Gao; Daniel Khalkhali; Salma Elmalaki

arXiv:2604.08771 · cs.HC · April 13, 2026

TL;DR

This paper explores whether Large Language Models can predict group coordination from multimodal sensor data in collaborative Mixed Reality environments. It reports a 3.2× improvement over LSTM baselines on linguistically grounded behaviors and identifies limitations on spatial and visual attention tasks.

Contribution

The paper introduces a hierarchical encoding of multimodal sensor data as natural language and evaluates three LLM adaptation methods (zero-shot, few-shot, and supervised fine-tuning) for group behavior prediction, establishing new benchmarks and guidelines.

Findings

01

LLMs outperform LSTM baselines by 3.2× in linguistically grounded behavior prediction.

02

Fine-tuning achieves 96% accuracy in conversation prediction with sub-35 ms latency.

03

Text-based LLMs succeed in turn-taking prediction but struggle with spatial and visual attention tasks.

Abstract

Predicting group behavior, how individuals coordinate, communicate, and interact during collaborative tasks, is essential for designing systems that can support team performance through real-time prediction and realistic simulation of collaborative scenarios. Large Language Models (LLMs) have shown promise for processing sensor data for human-activity recognition (HAR), yet their capabilities for team dynamics or group-level multimodal sensing remain unexplored. This paper investigates whether LLMs can predict group coordination patterns from multimodal sensor data in collaborative Mixed Reality (MR) environments. We encode hierarchical context -- individual behavioral profiles, group structural properties, and temporal activity context -- as natural language and evaluate three LLM adaptation paradigms (zero-shot, few-shot, and supervised fine-tuning) against statistical baselines. Our…
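
To make the idea of encoding hierarchical context as natural language concrete, the following is a minimal sketch, not the authors' implementation. It assumes three levels of context (individual, group, temporal) are available as simple records and renders them as a text prompt. Every name here (`IndividualProfile`, `GroupContext`, `encode_context`), every feature (speaking ratio, gaze target, task phase), and the wording of the final question are hypothetical, chosen only for illustration; the paper's actual features and prompt format are not given in this summary.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class IndividualProfile:
    # Hypothetical per-person features; not the paper's actual feature set.
    participant_id: str
    speaking_ratio: float      # fraction of the window spent speaking (assumed)
    dominant_gaze_target: str  # most-looked-at target in the window (assumed)


@dataclass
class GroupContext:
    # Hypothetical group-level and temporal descriptors.
    group_size: int
    task_phase: str            # e.g. "planning" (assumed label)
    seconds_elapsed: int


def describe_individual(p: IndividualProfile) -> str:
    """Render one participant's profile as a single sentence."""
    return (
        f"Participant {p.participant_id} spoke for {p.speaking_ratio:.0%} "
        f"of the window and mostly looked at {p.dominant_gaze_target}."
    )


def encode_context(group: GroupContext, people: List[IndividualProfile]) -> str:
    """Build a prompt: individual lines, then group, then temporal context."""
    lines = [describe_individual(p) for p in people]
    lines.append(f"Group: {group.group_size} participants.")
    lines.append(
        f"Activity: {group.task_phase} phase, "
        f"{group.seconds_elapsed} s into the session."
    )
    # Hypothetical task framing; the real prediction targets may differ.
    lines.append("Question: will a conversation occur in the next window? Answer yes or no.")
    return "\n".join(lines)
```

A prompt built this way could be passed unchanged to a zero-shot model, prefixed with labeled examples for few-shot use, or paired with ground-truth answers as supervised fine-tuning data, which is one plausible way the three adaptation paradigms could share a single encoding.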
