ConSensus: Multi-Agent Collaboration for Multimodal Sensing
Hyungjun Yoon, Mohammad Malekzadeh, Sung-Ju Lee, Fahim Kawsar, Lorena Qendro

TL;DR
ConSensus is a multi-agent framework that decomposes multimodal sensing tasks across specialized, modality-aware agents, improving the accuracy and robustness of sensor data interpretation while reducing fusion token cost.
Contribution
It presents a training-free multi-agent collaboration framework with a hybrid fusion mechanism for effective multimodal sensor data interpretation.
Findings
Achieves a 7.1% average accuracy improvement over the single-agent baseline.
Matches or exceeds the performance of iterative debate methods.
Reduces fusion token cost by 12.7× using single-round hybrid fusion.
Abstract
Large language models (LLMs) are increasingly grounded in sensor data to perceive and reason about human physiology and the physical world. However, accurately interpreting heterogeneous multimodal sensor data remains a fundamental challenge. We show that a single monolithic LLM often fails to reason coherently across modalities, leading to incomplete interpretations and prior-knowledge bias. We introduce ConSensus, a training-free multi-agent collaboration framework that decomposes multimodal sensing tasks into specialized, modality-aware agents. To aggregate agent-level interpretations, we propose a hybrid fusion mechanism that balances semantic aggregation, which enables cross-modal reasoning and contextual understanding, with statistical consensus, which provides robustness through agreement across modalities. While each approach has complementary failure modes, their combination…
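To make the hybrid fusion idea concrete, here is a minimal sketch of one way semantic aggregation and statistical consensus could be combined. It is not the authors' implementation: the abstract does not state the combination rule, so the agreement threshold, the function names (`majority_vote`, `hybrid_fuse`, `stub_semantic_aggregator`), and the rule "use the majority label when agreement is high, otherwise defer to the semantic aggregator" are all assumptions made for illustration. The semantic aggregator, which in practice would be an LLM call over the agents' interpretations, is replaced by a hypothetical stub.

```python
from collections import Counter
from typing import Callable, Dict, Tuple


def majority_vote(predictions: Dict[str, str]) -> Tuple[str, float]:
    """Statistical consensus: return the most common label and the
    fraction of agents that voted for it."""
    counts = Counter(predictions.values())
    label, votes = counts.most_common(1)[0]
    return label, votes / len(predictions)


def hybrid_fuse(
    predictions: Dict[str, str],
    semantic_aggregator: Callable[[Dict[str, str]], str],
    agreement_threshold: float = 0.6,  # assumed value, not from the paper
) -> str:
    """Assumed combination rule: trust the majority label when enough
    modality agents agree, otherwise fall back to semantic aggregation."""
    label, agreement = majority_vote(predictions)
    if agreement >= agreement_threshold:
        return label
    return semantic_aggregator(predictions)


def stub_semantic_aggregator(predictions: Dict[str, str]) -> str:
    """Hypothetical stand-in for an LLM-based aggregator. Here it simply
    trusts the heart-rate agent; a real one would reason over the agents'
    full textual interpretations."""
    return predictions["heart_rate"]


# Two of three modality agents agree, so the consensus label is used.
agreeing = {"accelerometer": "walking", "gyroscope": "walking", "heart_rate": "running"}
fused_agreeing = hybrid_fuse(agreeing, stub_semantic_aggregator)

# No two agents agree, so the decision is deferred to the semantic aggregator.
split = {"accelerometer": "walking", "gyroscope": "sitting", "heart_rate": "running"}
fused_split = hybrid_fuse(split, stub_semantic_aggregator)
```

In this sketch fusion takes a single pass over the agent outputs, in the spirit of the single-round fusion the listing describes, rather than multiple rounds of debate between agents.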
Taxonomy
Topics: Explainable Artificial Intelligence (XAI) · Multimodal Machine Learning Applications · Emotion and Mood Recognition
