Enhancing Human-Centered Dynamic Scene Understanding via Multiple LLMs Collaborated Reasoning
Hang Zhang, Wenxiao Zhang, Haoxuan Qu, Jun Liu

TL;DR
This paper introduces V-HOI MLCR, a framework that improves video-based human-object interaction (V-HOI) detection by using multiple large language models for reasoning; it is validated by accuracy gains when applied to existing V-HOI models.
Contribution
The paper proposes a novel multi-LLM collaboration framework with a two-stage reasoning process and auxiliary training to improve V-HOI detection performance.
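The summary above does not say what the two stages are, so the following is only a minimal illustrative sketch of one way a two-stage multi-agent scheme could refine a detector's output. It assumes stage 1 collects independent answers and stage 2 lets each agent revise after seeing its peers, followed by a majority vote. The agents are simple stand-in functions rather than real LLM calls, and every name here (`Agent`, `top_score_agent`, `make_prior_agent`, `two_stage_reasoning`) is hypothetical and not taken from the paper.

```python
from collections import Counter
from typing import Callable, Dict, List

# Hypothetical agent signature (an assumption, not the paper's API):
# (candidate label scores, peers' earlier answers) -> chosen label.
Agent = Callable[[Dict[str, float], List[str]], str]


def top_score_agent(scores: Dict[str, float], peer_answers: List[str]) -> str:
    """Stand-in agent that trusts the base detector's highest-scoring label."""
    return max(scores, key=scores.get)


def make_prior_agent(preferred: str) -> Agent:
    """Stand-in agent with a prior: keeps `preferred` unless a strict
    majority of its peers agree on some label, in which case it adopts it."""
    def agent(scores: Dict[str, float], peer_answers: List[str]) -> str:
        if peer_answers:
            label, votes = Counter(peer_answers).most_common(1)[0]
            if votes > len(peer_answers) / 2:
                return label
        return preferred if preferred in scores else max(scores, key=scores.get)
    return agent


def two_stage_reasoning(scores: Dict[str, float], agents: List[Agent]) -> str:
    # Stage 1 (assumed): every agent answers independently, with no peer input.
    first = [agent(scores, []) for agent in agents]
    # Stage 2 (assumed): every agent revises after seeing the other agents' answers.
    revised = [
        agent(scores, first[:i] + first[i + 1:])
        for i, agent in enumerate(agents)
    ]
    # Final decision: majority vote over the revised answers.
    return Counter(revised).most_common(1)[0][0]


# Toy detector scores for one human-object pair (made-up numbers).
scores = {"hold cup": 0.5, "drink from cup": 0.4}
agents = [top_score_agent, top_score_agent, make_prior_agent("drink from cup")]
final_label = two_stage_reasoning(scores, agents)
```

In this toy run the third agent starts with a different answer but adopts the majority label in the second stage. A real system would replace the stand-in functions with prompts to actual LLMs and would need its own aggregation rule.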
Findings
Improved prediction accuracy of V-HOI models through multi-LLM reasoning.
Effective two-stage collaboration system for LLMs in scene understanding.
Enhanced discriminative ability via CLIP integration.
Abstract
Human-centered dynamic scene understanding plays a pivotal role in enhancing the capability of robotic and autonomous systems. Within it, Video-based Human-Object Interaction (V-HOI) detection is a crucial semantic scene understanding task that aims to comprehensively understand HOI relationships within a video, benefiting the behavioral decisions of mobile robots and autonomous driving systems. Although previous V-HOI detection models have made significant strides in accurate detection on specific datasets, they still lack the general, human-like reasoning ability needed to effectively infer HOI relationships. In this study, we propose V-HOI Multi-LLMs Collaborated Reasoning (V-HOI MLCR), a novel framework consisting of a series of plug-and-play modules that can improve the performance of current V-HOI detection models by leveraging the strong reasoning ability of different…
Taxonomy
Topics: AI-based Problem Solving and Planning · Semantic Web and Ontologies · Constraint Satisfaction and Optimization
Methods: Balanced Selection · Contrastive Language-Image Pre-training
