Temporal Visual Semantics-Induced Human Motion Understanding with Large Language Models
Zheng Xing, Weibing Zhao

TL;DR
This paper combines temporal vision semantics (TVS), extracted from human motion sequences with the image-to-text capabilities of a large language model (LLM), with subspace clustering to improve unsupervised human motion segmentation, and reports superior results on benchmark datasets.
Contribution
It proposes a method for incorporating temporal semantic information obtained from an LLM into subspace clustering for human motion understanding, a combination not explored in prior work.
Findings
Outperforms state-of-the-art methods on four datasets
Effective integration of LLM-derived semantics improves segmentation accuracy
Feedback mechanism enhances subspace embedding optimization
Abstract
Unsupervised human motion segmentation (HMS) can be effectively achieved using subspace clustering techniques. However, traditional methods overlook the role of temporal semantic exploration in HMS. This paper explores the use of temporal vision semantics (TVS) derived from human motion sequences, leveraging the image-to-text capabilities of a large language model (LLM) to enhance subspace clustering performance. The core idea is to extract textual motion information from consecutive frames via the LLM and incorporate this learned information into the subspace clustering framework. The primary challenge lies in learning TVS from human motion sequences using the LLM and integrating this information into subspace clustering. To address this, we determine whether consecutive frames depict the same motion by querying the LLM and subsequently learn temporal neighboring information based on its…
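The abstract is truncated before the exact objective is stated, so the following is only a minimal sketch of how per-pair LLM judgments could feed a subspace clustering model. It assumes the LLM's yes/no answers to "do frames i and i+1 depict the same motion?" have already been collected into a boolean list `same_motion` (hypothetical). Consecutive frames judged the same are linked in a temporal graph W, whose Laplacian regularizes a least-squares-regression self-representation, followed by spectral clustering. This LSR-plus-Laplacian objective is a standard stand-in, not the authors' formulation, and it omits their feedback mechanism; all function names and the parameters `lam` and `beta` are hypothetical.

```python
import numpy as np


def temporal_affinity(same_motion):
    """Temporal-neighbor graph from hypothetical LLM answers.

    same_motion[i] is True if the LLM judged frames i and i+1 to show the
    same motion; only such consecutive pairs are linked (weight 1).
    """
    same = np.asarray(same_motion, dtype=float)
    n = same.size + 1
    W = np.zeros((n, n))
    idx = np.arange(n - 1)
    W[idx, idx + 1] = same
    W[idx + 1, idx] = same
    return W


def temporal_lsr(X, W, lam=0.1, beta=1.0):
    """Solve min_Z ||X - XZ||_F^2 + lam ||Z||_F^2 + beta tr(Z L Z^T).

    X is d x n (one column per frame), L = D - W. The optimality condition is
    the Sylvester equation (X^T X + lam I) Z + beta Z L = X^T X; both
    coefficient matrices are symmetric, so we solve it by diagonalizing them.
    """
    n = X.shape[1]
    G = X.T @ X
    L = np.diag(W.sum(axis=1)) - W
    a, U = np.linalg.eigh(G + lam * np.eye(n))  # eigenvalues >= lam > 0
    l, V = np.linalg.eigh(L)                    # eigenvalues >= 0
    Zp = (U.T @ G @ V) / (a[:, None] + beta * l[None, :])
    return U @ Zp @ V.T


def kmeans(Y, k, iters=100):
    """Plain k-means with deterministic farthest-point initialization."""
    centers = [Y[0]]
    for _ in range(1, k):
        dist = np.min([np.sum((Y - c) ** 2, axis=1) for c in centers], axis=0)
        centers.append(Y[np.argmax(dist)])
    C = np.array(centers)
    for _ in range(iters):
        labels = np.argmin(((Y[:, None, :] - C[None]) ** 2).sum(-1), axis=1)
        newC = np.array([Y[labels == j].mean(0) if np.any(labels == j) else C[j]
                         for j in range(k)])
        if np.allclose(newC, C):
            break
        C = newC
    return labels


def spectral_labels(Aff, k):
    """Normalized spectral clustering (Ng-Jordan-Weiss style) on an affinity."""
    d_inv_sqrt = 1.0 / np.sqrt(Aff.sum(axis=1))
    L_sym = np.eye(len(Aff)) - d_inv_sqrt[:, None] * Aff * d_inv_sqrt[None, :]
    _, vecs = np.linalg.eigh(L_sym)
    Y = vecs[:, :k]
    Y = Y / np.linalg.norm(Y, axis=1, keepdims=True)
    return kmeans(Y, k)


def segment_motion(X, same_motion, k, lam=0.1, beta=1.0):
    """Label each frame (column of X) with one of k motion clusters."""
    if len(same_motion) != X.shape[1] - 1:
        raise ValueError("need one LLM judgment per consecutive frame pair")
    W = temporal_affinity(same_motion)
    Z = temporal_lsr(X, W, lam, beta)
    Aff = (np.abs(Z) + np.abs(Z.T)) / 2  # symmetric, nonnegative affinity
    return spectral_labels(Aff, k)
```

The temporal term equals half the sum of W_ij times the squared distance between columns i and j of Z, so it only pulls together representations of neighboring frames that the LLM judged to belong to the same motion. A pair judged different contributes no smoothing across the boundary.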
Taxonomy
Topics: Human Pose and Action Recognition · Human Motion and Animation · Multimodal Machine Learning Applications
