Unified and Dynamic Graph for Temporal Character Grouping in Long Videos
Xiujun Shu, Wei Wen, Liangsheng Xu, Ruizhi Qiao, Taian Guo, Hanjun Li, Bei Gan, Xiao Wang, Xing Sun

TL;DR
This paper introduces UniDG, a unified and dynamic graph framework for improved temporal character grouping in videos, leveraging multi-modal representations and adaptive clustering to enhance accuracy and deployment efficiency.
Contribution
The paper proposes a novel unified representation network and dynamic graph clustering method that adaptively constructs affinity graphs for better character grouping in videos.
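The sketch below contrasts a fixed affinity graph with one whose neighborhoods are built adaptively, using NumPy. It is not UniDG's clustering method. The paper's cyclic neighbor construction is not described on this page, so the rule used here is an assumption: keep up to `k_max` candidates whose cosine similarity is at least `tau`, then retain only mutual links. `knn_graph`, `dynamic_graph`, `connected_components`, `k_max`, and `tau` are hypothetical names chosen for illustration.

```python
import numpy as np


def _cosine_sim(feats: np.ndarray) -> np.ndarray:
    """Pairwise cosine similarity with self-similarity masked out."""
    x = feats / np.linalg.norm(feats, axis=1, keepdims=True)
    sim = x @ x.T
    np.fill_diagonal(sim, -np.inf)  # a node is never its own neighbor
    return sim


def knn_graph(feats: np.ndarray, k: int) -> list[set[int]]:
    """Fixed affinity graph: every node gets exactly k neighbors (assumes k < n)."""
    order = np.argsort(-_cosine_sim(feats), axis=1)[:, :k]
    return [set(row.tolist()) for row in order]


def dynamic_graph(feats: np.ndarray, k_max: int, tau: float) -> list[set[int]]:
    """Hypothetical dynamic graph: the neighbor count varies per node.

    Each node keeps at most k_max candidates with similarity >= tau, and an
    edge survives only if both endpoints chose each other, so the result is
    symmetric.
    """
    sim = _cosine_sim(feats)
    order = np.argsort(-sim, axis=1)[:, :k_max]
    cand = [{int(j) for j in order[i] if sim[i, j] >= tau} for i in range(len(sim))]
    return [{j for j in cand[i] if i in cand[j]} for i in range(len(sim))]


def connected_components(adj: list[set[int]]) -> list[int]:
    """Label nodes by component; assumes a symmetric adjacency list."""
    labels = [-1] * len(adj)
    current = 0
    for start in range(len(adj)):
        if labels[start] != -1:
            continue
        labels[start] = current
        stack = [start]
        while stack:
            u = stack.pop()
            for v in adj[u]:
                if labels[v] == -1:
                    labels[v] = current
                    stack.append(v)
        current += 1
    return labels


# Two well-separated groups of toy 2-D features: nodes 0-2 and nodes 3-4.
feats = np.array([[1.0, 0.0], [0.95, 0.05], [0.9, 0.1], [0.0, 1.0], [0.05, 0.95]])
print(knn_graph(feats, k=2))
print(connected_components(dynamic_graph(feats, k_max=3, tau=0.5)))
```

On this toy input the fixed k=2 graph forces node 3 to link to node 2 in the other group. The thresholded graph instead gives nodes 3 and 4 a single neighbor each, and its components separate the two groups cleanly. This illustrates why a fixed neighbor count can introduce inexact connections.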
Findings
Outperforms state-of-the-art methods on the MTCG dataset.
Effectively fuses multi-modal features for improved clustering.
Demonstrates generalization to existing datasets.
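The sketch below shows one simple way to combine modalities at the clustering stage. It averages per-modality cosine similarities between two character tracks and skips any modality that is missing from either track, for example a face that is never visible. This late-fusion baseline is assumed purely for illustration and is not the unified representation network UniDG proposes. The modality names, the weights, and `fused_similarity` are hypothetical.

```python
from typing import Optional

import numpy as np

# Hypothetical track format: modality name -> embedding, or None if unavailable.
Track = dict[str, Optional[np.ndarray]]


def cosine(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity; assumes both embeddings are non-zero."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))


def fused_similarity(t1: Track, t2: Track, weights: dict[str, float]) -> float:
    """Weighted mean of cosine similarities over modalities present in both tracks.

    Returns 0.0 when the tracks share no available modality.
    """
    num, den = 0.0, 0.0
    for modality, w in weights.items():
        a, b = t1.get(modality), t2.get(modality)
        if a is None or b is None:
            continue  # missing modality: renormalize over the remaining ones
        num += w * cosine(a, b)
        den += w
    return num / den if den > 0 else 0.0
```

Renormalizing by the weights of the modalities that are actually present keeps scores comparable between track pairs that share different subsets of modalities.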
Abstract
Video temporal character grouping locates the moments at which the major characters of a video appear, according to their identities. To this end, recent works have evolved from unsupervised clustering to graph-based supervised clustering. However, graph methods are built on the premise of fixed affinity graphs, which introduces many inexact connections. In addition, they extract multi-modal features with several separate models, which is unfriendly to deployment. In this paper, we present a unified and dynamic graph (UniDG) framework for temporal character grouping. This is accomplished, first, by a unified representation network that learns representations of multiple modalities within the same space while still preserving each modality's uniqueness. Second, we present a dynamic graph clustering method in which a varying number of neighbors is dynamically constructed for each node via a cyclic…
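The task in the abstract's first sentence can be illustrated through its final output step. Suppose clustering has already assigned a character identity to each track. The helper below then merges each character's track intervals into appearance segments, joining spans separated by at most `max_gap` seconds. The `(start, end, character_id)` input format, `group_timeline`, and `max_gap` are hypothetical and not taken from the paper.

```python
from collections import defaultdict
from collections.abc import Hashable, Iterable


def group_timeline(
    tracks: Iterable[tuple[float, float, Hashable]], max_gap: float = 0.0
) -> dict[Hashable, list[tuple[float, float]]]:
    """Merge per-track intervals into per-character appearance segments.

    tracks: (start_sec, end_sec, character_id) triples, one per track.
    Spans of the same character that overlap or are separated by at most
    max_gap seconds are merged into one segment.
    """
    by_char: dict[Hashable, list[tuple[float, float]]] = defaultdict(list)
    for start, end, cid in tracks:
        by_char[cid].append((start, end))

    timeline = {}
    for cid, spans in by_char.items():
        spans.sort()  # process intervals in chronological order
        merged = [list(spans[0])]
        for start, end in spans[1:]:
            if start - merged[-1][1] <= max_gap:
                merged[-1][1] = max(merged[-1][1], end)  # extend current segment
            else:
                merged.append([start, end])  # gap too large: new segment
        timeline[cid] = [(s, e) for s, e in merged]
    return timeline
```

For example, three tracks of one character at 0-5 s, 4-8 s, and 10-12 s yield the two segments (0.0, 8.0) and (10.0, 12.0) when `max_gap=1.0`.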
Taxonomy
Topics: Video Analysis and Summarization · Human Pose and Action Recognition · Human Motion and Animation
