Relation-Aware Distribution Representation Network for Person Clustering with Multiple Modalities
Kaijian Liu, Shixiang Tang, Ziyue Li, Zhishuai Li, Lei Bai, Feng Zhu, Rui Zhao

TL;DR
This paper introduces RAD-Net, a relation-aware distribution representation network that effectively captures multi-modal clues for person clustering, outperforming existing methods by leveraging relation-based features across modalities.
Contribution
The paper proposes a novel distribution representation for multi-modal clues that is modality agnostic, using a graph-based construction and cyclic refinement, improving clustering performance.
Findings
Improved F-score by 6% on the VPCD dataset.
Improved F-score by 8.2% on the VoxCeleb2 dataset.
Outperformed traditional multi-view clustering methods.
Abstract
Person clustering with multi-modal clues, including faces, bodies, and voices, is critical for various tasks, such as movie parsing and identity-based movie editing. Related methods such as multi-view clustering mainly project multi-modal features into a joint feature space. However, multi-modal clue features are usually rather weakly correlated due to the semantic gap from the modality-specific uniqueness. As a result, these methods are not suitable for person clustering. In this paper, we propose a Relation-Aware Distribution representation Network (RAD-Net) to generate a distribution representation for multi-modal clues. The distribution representation of a clue is a vector consisting of the relation between this clue and all other clues from all modalities, thus being modality agnostic and good for person clustering. Accordingly, we introduce a graph-based method to construct…
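The idea of a distribution representation, where each clue is described by its relations to all other clues rather than by its raw modality-specific feature, can be illustrated with a minimal sketch. This is not the authors' implementation: the graph construction and cyclic refinement of RAD-Net are not reproduced here. The helper names (`build_relation_matrix`, `distribution_representation`), the use of cosine similarity within a modality, the hypothetical `cross_links` co-occurrence matrix between faces and voices, and the row-wise softmax normalization are all assumptions made for illustration.

```python
import numpy as np

def cosine_similarity_matrix(features):
    """Pairwise cosine similarity between the rows of `features`."""
    norms = np.linalg.norm(features, axis=1, keepdims=True)
    unit = features / np.clip(norms, 1e-12, None)
    return unit @ unit.T

def build_relation_matrix(face_feats, voice_feats, cross_links):
    """Assemble one relation matrix over all clues (hypothetical helper).

    Within-modality relations are cosine similarities. Cross-modality
    relations come from `cross_links`, an assumed face-by-voice matrix
    (e.g. 1.0 when a face and a voice co-occur in the same track).
    """
    n_face, n_voice = len(face_feats), len(voice_feats)
    relation = np.zeros((n_face + n_voice, n_face + n_voice))
    relation[:n_face, :n_face] = cosine_similarity_matrix(face_feats)
    relation[n_face:, n_face:] = cosine_similarity_matrix(voice_feats)
    relation[:n_face, n_face:] = cross_links
    relation[n_face:, :n_face] = cross_links.T
    return relation

def distribution_representation(relation):
    """Row-wise softmax: each clue becomes a distribution over all clues."""
    shifted = relation - relation.max(axis=1, keepdims=True)
    weights = np.exp(shifted)
    return weights / weights.sum(axis=1, keepdims=True)

# Toy data: 3 face clues (dim 4) and 2 voice clues (dim 3).
rng = np.random.default_rng(0)
face_feats = rng.normal(size=(3, 4))
voice_feats = rng.normal(size=(2, 3))
cross_links = np.array([[1.0, 0.0],
                        [0.0, 1.0],
                        [0.0, 0.0]])

relation = build_relation_matrix(face_feats, voice_feats, cross_links)
dist = distribution_representation(relation)
print(dist.shape)
```

Every row of `dist` has the same length regardless of whether the clue is a face or a voice, so clues from different modalities can be compared directly in this shared space, which is the sense in which such a representation is modality agnostic.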
Taxonomy
Topics: Video Surveillance and Tracking Methods · Video Analysis and Summarization · Human Pose and Action Recognition
