Relation-Aware Distribution Representation Network for Person Clustering with Multiple Modalities

Kaijian Liu; Shixiang Tang; Ziyue Li; Zhishuai Li; Lei Bai; Feng Zhu; Rui Zhao

arXiv:2308.00588·cs.CV·August 2, 2023

Open Access

TL;DR

This paper introduces RAD-Net, a relation-aware distribution representation network that effectively captures multi-modal clues for person clustering, outperforming existing methods by leveraging relation-based features across modalities.

Contribution

The paper proposes a modality-agnostic distribution representation for multi-modal clues, built with a graph-based construction and cyclic refinement, which improves clustering performance.

Findings

01. Achieved a +6% gain in F-score on the VPCD dataset.

02. Achieved a +8.2% gain in F-score on the VoxCeleb2 dataset.

03. Outperformed traditional multi-view clustering methods.

Abstract

Person clustering with multi-modal clues, including faces, bodies, and voices, is critical for various tasks, such as movie parsing and identity-based movie editing. Related methods such as multi-view clustering mainly project multi-modal features into a joint feature space. However, multi-modal clue features are usually rather weakly correlated due to the semantic gap from the modality-specific uniqueness. As a result, these methods are not suitable for person clustering. In this paper, we propose a Relation-Aware Distribution representation Network (RAD-Net) to generate a distribution representation for multi-modal clues. The distribution representation of a clue is a vector consisting of the relation between this clue and all other clues from all modalities, thus being modality agnostic and good for person clustering. Accordingly, we introduce a graph-based method to construct…
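The idea of a distribution representation, where each clue is described by its relations to all other clues rather than by its raw feature, can be illustrated with a small sketch. This is not the RAD-Net implementation: the function name `distribution_representation`, the use of cosine similarity as the within-modality relation, the `cross_links` list of co-occurring clue pairs, and the row normalization are all assumptions chosen for illustration.

```python
import numpy as np


def cosine_similarity(x):
    """Pairwise cosine similarity between the rows of x."""
    x = x / np.linalg.norm(x, axis=1, keepdims=True)
    return x @ x.T


def distribution_representation(features_by_modality, cross_links):
    """Describe every clue by its relations to all clues of all modalities.

    features_by_modality: list of (n_m, d_m) arrays, one per modality;
        feature dimensions may differ between modalities.
    cross_links: hypothetical list of (i, j) global clue indices that are
        assumed to belong together (e.g. a face and a voice from one track).
    Returns an (N, N) array whose row i is the representation of clue i.
    """
    sizes = [f.shape[0] for f in features_by_modality]
    offsets = np.concatenate([[0], np.cumsum(sizes)])
    n = int(offsets[-1])
    relation = np.zeros((n, n))

    # Within a modality, features are comparable, so use their similarity.
    for m, feats in enumerate(features_by_modality):
        start, end = offsets[m], offsets[m + 1]
        relation[start:end, start:end] = np.clip(cosine_similarity(feats), 0.0, None)

    # Across modalities, raw features are not comparable; this sketch relies
    # on the assumed co-occurrence links instead.
    for i, j in cross_links:
        relation[i, j] = relation[j, i] = 1.0

    # Normalize each row so it sums to 1 (self-similarity keeps sums positive).
    return relation / relation.sum(axis=1, keepdims=True)


faces = np.array([[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]])   # 2 face clues, 3-d
voices = np.array([[1.0, 0.0], [0.0, 1.0]])            # 2 voice clues, 2-d
rep = distribution_representation([faces, voices], cross_links=[(0, 2)])
print(rep.shape)
```

Every row has the same length regardless of which modality the clue came from, so face and voice clues can be compared in one space. That is the sense in which such a representation is modality agnostic.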

Taxonomy

Topics: Video Surveillance and Tracking Methods · Video Analysis and Summarization · Human Pose and Action Recognition