Self-supervised Video-centralised Transformer for Video Face Clustering
Yujiang Wang, Mingzhi Dong, Jie Shen, Yiming Luo, Yiming Lin, Pingchuan Ma, Stavros Petridis, Maja Pantic

TL;DR
This paper introduces a self-supervised, video-centralised transformer for face clustering in videos, effectively capturing temporal dynamics and outperforming previous methods on standard and egocentric datasets.
Contribution
The paper proposes a novel self-supervised transformer framework for video face clustering that improves clustering accuracy over existing approaches, and it introduces the first large-scale egocentric dataset for the task.
Findings
Outperforms state-of-the-art methods on the BBT dataset
Achieves superior results on the EasyCom-Clustering dataset
Effectively captures temporal face dynamics
Abstract
This paper presents a novel method for face clustering in videos using a video-centralised transformer. Previous works often employed contrastive learning to learn frame-level representations and used average pooling to aggregate the features along the temporal dimension. This approach may not fully capture the complicated video dynamics. In addition, despite the recent progress in video-based contrastive learning, few have attempted to learn a self-supervised clustering-friendly face representation that benefits the video face clustering task. To overcome these limitations, our method employs a transformer to directly learn video-level representations that can better reflect the temporally-varying property of faces in videos, while we also propose a video-centralised self-supervised framework to train the transformer model. We also investigate face clustering in egocentric videos, a…
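The contrast the abstract draws, between frame-level contrastive features averaged over time and a model that sees the whole track, can be sketched in a few lines. The Python/NumPy sketch below is illustrative only and is not the authors' implementation. It assumes per-frame face embeddings are already available as arrays, treats two frames from the same face track as a positive pair (a common assumption in video face clustering, not a detail stated on this page), and uses a generic InfoNCE loss. The toy single-head self-attention aggregator with sinusoidal positional encodings stands in for, and does not reproduce, the paper's video-level transformer. All function names are hypothetical.

```python
import numpy as np

# Illustrative sketch only: names and design are hypothetical, not the paper's code.

def l2_normalize(x, axis=-1, eps=1e-8):
    """Scale vectors to (near) unit length so dot products are cosine similarities."""
    return x / (np.linalg.norm(x, axis=axis, keepdims=True) + eps)

def average_pool(frames):
    """Baseline temporal aggregation: mean of per-frame embeddings.
    frames: (T, D) array -> (D,) unit-length track embedding."""
    return l2_normalize(frames.mean(axis=0))

def info_nce(anchors, positives, temperature=0.1):
    """Generic InfoNCE loss: row i of `anchors` should match row i of
    `positives`; all other rows in the batch serve as negatives."""
    a, p = l2_normalize(anchors), l2_normalize(positives)
    logits = a @ p.T / temperature                        # (N, N) scaled cosine similarities
    logits = logits - logits.max(axis=1, keepdims=True)   # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_probs))

def sinusoidal_positions(T, D):
    """Sinusoidal positional encodings of shape (T, D); assumes D is even."""
    pos = np.arange(T)[:, None]
    i = np.arange(D // 2)[None, :]
    angles = pos / (10000 ** (2 * i / D))
    pe = np.zeros((T, D))
    pe[:, 0::2] = np.sin(angles)
    pe[:, 1::2] = np.cos(angles)
    return pe

def attention_pool(frames, w_q, w_k, w_v):
    """Toy single-head self-attention over one track, then mean pooling.
    Adding positional encodings makes the result depend on frame order."""
    T, D = frames.shape
    x = frames + sinusoidal_positions(T, D)
    q, k, v = x @ w_q, x @ w_k, x @ w_v
    scores = q @ k.T / np.sqrt(D)
    scores = scores - scores.max(axis=1, keepdims=True)
    attn = np.exp(scores)
    attn = attn / attn.sum(axis=1, keepdims=True)         # row-wise softmax
    return l2_normalize((attn @ v).mean(axis=0))

# Usage with random, untrained features: 4 hypothetical tracks, 8 frames, 16 dims.
rng = np.random.default_rng(0)
tracks = rng.normal(size=(4, 8, 16))
loss = info_nce(tracks[:, 0], tracks[:, 1])   # frames 0 and 1 of each track as a positive pair
track_embeddings = np.stack([average_pool(t) for t in tracks])  # (4, 16)
w_q, w_k, w_v = (rng.normal(size=(16, 16)) / 4.0 for _ in range(3))
shuffled = tracks[0][::-1]                    # same frames, reversed temporal order
print(np.allclose(average_pool(tracks[0]), average_pool(shuffled)))   # mean pooling ignores order
print(np.allclose(attention_pool(tracks[0], w_q, w_k, w_v),
                  attention_pool(shuffled, w_q, w_k, w_v)))          # positions make order matter
```

Because `average_pool` returns the same vector for any ordering of a track's frames, it cannot represent how a face changes over time, whereas an attention model with positional information generally produces a different vector when the frames are reordered. The weights here are random and untrained, so the example says nothing about clustering accuracy.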
Taxonomy
Topics: Face recognition and analysis · Face and Expression Recognition
Methods: Contrastive Learning · Average Pooling
