Self-supervised Video-centralised Transformer for Video Face Clustering

Yujiang Wang; Mingzhi Dong; Jie Shen; Yiming Luo; Yiming Lin; Pingchuan Ma; Stavros Petridis; Maja Pantic

arXiv:2203.13166 · cs.CV · February 16, 2023 · 1 citation

Open Access

TL;DR

This paper introduces a self-supervised, video-centralised transformer for face clustering in videos, effectively capturing temporal dynamics and outperforming previous methods on standard and egocentric datasets.

Contribution

The paper proposes a novel self-supervised transformer framework for video face clustering and introduces the first large-scale egocentric dataset for the task. The framework improves clustering accuracy over existing approaches.

Findings

01 Outperforms state-of-the-art on BBT dataset

02 Achieves superior results on EasyCom-Clustering dataset

03 Effectively captures temporal face dynamics

Abstract

This paper presents a novel method for face clustering in videos using a video-centralised transformer. Previous works often employed contrastive learning to learn frame-level representation and used average pooling to aggregate the features along the temporal dimension. This approach may not fully capture the complicated video dynamics. In addition, despite the recent progress in video-based contrastive learning, few have attempted to learn a self-supervised clustering-friendly face representation that benefits the video face clustering task. To overcome these limitations, our method employs a transformer to directly learn video-level representations that can better reflect the temporally-varying property of faces in videos, while we also propose a video-centralised self-supervised framework to train the transformer model. We also investigate face clustering in egocentric videos, a…
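The contrast the abstract draws between average pooling and a learned video-level representation can be illustrated with a toy sketch. This is not the authors' architecture: the `attention_pool` function, the fixed `query` vector, and the feature values are assumptions made purely for illustration, and a real transformer would learn its projections and stack several layers. The sketch only shows that average pooling gives every frame the same weight, while attention-style aggregation gives content-dependent weights.

```python
import math

def average_pool(frames):
    """Uniformly average frame-level feature vectors over time."""
    num_frames = len(frames)
    dim = len(frames[0])
    return [sum(f[d] for f in frames) / num_frames for d in range(dim)]

def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attention_pool(frames, query):
    """Weight frames by scaled dot-product similarity to a query vector."""
    dim = len(query)
    scores = [
        sum(q * x for q, x in zip(query, f)) / math.sqrt(dim) for f in frames
    ]
    weights = softmax(scores)
    pooled = [sum(w * f[d] for w, f in zip(weights, frames)) for d in range(dim)]
    return pooled, weights

# Toy face track: three 2-D frame features (made-up values).
track = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
# Hypothetical query; in a trained model this would be learned, not fixed.
query = [2.0, 0.0]

avg = average_pool(track)
pooled, weights = attention_pool(track, query)
```

Here `avg` treats all three frames equally, whereas `weights` favours the frames that align with the query, so `pooled` differs from `avg`. Note that this single attention step, without positional encodings, is still insensitive to frame order; capturing temporal variation requires the fuller transformer design described in the paper.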

Taxonomy

Topics: Face recognition and analysis · Face and Expression Recognition

Methods: Contrastive Learning · Average Pooling