Clustering Millions of Faces by Identity
Charles Otto, Dayong Wang, Anil K. Jain

TL;DR
This paper introduces a scalable Rank-Order clustering algorithm that groups up to 123 million face images into over 10 million identity clusters and reports better clustering accuracy than k-means and spectral clustering, with applications ranging from social media to law enforcement.
Contribution
The paper develops an efficient Rank-Order clustering algorithm designed for large-scale face datasets, reporting better clustering accuracy than well-known algorithms such as k-means and spectral clustering.
Findings
Clustered up to 123 million faces into over 10 million identities.
Achieved an F-measure of 0.87 on the LFW dataset.
Successfully clustered video frames with an F-measure of 0.71.
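The F-measure figures above can be made concrete with a small sketch. This summary does not say which F-measure variant is reported, so the code below assumes the pairwise definition that is common in clustering evaluation: precision is the fraction of same-cluster pairs that share an identity, recall is the fraction of same-identity pairs that land in one cluster, and the F-measure is their harmonic mean. The function name `pairwise_f_measure` and the toy labels are illustrative and are not taken from the paper.

```python
from collections import Counter
from math import comb


def pairwise_f_measure(true_labels, pred_labels):
    """Pairwise F-measure of a clustering (assumed definition, see lead-in)."""
    if len(true_labels) != len(pred_labels):
        raise ValueError("label sequences must have the same length")

    # Number of items per (identity, cluster) cell, per identity, per cluster.
    joint = Counter(zip(true_labels, pred_labels))
    identity_sizes = Counter(true_labels)
    cluster_sizes = Counter(pred_labels)

    # Pairs that share both identity and cluster (true positives).
    true_positive_pairs = sum(comb(n, 2) for n in joint.values())
    # All pairs placed in the same cluster, and all pairs of the same identity.
    same_cluster_pairs = sum(comb(n, 2) for n in cluster_sizes.values())
    same_identity_pairs = sum(comb(n, 2) for n in identity_sizes.values())

    # Convention chosen here: with no pairs to judge, treat the score as perfect.
    precision = true_positive_pairs / same_cluster_pairs if same_cluster_pairs else 1.0
    recall = true_positive_pairs / same_identity_pairs if same_identity_pairs else 1.0

    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Toy example: five faces, two identities, one face assigned to the wrong cluster.
identities = ["a", "a", "a", "b", "b"]
clusters = [0, 0, 1, 1, 1]
score = pairwise_f_measure(identities, clusters)
```

Counting pairs through cluster sizes rather than enumerating every pair keeps the cost linear in the number of faces, which matters at the dataset sizes discussed here.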
Abstract
In this work, we attempt to address the following problem: Given a large number of unlabeled face images, cluster them into the individual identities present in this data. We consider this a relevant problem in different application scenarios ranging from social media to law enforcement. In large-scale scenarios the number of faces in the collection can be of the order of hundreds of millions, while the number of clusters can range from a few thousand to millions, leading to difficulties in terms of both run-time complexity and evaluating clustering and per-cluster quality. An efficient and effective Rank-Order clustering algorithm is developed to achieve the desired scalability, and better clustering accuracy than other well-known algorithms such as k-means and spectral clustering. We cluster up to 123 million face images into over 10 million clusters, and analyze the results in terms…
Taxonomy
Topics: Face recognition and analysis · Face and Expression Recognition · Biometric Identification and Security
