Text-Guided Face Recognition using Multi-Granularity Cross-Modal Contrastive Learning

Md Mahedi Hasan, Shoaib Meraj Sami, and Nasser Nasrabadi

arXiv:2312.09367·cs.CV·December 18, 2023·1 cites

TL;DR

This paper introduces a novel text-guided face recognition framework that leverages natural language facial descriptions and multi-granularity cross-modal contrastive learning to improve recognition accuracy, especially in low-quality surveillance images.

Contribution

It proposes a face-caption alignment module with contrastive losses and a face-caption fusion module for enhanced multimodal feature learning, addressing semantic gaps and textual ambiguities.
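The summary above does not give the exact form of the contrastive losses, so the following is only a minimal sketch of one common choice: a symmetric, CLIP-style InfoNCE loss between global face embeddings and caption embeddings. It covers a single granularity (whole image versus whole caption), not the paper's full multi-granularity scheme. The function names, the `temperature` value of 0.07, and the use of NumPy are assumptions made for illustration, not details from the paper.

```python
import numpy as np


def l2_normalize(x, eps=1e-12):
    # Scale each row to unit length so dot products become cosine similarities.
    return x / (np.linalg.norm(x, axis=1, keepdims=True) + eps)


def log_softmax(logits):
    # Row-wise log-softmax, shifted by the row max for numerical stability.
    shifted = logits - logits.max(axis=1, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))


def symmetric_contrastive_loss(face_emb, caption_emb, temperature=0.07):
    """Hypothetical global face-caption contrastive loss (assumed form).

    face_emb, caption_emb: arrays of shape (batch, dim), where row i of each
    array comes from the same identity (the positive pair). All other rows
    in the batch act as negatives.
    """
    f = l2_normalize(face_emb)
    c = l2_normalize(caption_emb)
    logits = f @ c.T / temperature          # (batch, batch) similarity matrix
    idx = np.arange(len(f))
    # Face-to-caption direction: each face should pick out its own caption.
    loss_f2c = -log_softmax(logits)[idx, idx].mean()
    # Caption-to-face direction: each caption should pick out its own face.
    loss_c2f = -log_softmax(logits.T)[idx, idx].mean()
    return 0.5 * (loss_f2c + loss_c2f)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    faces = rng.normal(size=(8, 16))
    # Synthetic "captions": noisy copies of the face embeddings (toy data only).
    captions = faces + 0.05 * rng.normal(size=(8, 16))
    print("aligned loss:   ", symmetric_contrastive_loss(faces, captions))
    print("mismatched loss:", symmetric_contrastive_loss(faces, np.roll(captions, 1, axis=0)))
```

In this toy setup the loss is low when row i of both arrays describes the same identity and rises when the captions are shifted so that pairs no longer match, which is the behavior an alignment objective of this kind relies on.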

Findings

01. Significant performance improvements on low-quality images.

02. Outperforms existing face recognition models and benchmarks.

03. Effective integration of facial attributes via natural language enhances recognition.

Abstract

State-of-the-art face recognition (FR) models often experience a significant performance drop when dealing with facial images in surveillance scenarios where images are in low quality and often corrupted with noise. Leveraging facial characteristics, such as freckles, scars, gender, and ethnicity, becomes highly beneficial in improving FR performance in such scenarios. In this paper, we introduce text-guided face recognition (TGFR) to analyze the impact of integrating facial attributes in the form of natural language descriptions. We hypothesize that adding semantic information into the loop can significantly improve the image understanding capability of an FR algorithm compared to other soft biometrics. However, learning a discriminative joint embedding within the multimodal space poses a considerable challenge due to the semantic gap in the unaligned image-text representations, along…

Taxonomy

Topics: Face recognition and analysis · Domain Adaptation and Few-Shot Learning · Advanced Image and Video Retrieval Techniques