CLIP-HandID: Vision-Language Model for Hand-Based Person Identification

Nathanael L. Baisa; Babu Pallam; Amudhavel Jayavel

arXiv:2506.12447·cs.CV·July 31, 2025

TL;DR

This paper presents CLIP-HandID, a vision-language approach that uses CLIP and learned pseudo-tokens to improve hand-based person identification. It targets criminal investigations in which hand images are often the only identifiable evidence.

Contribution

It introduces a method that uses CLIP, guided by textual prompts containing learned pseudo-tokens, to extract discriminative features from hand images, improving identification accuracy over existing methods.

Findings

01. Significantly outperforms existing approaches on large hand datasets.

02. Effectively leverages multi-modal reasoning for better generalization.

03. Demonstrates robustness across multi-ethnic hand images.

Abstract

This paper introduces a novel approach to person identification using hand images, designed specifically for criminal investigations. The method is particularly valuable in serious crimes such as sexual abuse, where hand images are often the only identifiable evidence available. Our proposed method, CLIP-HandID, leverages a pre-trained foundational vision-language model, CLIP, to efficiently learn discriminative deep feature representations from hand images (the input to CLIP's image encoder), using textual prompts as semantic guidance. Since hand images are labeled with indexes rather than text descriptions, we employ a textual inversion network to learn pseudo-tokens that encode specific visual contexts or appearance attributes. These learned pseudo-tokens are then incorporated into textual prompts, which are fed into CLIP's text encoder to leverage its multi-modal reasoning and enhance…
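
The pseudo-token idea described in the abstract can be illustrated with a toy sketch. It is not the authors' implementation: the encoders are replaced by random stand-ins, and the names `make_inversion_network`, `build_prompt_embeddings`, `toy_text_encoder`, the placeholder `S*`, and the prompt wording "a photo of a S* hand" are all assumptions made for illustration. The sketch shows only the data flow: an image feature is mapped to a pseudo-token, the pseudo-token is spliced into a prompt's token embeddings, and the resulting text feature is compared with the image feature.

```python
import math
import random

DIM = 8  # toy embedding width; real CLIP models use hundreds of dimensions

def l2_normalize(v):
    norm = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / norm for x in v]

def cosine(a, b):
    a, b = l2_normalize(a), l2_normalize(b)
    return sum(x * y for x, y in zip(a, b))

def make_inversion_network(rng):
    """Hypothetical textual inversion network: a single linear layer that
    maps an image feature to a pseudo-token in the word-embedding space."""
    weights = [[rng.uniform(-1, 1) for _ in range(DIM)] for _ in range(DIM)]
    def invert(image_feature):
        return [sum(w * x for w, x in zip(row, image_feature)) for row in weights]
    return invert

def build_prompt_embeddings(template, vocab, pseudo_token):
    """Look up each word's embedding, splicing the pseudo-token in at 'S*'."""
    return [pseudo_token if word == "S*" else vocab[word]
            for word in template.split()]

def toy_text_encoder(token_embeddings):
    """Stand-in for CLIP's text encoder: mean-pool the tokens, then normalize."""
    pooled = [sum(col) / len(token_embeddings) for col in zip(*token_embeddings)]
    return l2_normalize(pooled)

rng = random.Random(0)
template = "a photo of a S* hand"  # assumed prompt wording, not from the paper
vocab = {w: [rng.uniform(-1, 1) for _ in range(DIM)]
         for w in template.split() if w != "S*"}
invert = make_inversion_network(rng)

# Random vector standing in for the output of CLIP's image encoder.
image_feature = l2_normalize([rng.uniform(-1, 1) for _ in range(DIM)])
pseudo_token = invert(image_feature)
prompt_embeddings = build_prompt_embeddings(template, vocab, pseudo_token)
text_feature = toy_text_encoder(prompt_embeddings)
similarity = cosine(image_feature, text_feature)
print(f"image-text cosine similarity: {similarity:.3f}")
```

In a trained system, a score like `similarity` would typically drive an image-text alignment objective so that the text feature guides the image encoder; the specific losses used by CLIP-HandID are not stated in the excerpt above.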
