When Gender is Hard to See: Multi-Attribute Support for Long-Range Recognition
Nzakiese Mbongo, Kailash A. Hambarde, Hugo Proença

TL;DR
This paper introduces a dual-path transformer framework leveraging CLIP for robust gender recognition in long-range imagery, combining visual cues and attribute prompts to improve accuracy under challenging conditions.
Contribution
It presents a novel multi-attribute, CLIP-based dual-path model and a new large-scale dataset for long-range gender recognition; the model outperforms existing methods.
Findings
Surpasses the state of the art in long-range gender recognition
Robust to distance, angle, and occlusion variations
Provides interpretable attribute localization
Abstract
Accurate gender recognition from extreme long-range imagery remains a challenging problem due to limited spatial resolution, viewpoint variability, and loss of facial cues. To address this, we present a dual-path transformer framework that leverages CLIP to jointly model visual and attribute-driven cues for gender recognition at a distance. The framework integrates two complementary streams: (1) a direct visual path that refines a pre-trained CLIP image encoder through selective fine-tuning of its upper layers, and (2) an attribute-mediated path that infers gender from a set of soft-biometric prompts (e.g., hairstyle, clothing, accessories) aligned in the CLIP text-image space. Spatial channel attention modules further enhance discriminative localization under occlusion and low resolution. To support large-scale evaluation, we construct U-DetAGReID, a unified long-range gender dataset…
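The abstract describes two streams whose outputs are combined but does not state the fusion rule. The following plain-Python sketch illustrates one plausible reading under loudly stated assumptions: the image embedding is taken as already produced by the (partially fine-tuned) CLIP image encoder; each soft-biometric attribute is represented by a hypothetical pair of text-prompt embeddings (e.g. "long hair" vs "short hair"); and the two paths are merged by a weighted sum of logits with a hypothetical weight `alpha`. The names `attribute_probs` and `dual_path_prob`, the linear heads, and the default `logit_scale` are illustrative choices, not the authors' implementation, and the spatial channel attention modules are not modeled.

```python
import math
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def normalize(v: Vector) -> List[float]:
    """L2-normalize a vector, as CLIP does before comparing image and text embeddings."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def attribute_probs(image_emb: Vector,
                    prompt_pairs: List[Tuple[Vector, Vector]],
                    logit_scale: float = 100.0) -> List[float]:
    """Hypothetical attribute-mediated path: for each attribute, compare the image with
    a 'present' and an 'absent' prompt embedding. A 2-way softmax over the two scaled
    cosine similarities equals a sigmoid of their difference."""
    img = normalize(image_emb)
    probs = []
    for present, absent in prompt_pairs:
        s_present = logit_scale * dot(img, normalize(present))
        s_absent = logit_scale * dot(img, normalize(absent))
        probs.append(sigmoid(s_present - s_absent))
    return probs


def dual_path_prob(image_emb: Vector,
                   visual_w: Vector, visual_b: float,
                   prompt_pairs: List[Tuple[Vector, Vector]],
                   attr_w: Vector, attr_b: float,
                   alpha: float = 0.5) -> float:
    """Return P(positive class) via an assumed late fusion of the two paths' logits."""
    # Direct visual path (assumed): linear head on the normalized image embedding.
    visual_logit = dot(normalize(image_emb), visual_w) + visual_b
    # Attribute-mediated path (assumed): linear head on attribute presence probabilities.
    attr_logit = dot(attribute_probs(image_emb, prompt_pairs), attr_w) + attr_b
    fused = alpha * visual_logit + (1.0 - alpha) * attr_logit
    return sigmoid(fused)


if __name__ == "__main__":
    # Toy 3-d embeddings; real CLIP embeddings are much higher-dimensional.
    image = [0.9, 0.1, 0.0]
    pairs = [([1.0, 0.0, 0.0], [0.0, 1.0, 0.0]),   # hypothetical attribute A
             ([0.0, 0.0, 1.0], [0.0, 1.0, 0.0])]   # hypothetical attribute B
    print(attribute_probs(image, pairs))
    print(dual_path_prob(image, [1.0, -1.0, 0.0], 0.0, pairs, [2.0, -2.0], 0.0))
```

Setting `alpha = 1` reduces the sketch to the visual path alone and `alpha = 0` to the attribute path alone. The weights here are fixed toy values; in practice such heads would presumably be learned. The per-attribute probabilities also give a readable intermediate signal, which is one way an attribute path can aid interpretability.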
Taxonomy
Topics: Face recognition and analysis · Domain Adaptation and Few-Shot Learning · Biometric Identification and Security
