# From One Domain to Another: The Pitfalls of Gender Recognition in Unseen Environments

**Authors:** Nzakiese Mbongo, Kailash A. Hambarde, Hugo Proença

PMC · DOI: 10.3390/s25134161 · Sensors (Basel, Switzerland) · 2025-07-04

## TL;DR

This paper shows that gender recognition models perform well when training and test data come from the same benchmark but degrade markedly when evaluated on a different one, highlighting the need for better cross-domain robustness.

## Contribution

The paper presents the first comprehensive cross-domain evaluation of gender recognition models (six state-of-the-art architectures across PA-100K, PETA, and RAP) and proposes a new robustness metric, the Unified Robustness Metric (URM).

## Key findings

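- Median in-domain F1 approached 90% for most models, but the average drop under domain shift reached up to 16.4 percentage points, with the most recent approaches degrading the most.
- The adaptive-masking ALM achieved an F1 above 80% in most transfer scenarios, particularly those involving high-resolution or pose-stable domains, which the authors attribute to strong inductive biases rather than architectural novelty.
- The Unified Robustness Metric (URM) was introduced to condense average cross-domain degradation into a single score.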
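The headline 16.4-point figure is an average drop from in-domain to cross-domain F1. The exact aggregation the authors use (and the URM formula itself) is not reproduced in this summary, so the sketch below assumes one plausible reading: for each training set, subtract each cross-domain F1 from that set's in-domain F1, then average over all off-diagonal train/test pairs. The function name and the F1 dictionary layout are hypothetical, and this is not the URM.

```python
from itertools import product
from statistics import mean

DATASETS = ("PA-100K", "PETA", "RAP")


def average_cross_domain_drop(f1: dict[tuple[str, str], float]) -> float:
    """Mean drop, in percentage points, from in-domain to cross-domain F1.

    Hypothetical helper: `f1` maps (train_dataset, test_dataset) to F1 in percent.
    Diagonal entries (train == test) are in-domain scores; every off-diagonal
    entry is compared against the in-domain score of its training set.
    """
    drops = [
        f1[(train, train)] - f1[(train, test)]
        for train, test in product(DATASETS, repeat=2)
        if train != test  # skip in-domain pairs; they are the reference
    ]
    return mean(drops)
```

For example, a hypothetical model scoring 90% F1 on every in-domain pair and 80% on every cross-domain pair would get an average drop of 10 percentage points; these values are illustrative only, not results from the paper.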

## Abstract

Gender recognition from pedestrian imagery is acknowledged by many as a quasi-solved problem, yet most existing approaches evaluate performance in a within-domain setting, i.e., when the test and training data, though disjoint, closely resemble each other. This work provides the first exhaustive cross-domain assessment of six architectures considered to represent the state of the art: ALM, VAC, Rethinking, LML, YinYang-Net, and MAMBA, across three widely known benchmarks: PA-100K, PETA, and RAP. All train/test combinations between datasets were evaluated, yielding 54 comparable experiments. The results revealed a performance split: median in-domain F1 approached 90% in most models, while the average drop under domain shift was up to 16.4 percentage points, with the most recent approaches degrading the most. The adaptive-masking ALM achieved an F1 above 80% in most transfer scenarios, particularly those involving high-resolution or pose-stable domains, highlighting the importance of strong inductive biases over architectural novelty alone. Further, to characterize robustness quantitatively, we introduced the Unified Robustness Metric (URM), which integrates the average cross-domain degradation performance into a single score. A qualitative saliency analysis also corroborated the numerical findings by exposing over-confidence and contextual bias in misclassifications. Overall, this study suggests that challenges in gender recognition are much more evident in cross-domain settings than under the commonly reported within-domain context. Finally, we formalize an open evaluation protocol that can serve as a baseline for future works of this kind.
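The 54-experiment count follows from crossing every architecture with every train/test dataset pair: 6 × 3 × 3 = 54, of which 18 are in-domain (trained and tested on the same benchmark) and 36 are cross-domain. A minimal sketch of that grid, assuming a simple tuple representation (the `evaluation_grid` helper is hypothetical, not the authors' released protocol):

```python
from itertools import product

MODELS = ("ALM", "VAC", "Rethinking", "LML", "YinYang-Net", "MAMBA")
DATASETS = ("PA-100K", "PETA", "RAP")


def evaluation_grid():
    """Yield (model, train_set, test_set, is_in_domain) for every combination."""
    for model, train, test in product(MODELS, DATASETS, DATASETS):
        yield model, train, test, train == test


grid = list(evaluation_grid())
in_domain = [g for g in grid if g[3]]
cross_domain = [g for g in grid if not g[3]]
print(len(grid), len(in_domain), len(cross_domain))  # 54 18 36
```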

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/PMC12252448/full.md

## Figures

6 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12252448/full.md

## References

26 references — full list in the complete paper: https://tomesphere.com/paper/PMC12252448/full.md

---
Source: https://tomesphere.com/paper/PMC12252448