# When algorithms infer gender: revisiting computational phenotyping with electronic health records data

**Authors:** Jessica Gronsbell, Hilary Thurston, Lillian Dong, Vanessa Ferguson, Diksha Sen Chaudhury, Braden O’Neill, Katrina S. Sha, Rebecca Bonneville

PMC · DOI: 10.1186/s13293-025-00783-8 · Biology of Sex Differences · 2025-12-31

## TL;DR

This paper reviews how algorithms infer gender from health records, highlighting ethical and methodological issues in representing trans and gender-expansive individuals.

## Contribution

The paper critically examines computational phenotyping of gender and proposes priorities for ethical and inclusive biomedical research.

## Key findings

- Computational phenotyping introduces data quality and bias issues in representing gender.
- Current methods risk reinforcing assumptions about gender, and their outputs may be misused.
- Future work should prioritize inclusive and fluid approaches to gender measurement in health data.

## Abstract

Computational phenotyping has emerged as a practical solution to the incomplete collection of data on gender in electronic health records (EHRs). This approach relies on algorithms to infer a patient’s gender using the available data in their health record, such as diagnosis codes, medication histories, and information in clinical notes. Although intended to improve the visibility of trans and gender-expansive populations in EHR-based biomedical research, computational phenotyping raises significant methodological and ethical concerns related to the potential misuse of algorithm outputs. In this paper, we provide a narrative review of computational phenotyping of gender and examine its challenges through a critical lens. We also highlight existing recommendations for biomedical researchers and propose priorities for future work in this domain.

The online version contains supplementary material available at 10.1186/s13293-025-00783-8.

Sex and gender are inconsistently recorded in electronic health records (EHRs), limiting the scope of biomedical research using these data.

Computational phenotyping algorithms attempt to fill these gaps by inferring gender-related information from patients’ historical health data.

While these approaches aim to improve the visibility of trans and gender-expansive people in biomedical research, they also introduce important methodological and ethical concerns, including (1) data quality issues, (2) underlying assumptions about gender, (3) bias in algorithm design and validation, and (4) potential for misuse.

Future research should focus on building just and conceptually sound foundations for gender-based inquiry, such as creating and using measurement tools that accommodate fluidity, center lived experience rather than biological proxies, and allow for individualized data collection without defaulting to gender assignment.

## Full-text entities

- **Species:** Homo sapiens (human, species) [taxon 9606]

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/PMC12865949/full.md

## Figures

2 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12865949/full.md

## References

11 references — full list in the complete paper: https://tomesphere.com/paper/PMC12865949/full.md

---
Source: https://tomesphere.com/paper/PMC12865949