When Algorithms Infer Gender: Revisiting Computational Phenotyping with Electronic Health Records Data

Jessica Gronsbell; Hilary Thurston; Lillian Dong; Vanessa Ferguson; Diksha Sen Chaudhury; Braden O'Neill; Katrina S. Sha; Rebecca Bonneville

arXiv:2508.14150 · cs.CY · August 27, 2025

TL;DR

This paper critically reviews the use of algorithms to infer gender from electronic health records (EHRs). It examines the methodological and ethical challenges of this practice and proposes priorities for future research.

Contribution

The paper reviews current methods for inferring gender from EHR data, highlights their methodological and ethical challenges, and suggests directions for future research.

Findings

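1. Current practices typically infer gender from diagnosis codes, medication histories, and information in clinical notes.
2. Computational phenotyping of gender raises significant methodological and ethical concerns, including the potential misuse of algorithm outputs.
3. The authors highlight existing recommendations for biomedical researchers and propose priorities for future work.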

Abstract

Computational phenotyping has emerged as a practical solution to the incomplete collection of data on gender in electronic health records (EHRs). This approach relies on algorithms to infer a patient's gender using the available data in their health record, such as diagnosis codes, medication histories, and information in clinical notes. Although intended to improve the visibility of trans and gender-expansive populations in EHR-based biomedical research, computational phenotyping raises significant methodological and ethical concerns related to the potential misuse of algorithm outputs. In this paper, we review current practices for computational phenotyping of gender and examine its challenges through a critical lens. We also highlight existing recommendations for biomedical researchers and propose priorities for future work in this domain.
