# Checking assumptions: advancing the analysis of sex and gender in health sciences

**Authors:** Katherine Tombeau Cost, Eva Unternaehrer, Jens C. Pruessner, Alex Abramovich, Kristin Cleverley, Peter Szatmari, Meng-Chuan Lai

PMC · DOI: 10.1186/s13293-025-00803-7 · Biology of Sex Differences · 2026-01-02

## TL;DR

Simplifying sex and gender into binary categories in health research leads to significant statistical errors and biases, and the paper proposes better measurement strategies to improve accuracy and inclusivity.

## Contribution

The paper synthesizes an approach to collecting and analysing sex and gender data that uses continuous or multi-category variables in place of a single binary tickbox, with the aim of reducing the statistical problems that dichotomisation introduces.

## Key findings

- In simulations, using a binary sex/gender predictor in place of a continuous one increased residual confounding up to 80% and misclassification of individual participants up to 50%.
- Dichotomizing continuous sex/gender variables introduces bias in model parameters and reduces statistical power by over 50% in some cases.
- Using continuous or multi-category measures improves model validity and captures hidden patterns in health outcomes.

## Abstract

Sex and gender are dissociable constructs, each including multiple components. Based on the analytic problems associated with dichotomising continuous variables, we aimed to synthesize a new approach to collecting and analysing sex and gender data in health research, in contrast to the conventional use of dichotomous tickboxes to code sex/gender.

Using a literature review and data simulations, we examined the magnitude of the statistical and methodological problems associated with the use of a single dichotomised sex/gender variable, including construct validity, predictive validity, measurement error, residual confounding, misclassification and bias due to cut points, power, and representative sampling.

Using the dichotomous sex/gender predictor rather than a continuous sex/gender predictor increased residual confounding up to 80% and misclassification of individual participants up to 50%. Further, there was substantial bias in model parameters when continuous sex/gender variables were dichotomised. Finally, we demonstrate that using the dichotomous sex/gender predictor decreased statistical power, in some cases by more than 50%.
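The effect of dichotomising a continuous predictor can be illustrated with a small simulation. The sketch below is not the paper's simulation code: the standard-normal `trait`, the `slope`, the sample size, and the median cut point are all assumptions chosen for illustration. It compares how strongly a continuous predictor and its median-split version correlate with the same outcome. For a normally distributed predictor split at the median, the correlation is expected to shrink by a factor of about sqrt(2/pi), roughly 0.80.

```python
import math
import random


def pearson(xs, ys):
    """Pearson correlation between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def simulate(n=20000, slope=0.5, seed=1):
    """Return (r_continuous, r_binary) for one simulated dataset.

    All parameters are illustrative assumptions, not values from the paper.
    """
    rng = random.Random(seed)
    # Hypothetical continuous trait (e.g. a standardised hormone level).
    trait = [rng.gauss(0, 1) for _ in range(n)]
    # Outcome depends linearly on the continuous trait plus noise.
    outcome = [slope * t + rng.gauss(0, 1) for t in trait]
    # Dichotomise the trait at its sample median.
    cut = sorted(trait)[n // 2]
    binary = [1.0 if t > cut else 0.0 for t in trait]
    return pearson(trait, outcome), pearson(binary, outcome)


r_cont, r_bin = simulate()
ratio = r_bin / r_cont
print(f"continuous r = {r_cont:.3f}, dichotomised r = {r_bin:.3f}")
print(f"attenuation ratio = {ratio:.3f}, theory = {math.sqrt(2 / math.pi):.3f}")
```

A smaller correlation means a larger sample is needed to detect the same effect, which is one route by which dichotomisation reduces statistical power. Cut points other than the median, or non-normal traits, would give different attenuation.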

Using a dichotomous sex/gender predictor in place of continuous sex/gender predictors, when applicable, has profound impacts on the modelling and the validity of statistical inferences. Accordingly, we proposed measurement and analytic strategies for new multi-variable data collection and analyses of existing binarized data in relation to sex and gender, to reduce these statistical problems and improve model quality.

The online version contains supplementary material available at 10.1186/s13293-025-00803-7.

In health research, sex and gender are often simplified into binary categories like “male” or “female,” but this overlooks their complexity. Sex refers to a collection of biological traits (like hormones and chromosomes), while gender involves identity, roles, power dynamics, and social context. Treating these constructs as interchangeable, or reducing them to simple either/or options, can misrepresent important differences and weaken research findings.

This paper shows how using binary sex/gender categories instead of more precise, continuous measures leads to serious problems. Through simulations, we demonstrate that binary measures of sex or gender cause misclassification, reduce statistical power, bias results, and hide meaningful effects, especially when biological or social traits vary widely within groups. These issues can result in incorrect conclusions about health outcomes.

We recommend that researchers move beyond binary tickboxes. Instead, we suggest measuring specific components of sex and gender, like hormone levels or gender expression, using validated tools. As is best practice, we also advocate for asking at least two separate questions: one about sex assigned at birth and one about current gender identity, recognising that gender identity may change across the lifespan. Improved measurement, data collection, and analysis methods can uncover hidden patterns and provide clearer, more actionable, and more inclusive insights. By capturing the real complexity of sex and gender in data collection, researchers can improve the validity, usefulness, and fairness of biomedical and health research.
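The two-question recommendation, combined with component-level measures, could be represented in a data record along the following lines. This is a minimal sketch under stated assumptions: the class name, field names, units, and example values are all hypothetical and do not come from the paper or from any validated instrument.

```python
from dataclasses import dataclass
from datetime import date
from typing import Dict, Optional


@dataclass
class SexGenderRecord:
    """Hypothetical per-participant record; every field name is illustrative."""

    participant_id: str
    # Question 1: sex assigned at birth.
    sex_assigned_at_birth: str
    # Question 2: current gender identity, stored with the date it was
    # recorded because identity may change across the lifespan.
    current_gender_identity: str
    identity_recorded_on: date
    # Optional continuous components (units and scales are assumptions).
    testosterone_nmol_per_l: Optional[float] = None
    gender_expression_score: Optional[float] = None

    def continuous_measures(self) -> Dict[str, float]:
        """Return only the continuous components that were collected."""
        candidates = {
            "testosterone_nmol_per_l": self.testosterone_nmol_per_l,
            "gender_expression_score": self.gender_expression_score,
        }
        return {name: value for name, value in candidates.items() if value is not None}


# Made-up example values, for illustration only.
example = SexGenderRecord(
    participant_id="P001",
    sex_assigned_at_birth="female",
    current_gender_identity="non-binary",
    identity_recorded_on=date(2025, 1, 15),
    gender_expression_score=0.4,
)
print(example.continuous_measures())
```

Keeping the two questions as separate fields, rather than one combined sex/gender code, preserves the distinction between the constructs. Optional continuous fields allow analyses to use component measures when they are available.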

Sex and gender are distinct and each has multiple components. Sex involves biological traits; gender involves identity, roles, power dynamics, and social context. Labelling participants as just “male” or “female” is oversimplified and can misrepresent complex realities.

Dichotomising leads to poor science. Using binary sex/gender variables reduces precision, power, and validity in statistical models: up to 80% of variation is lost, and misclassification affects up to 50% of cases. Bias and reduced power are especially problematic in models that include interaction terms or involve small effect sizes.

Better measurement is possible. Using validated tools and continuous or multi-category measures of sex/gender (e.g., hormone levels, gender roles) will support inclusive, precise data to improve discovery and lead to more valid, equitable health research.

## Full-text entities

- **Genes:** FOXL2 (forkhead box L2) [NCBI Gene 668] {aka BPES, BPES1, PFRK, PINTO, POF3}, OXT (oxytocin/neurophysin I prepropeptide) [NCBI Gene 5020] {aka OT, OT-NPI, OXT-NPI}, SOX9 (SRY-box transcription factor 9) [NCBI Gene 6662] {aka CMD1, CMPD1, ENH13, SRA1, SRXX2, SRXY10}, SRY (sex determining region Y) [NCBI Gene 6736] {aka SRXX1, SRXY1, TDF, TDY}
- **Diseases:** neurological, psychiatric, and neurodevelopmental conditions (MESH:D001523), health (OMIM:603663), cancer (MESH:D009369), gender dysphoria (MESH:D000068116)
- **Chemicals:** testosterone (MESH:D013739)
- **Species:** Mus musculus (house mouse, species) [taxon 10090], Rattus norvegicus (brown rat, species) [taxon 10116], Homo sapiens (human, species) [taxon 9606]

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/PMC12866590/full.md

## Figures

6 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12866590/full.md

## References

12 references — full list in the complete paper: https://tomesphere.com/paper/PMC12866590/full.md

---
Source: https://tomesphere.com/paper/PMC12866590