# Gender bias in machine learning: insights from official labour statistics and textual analysis

**Authors:** Orfeas Menis–Mastromichalakis, George Filandrianos, Maria Symeonaki, Glykeria Stamatopoulou, Dimitris Parsanoglou, Giorgos Stamou

PMC · DOI: 10.1007/s11135-025-02261-0 · Quality & Quantity · 2025-07-08

## TL;DR

This paper examines how machine learning systems can reflect and amplify gender biases in occupational roles. It combines statistical analysis of official labour data with textual analysis in three languages (English, French, and Greek).

## Contribution

The study introduces a classification of gender biases in machine learning and reveals discrepancies between official labour statistics and the gendered occupational distributions found in training data.

## Key findings

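- Gendered occupational distributions extracted from 200,920 text instances in English, French, and Greek differ significantly from official labour statistics.
- Machine learning systems, particularly language translation technologies, may perpetuate or amplify gender stereotypes in professional contexts.
- An interdisciplinary framework is proposed for understanding occupational gender bias in machine learning, with the aim of fostering more inclusive digital systems.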
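The summary does not describe how the text-derived distributions are compared with official statistics. As a minimal sketch, assuming each text instance has already been tagged with an occupation (for example an ISCO-08 group) and a gender, one could compute the female share per occupation in each source and rank occupations by the gap in percentage points. The names `compare_with_official`, `female_share` and `OccupationGap`, the input format and all numbers are hypothetical and are not taken from the paper.

```python
from dataclasses import dataclass
from typing import List, Mapping, Tuple


@dataclass
class OccupationGap:
    occupation: str
    text_female_share: float      # female share of gendered mentions in the text sample
    official_female_share: float  # female share of employment in official statistics
    gap_pp: float                 # text share minus official share, in percentage points


def female_share(female: int, male: int) -> float:
    """Return the fraction of gendered observations that are female."""
    total = female + male
    if total == 0:
        raise ValueError("no gendered observations for this occupation")
    return female / total


def compare_with_official(
    text_counts: Mapping[str, Tuple[int, int]],
    official_shares: Mapping[str, float],
) -> List[OccupationGap]:
    """Rank occupations by how far the text-derived female share departs from official data.

    Hypothetical interface: text_counts maps an occupation to
    (female_mentions, male_mentions); official_shares maps an occupation to the
    female share of employment (0..1). Occupations missing from either source
    are skipped.
    """
    gaps = []
    for occ in sorted(text_counts.keys() & official_shares.keys()):
        text_share = female_share(*text_counts[occ])
        official = official_shares[occ]
        gaps.append(OccupationGap(occ, text_share, official, 100 * (text_share - official)))
    # Largest absolute discrepancy first
    gaps.sort(key=lambda g: abs(g.gap_pp), reverse=True)
    return gaps


if __name__ == "__main__":
    # Hypothetical toy numbers for illustration only; not results from the paper.
    text_counts = {
        "Managers": (20, 80),
        "Clerical support workers": (90, 10),
        "Craft and related trades workers": (5, 95),
    }
    official_shares = {
        "Managers": 0.35,
        "Clerical support workers": 0.65,
        "Craft and related trades workers": 0.10,
    }
    for g in compare_with_official(text_counts, official_shares):
        print(f"{g.occupation}: text {g.text_female_share:.0%}, "
              f"official {g.official_female_share:.0%}, gap {g.gap_pp:+.1f} pp")
```

Occupation mentions that carry no explicit gender (common in English, less so in French and Greek) would need to be excluded or resolved before counting; this summary does not say how the paper handles them.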

## Abstract

The interplay between technology and societal norms often reveals a troubling reality: machine learning systems not only reflect existing gender stereotypes but can also amplify and entrench them, making these biases harder to detect and address. This paper adopts an interdisciplinary approach, combining quantitative and qualitative methods with recent technological advancements, such as machine learning techniques for textual analysis and computational linguistics, to offer a new framework for understanding occupational gender bias in machine learning. The study is motivated by persistent gender inequalities in the labour market and rising concerns about gendered algorithmic bias, as outlined in the European Commission’s Gender Equality Strategy 2020–2025. Focusing on language translation technologies, the research explores how machine learning may perpetuate or amplify gender stereotypes, aiming to foster more inclusive digital systems aligned with EU strategic goals. More specifically, it investigates occupational gender segregation and its manifestations in various forms of gender bias in machine learning across English, French, and Greek. The study introduces a classification of gender biases in machine learning, providing insights into professional areas needing intervention to address gender imbalances and identifying enduring stereotypical representations in textual data. To support this, statistical analysis is conducted to explore gender variations in occupations over the past thirteen years, using official data and international classifications such as the International Standard Classification of Occupations (ISCO-08). Moreover, gendered occupational distributions are extracted from 200,920 text instances in the three languages, revealing significant discrepancies between official labour statistics and the training data.
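The abstract refers to occupational gender segregation and to a statistical analysis of thirteen years of official data, but this summary does not name the measures used. One widely used measure of occupational segregation is the Duncan dissimilarity index, D = 0.5 * Σ_i |w_i/W - m_i/M|, where w_i and m_i are the numbers of women and men in occupation i and W and M are the totals. Whether the paper uses this index is not stated here; the sketch below, with hypothetical function names and inputs, only shows how such an index could be tracked year by year over ISCO-08 groups.

```python
from typing import Dict, Mapping, Tuple


def dissimilarity_index(women: Mapping[str, float], men: Mapping[str, float]) -> float:
    """Duncan dissimilarity index over occupations.

    women/men map an occupation (e.g. an ISCO-08 major group) to employment counts.
    Returns 0 when both sexes are distributed identically across occupations and
    1 under complete segregation.
    """
    total_w = sum(women.values())
    total_m = sum(men.values())
    if total_w <= 0 or total_m <= 0:
        raise ValueError("both distributions need positive totals")
    occupations = women.keys() | men.keys()  # occupations missing from one side count as 0
    return 0.5 * sum(
        abs(women.get(o, 0) / total_w - men.get(o, 0) / total_m) for o in occupations
    )


def segregation_trend(
    by_year: Mapping[int, Tuple[Mapping[str, float], Mapping[str, float]]],
) -> Dict[int, float]:
    """Compute the index for each year, in chronological order."""
    return {year: dissimilarity_index(w, m) for year, (w, m) in sorted(by_year.items())}


if __name__ == "__main__":
    # Hypothetical employment counts (thousands) for two years; illustrative only.
    by_year = {
        2010: ({"Professionals": 300, "Craft and related trades workers": 50},
               {"Professionals": 250, "Craft and related trades workers": 400}),
        2022: ({"Professionals": 420, "Craft and related trades workers": 60},
               {"Professionals": 300, "Craft and related trades workers": 380}),
    }
    for year, d in segregation_trend(by_year).items():
        print(year, round(d, 3))
```

D can be read as the share of women (or of men) who would have to change occupation for both distributions to match, which makes year-to-year changes easy to compare.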

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/PMC12920283/full.md

## Figures

12 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12920283/full.md

## References

12 references — full list in the complete paper: https://tomesphere.com/paper/PMC12920283/full.md

---
Source: https://tomesphere.com/paper/PMC12920283