# Development of a natural language-processing application for LGBTQ+ status in mental health records

**Authors:** Margaret Heslin, Jaya Chaturvedi, Anne Marie Bonnici Mallia, Ace Taaca, Diogo Pontes, Charvi Saraswat, Charlotte Woodhead, Katharine A. Rimes, David Chandran, Jyoti Sanyal, Ruimin Ma, Robert Stewart, Angus Roberts

PMC · DOI: 10.1192/bjo.2025.10855 · BJPsych Open · 2025-10-13

## TL;DR

This paper shows that a BERT-based NLP tool can accurately detect LGBTQ+ status in free-text mental health records, enabling research into mental health inequalities.

## Contribution

A novel NLP application using BERT to identify LGBTQ+ status in unstructured mental health records with high precision and recall.

## Key findings

- The BERT model achieved a precision of 0.95, a recall of 0.93 and an F1 score of 0.94 on a test set of 2000 annotations.
- Of 10 000 annotated strings, 14% (1449) confirmed LGBTQ+ status; the remaining 86% (8551) were LGBTQ+ negative, irrelevant or unclear.
- The NLP tool opens new research opportunities on LGBTQ+ mental health disparities.

## Abstract

Lesbian, gay, bisexual, transgender, queer and related community (LGBTQ+) individuals have significantly increased risk for mental health problems. However, research on inequalities in LGBTQ+ mental healthcare is limited because LGBTQ+ status is usually only contained in unstructured, free-text sections of electronic health records.

This study investigated whether natural language processing (NLP), specifically the large language model Bidirectional Encoder Representations from Transformers (BERT), can identify LGBTQ+ status from this unstructured text in mental health records.

Using electronic health records from a large mental healthcare provider in south London, UK, relevant search terms were identified and a random sample of 10 000 strings was extracted. Each string contained the search term plus 100 characters on either side of it. A BERT model was trained to classify LGBTQ+ status.
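The extraction step described above (a search term plus 100 characters of context on each side) can be sketched as follows. This is a minimal illustration, not the authors' pipeline: the function name `extract_windows`, the example terms and the sample note are all hypothetical, and the study's actual search terms are not listed in this summary.

```python
import re


def extract_windows(text, terms, width=100):
    """Return one string per search-term match, with up to `width`
    characters of context on each side of the matched term."""
    # Match any term as a whole word, case-insensitively (an assumption;
    # the source does not describe its matching rules).
    pattern = re.compile(
        r"\b(?:" + "|".join(re.escape(t) for t in terms) + r")\b",
        re.IGNORECASE,
    )
    windows = []
    for match in pattern.finditer(text):
        # Clamp the window to the bounds of the record.
        start = max(0, match.start() - width)
        end = min(len(text), match.end() + width)
        windows.append(text[start:end])
    return windows


# Hypothetical terms and synthetic text, for illustration only.
example_terms = ["gay", "bisexual"]
example_note = "a " * 75 + "gay" + " b" * 75
example_windows = extract_windows(example_note, example_terms)
print(len(example_windows[0]))  # 203: 100 + len("gay") + 100
```

Near the start or end of a record the window is simply shorter than 100 characters on that side, so strings need not all have the same length.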

Among 10 000 annotations, 14% (1449) confirmed LGBTQ+ status while 86% (8551) did not. The latter comprised LGBTQ+ negative status, irrelevant annotations and unclear cases. The final BERT model, tested on 2000 annotations, achieved a precision of 0.95 (95% CI 0.93–0.98), a recall of 0.93 (95% CI 0.91–0.96) and an F1 score of 0.94 (95% CI 0.92–0.97).
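As a quick consistency check, the reported F1 score can be recomputed from the reported precision and recall, since F1 is their harmonic mean. The sketch below uses only the figures quoted above; the helper names `f1_score` and `share` are illustrative, and the confidence intervals are not reproduced because this summary does not state how they were derived.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def share(count: int, total: int) -> float:
    """Percentage of `total` represented by `count`."""
    return 100 * count / total


# Point estimates reported in the abstract.
reported_precision = 0.95
reported_recall = 0.93
f1 = f1_score(reported_precision, reported_recall)

# Annotation counts reported in the abstract.
positive, other, total = 1449, 8551, 10_000

print(round(f1, 2))                   # 0.94
print(round(share(positive, total)))  # 14
print(round(share(other, total)))     # 86
```

The recomputed values agree with the abstract: an F1 of about 0.94, and a 14% to 86% split that sums to the full 10 000 annotations.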

LGBTQ+ status can be determined using this NLP application with a high success rate. The NLP application produced through this work has opened up mental health records to a variety of research questions involving LGBTQ+ status, and should be explored further. Additional work should aim to extend what has been done here by developing an application that can distinguish between different LGBTQ+ groups to examine inequalities between these groups.

## Full-text entities

- **Diseases:** trauma (MESH:D014947), mental health problems (MESH:D000076082), delusion (MESH:D063726)
- **Chemicals:** W014386 (-)
- **Species:** Homo sapiens (human, species) [taxon 9606]

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/PMC12529341/full.md

## Figures

6 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12529341/full.md

## References

27 references — full list in the complete paper: https://tomesphere.com/paper/PMC12529341/full.md

---
Source: https://tomesphere.com/paper/PMC12529341