# Automating thematic review of prevention of future deaths reports: concordance study of a child-suicide analysis using large language models

**Authors:** Sam Osian, Arpan Dutta, Sahil Bhandari, Iain E Buchan, Dan W Joyce

PMC · DOI: 10.1136/bmjment-2025-302212 · BMJ Mental Health · 2026-02-25

## TL;DR

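A vision-enabled large language model pipeline (PFD Toolkit) automated a national thematic review of prevention of future deaths (PFD) reports concerning child suicide. It agreed closely with blinded clinical adjudication (Cohen’s κ=0.93), finished in under 6 minutes and identified 73 reports against 37 in the earlier ONS review.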

## Contribution

The study applies a vision-enabled LLM pipeline to automate thematic analysis of PFD reports and shows high concordance with expert clinical judgement.

## Key findings

- The Toolkit identified 73 child-suicide reports published between July 2013 and November 2023, compared with 37 in the earlier ONS review; 62 of the 73 fell within the ONS analytical window.
- Post-consensus agreement between the Toolkit and clinical experts was substantial to almost perfect (Cohen’s κ=0.93, 95% CI 0.77 to 1.00; raw agreement 97%).
- End-to-end screening, coding and tabulation of all 4730 reports took 5 min 29 s on a consumer-grade laptop.

## Abstract

Prevention of future deaths (PFD) reports issued by coroners in England and Wales identify systemic safety hazards but are difficult to analyse at scale. Reports are not machine-readable, lack consistent metadata and cannot be reliably searched or exported, meaning prior national reviews have relied on labour-intensive manual screening and coding.

This study evaluated whether a fully automated, vision-enabled large language model (LLM) pipeline (PFD Toolkit) can replicate and extend the Office for National Statistics (ONS) thematic review of child-suicide PFD reports, and assessed its concordance with blinded clinical adjudication.

All PFD reports published between July 2013 and November 2023 (n=4730) were scraped from judiciary.uk and processed using PFD Toolkit, which combines optical character recognition with LLM-powered screening and thematic coding. Reports were classified for child suicide (≤18 years), addressee categories and 23 coroner-concern subthemes mirroring the ONS coding frame. Agreement was evaluated against a blinded clinical reference standard: three psychiatrists independently adjudicated a stratified sample of 146 reports (73 Toolkit-positive cases and 73 decoys), with disagreements resolved by consensus. Inter-rater reliability and index-reference agreement were quantified using kappa statistics.
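The construction of the adjudication sample can be sketched as follows. This is a minimal illustration, not the authors' code: the integer report identifiers, the placeholder set of Toolkit-positive reports and the function name `build_adjudication_sample` are hypothetical, and drawing decoys uniformly at random from Toolkit-negative reports is an assumption, since the abstract does not say how decoys were selected.

```python
import random


def build_adjudication_sample(positive_ids, negative_ids, seed=0):
    """Pair every Toolkit-positive report with one randomly drawn decoy.

    Assumption: decoys are sampled uniformly, without replacement, from
    the Toolkit-negative reports. The returned list is shuffled so that
    its order gives raters no hint about the Toolkit's label.
    """
    rng = random.Random(seed)  # fixed seed keeps the draw reproducible
    decoys = rng.sample(sorted(negative_ids), k=len(positive_ids))
    sample = list(positive_ids) + decoys
    rng.shuffle(sample)
    return sample, set(decoys)


# Hypothetical identifiers: 4730 reports, of which 73 are flagged positive.
all_ids = set(range(4730))
positives = set(range(73))  # placeholder, not the real flagged reports
negatives = all_ids - positives

sample, decoys = build_adjudication_sample(positives, negatives)
print(len(sample), len(decoys))
```

Shuffling the combined list is one simple way to keep adjudicators blind to the Toolkit's classification.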

The Toolkit identified 73 child-suicide PFD reports between July 2013 and November 2023, compared with 37 identified in the ONS review. Of the 73, 62 fell within the ONS analytical window and 11 pre-dated the introduction of suicide-related tags on the PFD archive. Pre-consensus inter-rater reliability among clinicians was substantial to almost perfect (Fleiss’ κ=0.75, 95% CI 0.65 to 0.84). Post-consensus agreement between the Toolkit and the clinical reference standard was substantial to almost perfect (Cohen’s κ=0.93, 95% CI 0.77 to 1.00; raw agreement 97%). End-to-end screening, coding and tabulation of all reports completed in 5 min 29 s on a consumer-grade laptop.
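Cohen’s κ compares the observed agreement p_o with the agreement expected by chance p_e, as κ = (p_o − p_e) / (1 − p_e). The sketch below computes raw agreement and κ for two lists of binary labels. The labels are toy values chosen for illustration, not the study data, and the function names are hypothetical.

```python
from collections import Counter


def raw_agreement(a, b):
    """Proportion of items on which the two raters give the same label."""
    if len(a) != len(b) or not a:
        raise ValueError("label lists must be non-empty and equal in length")
    return sum(x == y for x, y in zip(a, b)) / len(a)


def cohens_kappa(a, b):
    """Unweighted Cohen's kappa for two raters over the same items."""
    n = len(a)
    p_o = raw_agreement(a, b)
    counts_a, counts_b = Counter(a), Counter(b)
    # Chance agreement: product of the raters' marginal label proportions.
    p_e = sum(counts_a[k] * counts_b[k] for k in set(a) | set(b)) / (n * n)
    if p_e == 1.0:
        return float("nan")  # kappa is undefined when chance agreement is 1
    return (p_o - p_e) / (1 - p_e)


# Toy labels (1 = child suicide, 0 = not): 20 reports, one disagreement.
toolkit = [1] * 10 + [0] * 10
reference = [1] * 9 + [0] * 11

agreement = raw_agreement(toolkit, reference)
kappa = cohens_kappa(toolkit, reference)
print(f"raw agreement = {agreement:.2f}, kappa = {kappa:.2f}")
```

With these toy labels, raw agreement is 0.95 and κ is 0.90, which shows how κ discounts the agreement that two raters would reach by chance alone.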

A national thematic review of child-suicide PFD reports can be fully automated with high concordance to expert judgement, dramatically reducing time and labour while recovering previously missed cases.

Automated analysis of PFD reports enables rapid, reproducible surveillance of recurring system failures, supporting more timely public health intelligence, policy responses and learning from coronial data.

## Full-text entities

- **Diseases:** Coroner (MESH:C537369), Mental health related death (OMIM:603663), child death (MESH:D003643), LLM (MESH:D007806)
- **Species:** Homo sapiens (human, species) [taxon 9606]

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/PMC12970072/full.md

## Figures

1 figure with captions in the complete paper: https://tomesphere.com/paper/PMC12970072/full.md

## References

13 references — full list in the complete paper: https://tomesphere.com/paper/PMC12970072/full.md

---
Source: https://tomesphere.com/paper/PMC12970072