A Semi-Automated Annotation Workflow for Paediatric Histopathology Reports Using Small Language Models
Avish Vijayaraghavan, Jaskaran Singh Kawatra, Sebin Sabu, Jonny Sheldon, Will Poulett, Alex Eze, Daniel Key, John Booth, Shiren Patel, Jonny Pearson, Dan Schofield, Jonathan Hope, Pavithra Rajendran, Neil Sebire

TL;DR
This study presents a resource-efficient, semi-automated workflow that uses small language models to extract structured data from paediatric histopathology reports, reaching 84.3% extraction accuracy with Gemma 2 2B while keeping clinician involvement minimal.
Contribution
The paper introduces a novel, low-resource annotation workflow leveraging small language models for clinical report information extraction, with demonstrated effectiveness in paediatric renal biopsy data.
Findings
Gemma 2 2B achieved 84.3% accuracy in extraction.
Entity guidelines improved performance by 7-19%.
Few-shot examples increased accuracy by 6-38%.
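The two prompt components named in the findings, entity guidelines and few-shot examples, can be illustrated with a minimal Python sketch. It is not the authors' implementation: the entity names, guideline wording, example report, and the helper names `build_prompt` and `entity_accuracy` are all hypothetical, and exact-match scoring is only one plausible way to measure extraction accuracy.

```python
import json

# Hypothetical entity guidelines; the paper's actual entities and wording are not shown here.
ENTITY_GUIDELINES = {
    "diagnosis": "Final diagnosis as stated in the report conclusion.",
    "num_glomeruli": "Total number of glomeruli sampled, as an integer.",
}

# Hypothetical few-shot example; the report text is invented for illustration.
FEW_SHOT_EXAMPLES = [
    {
        "report": "Renal biopsy. 12 glomeruli sampled. Conclusion: minimal change disease.",
        "entities": {"diagnosis": "minimal change disease", "num_glomeruli": 12},
    },
]


def build_prompt(report, entities, guidelines=None, examples=None):
    """Assemble an extraction prompt, optionally adding guidelines and few-shot examples."""
    parts = [
        "Extract these entities from the report and answer as JSON: "
        + ", ".join(entities)
    ]
    if guidelines:
        # One guideline line per requested entity that has a definition.
        lines = [f"- {name}: {guidelines[name]}" for name in entities if name in guidelines]
        parts.append("Entity guidelines:\n" + "\n".join(lines))
    for example in examples or []:
        # Each few-shot example pairs a report with its expected JSON answer.
        parts.append(
            f"Report: {example['report']}\nAnswer: {json.dumps(example['entities'])}"
        )
    # The target report goes last, with the answer left open for the model.
    parts.append(f"Report: {report}\nAnswer:")
    return "\n\n".join(parts)


def entity_accuracy(predicted, gold):
    """Fraction of gold entities whose predicted value matches exactly."""
    if not gold:
        return 0.0
    correct = sum(1 for name, value in gold.items() if predicted.get(name) == value)
    return correct / len(gold)
```

Calling `build_prompt(report, ["diagnosis", "num_glomeruli"])` with and without the `guidelines` and `examples` arguments yields the prompt variants whose accuracy could then be compared with `entity_accuracy` against clinician-verified labels.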
Abstract
Electronic Patient Record (EPR) systems contain valuable clinical information, but much of it is trapped in unstructured text, limiting its use for research and decision-making. Large language models can extract such information but require substantial computational resources to run locally, and sending sensitive clinical data to cloud-based services, even when deidentified, raises significant patient privacy concerns. In this study, we develop a resource-efficient semi-automated annotation workflow using small language models (SLMs) to extract structured information from unstructured EPR data, focusing on paediatric histopathology reports. As a proof-of-concept, we apply the workflow to paediatric renal biopsy reports, a domain chosen for its constrained diagnostic scope and well-defined underlying biology. We develop the workflow iteratively with clinical oversight across three…
