Validation of a Zero-Shot Learning Natural Language Processing Tool for Data Abstraction from Unstructured Healthcare Data
Basil Kaufmann, Dallin Busby, Chandan Krushna Das, Neeraja Tillu, Mani Menon, Ashutosh K. Tewari, Michael A. Gorin

TL;DR
This study developed and validated a zero-shot learning NLP tool based on GPT-3.5 that efficiently abstracts data from unstructured healthcare documents with accuracy comparable to human experts, offering significant time savings and broad applicability.
Contribution
The paper introduces a zero-shot learning NLP tool that requires no task-specific training, demonstrating its effectiveness in healthcare data abstraction and potential for wide-ranging applications.
Findings
The NLP tool significantly reduces data abstraction time compared to humans.
The tool achieves non-inferior accuracy to human abstractors for structured reports.
It performs well even on scanned documents with OCR processing.
Abstract
Objectives: To describe the development and validation of a zero-shot learning natural language processing (NLP) tool for abstracting data from unstructured text contained within PDF documents, such as those found within electronic health records. Materials and Methods: A data abstraction tool based on the GPT-3.5 model from OpenAI was developed and compared to three physician human abstractors in terms of time to task completion and accuracy for abstracting data on 14 unique variables from a set of 199 de-identified radical prostatectomy pathology reports. The reports were processed by the software tool in vectorized and scanned formats to establish the impact of optical character recognition on data abstraction. The tool was assessed for superiority for data abstraction speed and non-inferiority for accuracy. Results: The human abstractors required a mean of 101s per report for data…
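As a rough illustration of what "zero-shot" means in this setting, the sketch below builds an instruction-only prompt (no worked examples, no task-specific training) and parses a JSON reply. It is a minimal sketch under stated assumptions, not the authors' implementation: the variable names in `FIELDS`, the prompt wording, and the helpers `build_prompt` and `parse_reply` are all hypothetical, and the call to the language model itself is deliberately left out.

```python
import json

# Hypothetical variable names (assumption): the abstract mentions 14 variables
# but does not list them in this excerpt.
FIELDS = ["gleason_score", "pathologic_t_stage", "surgical_margin_status"]


def build_prompt(report_text: str, fields: list[str]) -> str:
    """Build a zero-shot prompt: instructions only, no worked examples."""
    field_list = "\n".join(f"- {name}" for name in fields)
    return (
        "Extract the following variables from the pathology report below.\n"
        "Reply with a single JSON object using exactly these keys, and use "
        "null for any variable the report does not state.\n\n"
        f"Variables:\n{field_list}\n\n"
        f"Report:\n{report_text}"
    )


def parse_reply(reply: str, fields: list[str]) -> dict:
    """Parse the model's JSON reply, keeping only the requested keys.

    Keys missing from the reply are filled with None; extra keys are dropped.
    """
    data = json.loads(reply)
    return {name: data.get(name) for name in fields}
```

The prompt returned by `build_prompt` would be sent to whichever model is in use, and the text that comes back would be passed to `parse_reply`. Restricting the output to a fixed set of keys is one way to keep the abstracted data comparable with a human abstractor's spreadsheet, though the source does not say how the tool structures its output.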
Taxonomy
Topics: Artificial Intelligence in Healthcare and Education
Methods: Multi-Head Attention · Attention Is All You Need · Linear Layer · Byte Pair Encoding · Attention Dropout · Residual Connection · Cosine Annealing · Weight Decay
