Practical Design and Benchmarking of Generative AI Applications for Surgical Billing and Coding

John C. Rollman (1); Bruce Rogers (1); Hamed Zaribafzadeh (1); Daniel Buckland (2); Ursula Rogers (1); Jennifer Gagnon (1); Ozanan Meireles (1); Lindsay Jennings (3); Jim Bennett (1); Jennifer Nicholson (3); Nandan Lad (4); Linda Cendales (1); Andreas Seas (4,5,6); Alessandro Martinino (6); E. Shelley Hwang (1); Allan D. Kirk (1)

arXiv:2501.05479·cs.CL·January 13, 2025

TL;DR

This study develops and benchmarks small, fine-tuned generative AI models for medical billing and coding, demonstrating they can match larger models' accuracy while maintaining privacy and accessibility.

Contribution

Introduces a practical approach to fine-tuning small LLMs for healthcare billing, showing that they match or outperform larger models while requiring minimal resources.

Findings

1. Fine-tuned models achieved up to 72% accuracy in ICD-10 coding.
2. The models fabricated fewer than 1% of codes, indicating high reliability.
3. Small models performed comparably to GPT-4o in accuracy.
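The two headline metrics in the findings (coding accuracy and the share of fabricated codes) can be made concrete with a short scoring sketch. It assumes, hypothetically, that accuracy means an exact match between the predicted and reference code sets for each case, and that a code counts as fabricated when it does not appear in the official code list; the paper's own definitions may differ. The `evaluate` function, the `CodingMetrics` container, and the toy code list are illustrative names and data, not taken from the study.

```python
from dataclasses import dataclass


@dataclass
class CodingMetrics:
    accuracy: float          # share of cases whose predicted code set equals the reference set
    fabrication_rate: float  # share of all predicted codes absent from the valid code list


def evaluate(
    predictions: list[set[str]],
    references: list[set[str]],
    valid_codes: set[str],
) -> CodingMetrics:
    """Score predicted billing codes against reference codes (hypothetical metric definitions).

    predictions / references: one set of code strings per case.
    valid_codes: every code that exists in the code system, e.g. an ICD-10-CM or CPT release.
    """
    if len(predictions) != len(references):
        raise ValueError("predictions and references must have the same length")
    if not predictions:
        raise ValueError("no cases to evaluate")

    # Exact-match accuracy: a case counts only if the full predicted set is correct.
    exact_matches = sum(1 for pred, ref in zip(predictions, references) if pred == ref)

    # Fabrication: any predicted code that does not exist in the code system.
    all_predicted = [code for pred in predictions for code in pred]
    fabricated = sum(1 for code in all_predicted if code not in valid_codes)
    fabrication_rate = fabricated / len(all_predicted) if all_predicted else 0.0

    return CodingMetrics(
        accuracy=exact_matches / len(predictions),
        fabrication_rate=fabrication_rate,
    )


# Toy example: a tiny stand-in for the official code list and four cases.
valid = {"K35.80", "K80.20", "E11.9"}
refs = [{"K35.80"}, {"K80.20"}, {"K80.20"}, {"E11.9"}]
preds = [{"K35.80"}, {"K80.20"}, {"ABC.123"}, {"E11.9"}]  # "ABC.123" is not in the code list
print(evaluate(preds, refs, valid))  # CodingMetrics(accuracy=0.75, fabrication_rate=0.25)
```

Keeping the two metrics separate matters: a wrong but real code lowers accuracy without counting as a fabrication, while an invented code lowers both.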

Abstract

Background: Many manual processes in healthcare, such as medical billing and coding, can benefit from automation and augmentation with Generative Artificial Intelligence (AI). However, current foundational Large Language Models (LLMs) perform poorly when tasked with generating accurate International Classification of Diseases, 10th edition, Clinical Modification (ICD-10-CM) and Current Procedural Terminology (CPT) codes. Applying generative AI to healthcare also raises many security and financial challenges. We present a strategy for developing generative AI tools in healthcare, specifically for medical billing and coding, that balances accuracy, accessibility, and patient privacy.

Methods: We fine-tune the PHI-3 Mini and PHI-3 Medium LLMs using institutional data and compare the results against the PHI-3 base model, a PHI-3 retrieval-augmented generation (RAG) application, and…

Taxonomy

Topics: Surgical Simulation and Training

Methods: Layer Normalization · Dense Connections · Linear Warmup With Linear Decay · WordPiece · Attention Dropout · Adam · Residual Connection · Dropout · Softmax