Automatic Coding at Scale: Design and Deployment of a Nationwide System for Normalizing Referrals in the Chilean Public Healthcare System
Fabián Villena, Matías Rojas, Felipe Arias, Jorge Pacheco, Paulina Vera, Jocelyn Dunstan

TL;DR
This paper presents a scalable, automated disease coding system for Chilean public healthcare referrals, combining NER and search engine techniques, achieving near state-of-the-art accuracy and supporting clinical workflows.
Contribution
The paper introduces a two-step system that uses NER and Elasticsearch for large-scale disease coding of healthcare referrals, and it makes the code publicly available for reproducibility.
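To make the two-step idea concrete, here is a minimal sketch of the control flow: first find disease mentions, then rank candidate codes for each mention. Everything in it is an assumption for illustration. The substring gazetteer stands in for a trained NER model, the token-overlap ranker stands in for the search engine, and the vocabulary, `DISEASE_TERMS`, and all function names are hypothetical. The ICD-10-style codes are used only as an example of a controlled vocabulary; this summary does not name the terminology the authors used.

```python
import re

# Toy controlled vocabulary (hypothetical; ICD-10-style codes for illustration only).
VOCABULARY = {
    "E11": "type 2 diabetes mellitus",
    "I10": "essential primary hypertension",
    "J45": "asthma",
}

# Hypothetical gazetteer standing in for a trained NER model.
DISEASE_TERMS = ["diabetes mellitus", "hypertension", "asthma"]


def tokenize(text):
    """Lowercase the text and return its set of alphanumeric tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def extract_mentions(referral):
    """Step 1 (stand-in for NER): return the disease terms found in the text."""
    lowered = referral.lower()
    return [term for term in DISEASE_TERMS if term in lowered]


def rank_codes(mention, top_k=3):
    """Step 2 (stand-in for the search engine): rank codes by token overlap."""
    query = tokenize(mention)
    scored = []
    for code, description in VOCABULARY.items():
        doc = tokenize(description)
        score = len(query & doc) / len(query | doc)  # Jaccard similarity
        if score > 0:
            scored.append((score, code))
    # Highest score first; ties broken alphabetically by code.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [code for _, code in scored[:top_k]]


def code_referral(referral):
    """Run both steps and map each detected mention to its ranked codes."""
    return {mention: rank_codes(mention) for mention in extract_mentions(referral)}
```

Calling `code_referral("Patient with hypertension and suspected diabetes mellitus.")` returns a ranked candidate list per mention, which is the shape of output a ranking metric such as MAP is computed over. A real deployment would replace both stand-ins with a trained model and a proper index.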
Findings
The system achieves mean average precision (MAP) scores of 0.63 at the subcategory level and 0.83 at the category level.
Its performance is close to that of the best models reported in the literature.
It provides a support tool to improve coding efficiency in healthcare.
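For readers unfamiliar with the metric, the sketch below shows one common way to compute mean average precision over ranked code lists, and why a category-level score can exceed a subcategory-level one. It is an assumption-laden illustration, not the authors' evaluation script: the function names are hypothetical, the codes are ICD-10-style examples, and the category is assumed to be the part of the code before the dot.

```python
def average_precision(ranked, relevant):
    """Average of precision@k taken at each rank k where a relevant code appears."""
    hits = 0
    precision_sum = 0.0
    for rank, code in enumerate(ranked, start=1):
        if code in relevant:
            hits += 1
            precision_sum += hits / rank
    return precision_sum / len(relevant) if relevant else 0.0


def mean_average_precision(predictions, gold):
    """Mean of average precision over all mentions."""
    scores = [average_precision(r, g) for r, g in zip(predictions, gold)]
    return sum(scores) / len(scores) if scores else 0.0


def to_category(code):
    """Assumed convention: 'E11.9' (subcategory) truncates to 'E11' (category)."""
    return code.split(".")[0]


def collapse_to_categories(ranked):
    """Map a ranked list to categories, dropping duplicates but keeping order."""
    seen = []
    for code in ranked:
        category = to_category(code)
        if category not in seen:
            seen.append(category)
    return seen


# Two mentions: the first has the right category but the wrong subcategory on top.
predictions = [["E11.9", "E11.0"], ["J45.9"]]
gold = [{"E11.0"}, {"J45.9"}]

subcategory_map = mean_average_precision(predictions, gold)
category_map = mean_average_precision(
    [collapse_to_categories(r) for r in predictions],
    [{to_category(c) for c in g} for g in gold],
)
```

In this toy case the subcategory-level score is 0.75 and the category-level score is 1.0, because a near miss within the same category counts as correct once codes are truncated.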
Abstract
The disease coding task involves assigning a unique identifier from a controlled vocabulary to each disease mentioned in a clinical document. This task is relevant since it allows information extraction from unstructured data to perform, for example, epidemiological studies about the incidence and prevalence of diseases in a determined context. However, the manual coding process is subject to errors as it requires medical personnel to be competent in coding rules and terminology. In addition, this process consumes a lot of time and energy, which could be allocated to more clinically relevant tasks. These difficulties can be addressed by developing computational systems that automatically assign codes to diseases. In this way, we propose a two-step system for automatically coding diseases in referrals from the Chilean public healthcare system. Specifically, our model uses a…
Taxonomy
Topics: Biomedical Text Mining and Ontologies · Data-Driven Disease Surveillance · Data Quality and Management
