Automated ICD Classification of Psychiatric Diagnoses: From Classical NLP to Large Language Models
Fernando Ortega, Raúl Lara-Cabrera, Jorge Dueñas-Lerín, Alejandro de la Torre-Luque, Mercé Salvador Robert, Enrique Baca-García

TL;DR
This study automates the ICD coding of psychiatric diagnoses from free-text descriptions using NLP and ML, and finds that transformer-based LLMs such as e5_large perform best.
Contribution
It evaluates various text representation methods, demonstrating that fine-tuned LLMs significantly improve ICD classification accuracy in psychiatric texts.
Findings
Transformer embeddings outperform traditional models.
e5_large achieved a micro-averaged F1 score (F1_micro) of 0.866.
Adapting LLMs to clinical language addresses long-tail and ambiguity issues.
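To make the F1_micro figure and the long-tail remark concrete, the sketch below computes micro- and macro-averaged F1 for a toy set of predictions. It assumes single-label classification (one ICD code per description), which the summary does not state; the codes and predictions are invented for illustration and are not taken from the paper's data. The function name `f1_scores` is likewise hypothetical.

```python
from collections import Counter

def f1_scores(y_true, y_pred):
    """Return (micro_f1, macro_f1) for single-label multi-class predictions."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1  # predicted class p, but it was wrong
            fn[t] += 1  # true class t was missed
    # Micro-averaging pools the counts over all classes before computing F1.
    tp_all, fp_all, fn_all = sum(tp.values()), sum(fp.values()), sum(fn.values())
    denom = 2 * tp_all + fp_all + fn_all
    micro = 2 * tp_all / denom if denom else 0.0
    # Macro-averaging computes F1 per class, then takes the unweighted mean.
    per_class = []
    for c in set(y_true) | set(y_pred):
        d = 2 * tp[c] + fp[c] + fn[c]
        per_class.append(2 * tp[c] / d if d else 0.0)
    macro = sum(per_class) / len(per_class) if per_class else 0.0
    return micro, macro

# Illustrative (made-up) labels: one frequent code and two rare ones.
y_true = ["F32", "F32", "F32", "F32", "F41", "F20"]
# A classifier that always predicts the frequent code.
y_pred = ["F32"] * 6

micro_f1, macro_f1 = f1_scores(y_true, y_pred)
print(f"micro F1 = {micro_f1:.3f}, macro F1 = {macro_f1:.3f}")
```

In this toy case the micro score stays moderate while the macro score collapses, because the two rare codes are never predicted. A micro-averaged metric alone can therefore hide poor performance on long-tail classes.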
Abstract
Mental health has become a global priority, leading to a massive administrative burden in the coding of clinical diagnoses. This study proposes the automation of psychiatric diagnostic analysis by mapping free-text descriptions to the International Classification of Diseases (ICD) using Natural Language Processing (NLP) and Machine Learning (ML) techniques. Utilizing a specialized dataset of 145,513 Spanish psychiatric descriptions, various text representation paradigms were evaluated, ranging from classical frequency-based models (BoW, TF-IDF) to state-of-the-art Large Language Models (LLMs) such as e5_large, BioLORD, and Llama-3-8B. Results indicate that transformer-based embeddings consistently outperform traditional methods by capturing implicit semantic cues and nuanced medical terminology. The e5_large model, through end-to-end fine-tuning, achieved the highest performance with…
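The contrast between frequency-based representations and semantic embeddings can be illustrated with a small sketch. The code below builds TF-IDF vectors by hand, using a smoothed IDF as one common variant, and compares short descriptions by cosine similarity. It is not the authors' pipeline: the English phrases are invented stand-ins for the Spanish clinical texts, and the helper names `tfidf_vectors` and `cosine` are hypothetical.

```python
import math
from collections import Counter

def tfidf_vectors(docs):
    """Build sparse TF-IDF vectors (dicts) from whitespace-tokenized documents."""
    tokenized = [d.lower().split() for d in docs]
    n = len(tokenized)
    # Document frequency: number of documents containing each token.
    df = Counter(tok for toks in tokenized for tok in set(toks))
    # Smoothed inverse document frequency (an assumed, commonly used variant).
    idf = {t: math.log((1 + n) / (1 + c)) + 1.0 for t, c in df.items()}
    vectors = []
    for toks in tokenized:
        tf = Counter(toks)
        vectors.append({t: tf[t] * idf[t] for t in tf})
    return vectors

def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

# Made-up descriptions: the first two share a token, the third is a paraphrase
# of a depressive presentation with no lexical overlap.
docs = [
    "major depressive episode",
    "recurrent depressive disorder",
    "persistent low mood and anhedonia",
]
vecs = tfidf_vectors(docs)
sim_shared = cosine(vecs[0], vecs[1])    # overlap on "depressive"
sim_disjoint = cosine(vecs[0], vecs[2])  # no shared tokens
print(f"shared token: {sim_shared:.3f}, no shared token: {sim_disjoint:.3f}")
```

Because TF-IDF only sees surface tokens, the paraphrase scores exactly zero against the first description even though a clinician would consider them related. A dense embedding model would be expected to place such phrases closer together, which is the kind of implicit semantic cue the abstract refers to.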
