# Development and Evaluation of SNOMED CT Automated Mapping Tool: Advancing Terminology Standardization and Semantic Interoperability

**Authors:** Youngsun Park, Hannah Kang, Jiwon Kim, Soo-Yong Shin, Dosang Cho, Sang Youl Rhee, Hong Seok Park, Kyung-Jae Lee, Sungchul Bae

PMC · DOI: 10.2196/82670 · 2026-03-09

## TL;DR

An LLM-assisted tool improves the accuracy and efficiency of mapping local clinical terms to SNOMED CT, easing healthcare data integration across institutions.

## Contribution

An LLM-assisted automated tool for SNOMED CT mapping and concept authoring that improves accuracy and reduces manual workload.

## Key findings

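- Top-5 accuracy ranged from 89.7% to 98.7% for diagnostic term mapping and from 82.6% to 99.2% for surgical procedural term mapping across four institutions.
- Manual mapping rates fell by 30% and overall manual workload by up to 90%; in new concept authoring, duplicate concepts fell by 83% and modeling rule violations by 72%.
- Average mapping and new concept creation time fell by approximately 75%, and final mapping table processing time by 90%.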

## Abstract

Effective secondary use of healthcare data is hindered by fragmentation and a lack of semantic interoperability due to heterogeneous local terminologies. Standardizing clinical terms using SNOMED CT (Systematized Nomenclature of Medicine Clinical Terms) is essential but remains a manual, labor-intensive, and inconsistent process, especially across multiple institutions. Automated, scalable solutions are needed to support reliable mapping and new concept authoring for large-scale research.

We aimed to develop a large language model (LLM)-assisted tool that streamlines SNOMED CT terminology mapping and concept authoring, enabling seamless, standardized data integration across multi-institutional clinical datasets.

The mapping pipeline included preprocessing local terms, syntactic and LLM-based vector similarity mapping, and iterative enrichment based on validated results. Translation and semantic representation used GPT-4o (OpenAI). New concepts were authored through a structured postcoordination process, and both the efficiency and quality of authoring (including duplicate rate and Machine Readable Concept Model validation violations) were quantitatively evaluated. Performance was evaluated using diagnostic and surgical procedural terms from 4 major hospital networks (9 university hospitals) in South Korea, with additional usability feedback gathered from clinical terminologists.
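The two-stage mapping described above (preprocessing, then a syntactic match with a vector-similarity fallback) can be illustrated with a minimal sketch. This is not the authors' implementation: the paper used GPT-4o for translation and semantic representation, whereas this example substitutes character-trigram counts as a stand-in embedding so that it runs without any external service. The `REFERENCE` table, its concept IDs, and the function names are all hypothetical.

```python
import math
import re
from collections import Counter

# Hypothetical mini reference set: placeholder concept ID -> description.
# These IDs are NOT real SNOMED CT identifiers.
REFERENCE = {
    "C001": "mitral stenosis",
    "C002": "hamartoma of lung",
    "C003": "mitral valve regurgitation",
}

def normalize(term: str) -> str:
    """Preprocess: lowercase, replace punctuation with spaces, collapse whitespace."""
    term = re.sub(r"[^a-z0-9 ]+", " ", term.lower())
    return " ".join(term.split())

def embed(term: str) -> Counter:
    """Stand-in embedding: character trigram counts (an assumption, not an LLM vector)."""
    padded = f"  {normalize(term)} "
    return Counter(padded[i:i + 3] for i in range(len(padded) - 2))

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[key] * b[key] for key in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def map_term(local_term: str, k: int = 5) -> list[tuple[str, float]]:
    """Return up to k (concept_id, score) candidates, best first."""
    cleaned = normalize(local_term)
    # Stage 1: syntactic match (exact string equality after preprocessing).
    exact = [(cid, 1.0) for cid, desc in REFERENCE.items()
             if normalize(desc) == cleaned]
    if exact:
        return exact[:k]
    # Stage 2: rank all reference descriptions by vector similarity.
    query = embed(cleaned)
    scored = [(cid, cosine(query, embed(desc))) for cid, desc in REFERENCE.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]
```

Calling `map_term("Mitral  Stenosis.")` resolves at the syntactic stage, while `map_term("lung hamartoma")` falls through to similarity ranking. The iterative enrichment step mentioned in the abstract would correspond to adding validated local terms back into the reference table, which this sketch does not model.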

Using reference terms, top-5 accuracy across the 4 institutions reached 98.7%, 89.7%, 98.5%, and 92.8% for diagnostic mapping and 99.2%, 82.6%, 98.7%, and 84.7% for surgical procedural mapping. Implementation of the tool reduced manual mapping rates by 30% and overall manual workload by up to 90%. The proposed tool reduced average mapping and new concept creation time by approximately 75%, while decreasing the final mapping table processing time by 90%. New concept authoring errors also decreased, with duplicate concepts reduced by 83% and modeling rule violations by 72%.
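For readers unfamiliar with the metric, top-5 accuracy counts a local term as correctly mapped when the reference concept appears anywhere among the first five ranked candidates. The sketch below shows one common way to compute it; the function name, data layout, and toy values are assumptions for illustration and do not reproduce the study's data or evaluation code.

```python
def top_k_accuracy(ranked_candidates: dict, gold: dict, k: int = 5) -> float:
    """Fraction of terms whose gold concept ID is among the first k candidates.

    ranked_candidates: local term -> list of candidate concept IDs, best first.
    gold: local term -> reference (validated) concept ID.
    """
    if not gold:
        return 0.0
    hits = sum(
        1 for term, true_id in gold.items()
        if true_id in ranked_candidates.get(term, [])[:k]
    )
    return hits / len(gold)

# Toy data with placeholder IDs (not real SNOMED CT identifiers).
toy_ranked = {
    "term_a": ["C1", "C2", "C3"],                      # gold at rank 1
    "term_b": ["C9", "C8", "C4", "C7", "C6", "C5"],    # gold at rank 3
    "term_c": ["C9", "C8", "C7", "C6", "C5", "C0"],    # gold at rank 6
}
toy_gold = {"term_a": "C1", "term_b": "C4", "term_c": "C0"}

top5 = top_k_accuracy(toy_ranked, toy_gold, k=5)
top1 = top_k_accuracy(toy_ranked, toy_gold, k=1)
```

In this toy case two of the three terms have their gold concept within the top five, and one of three at rank one, which shows why top-5 figures are typically higher than top-1 figures for the same ranking.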

This study developed and validated an automated, LLM-assisted SNOMED CT mapping tool that significantly improved efficiency, mapping accuracy, and new concept quality. Limitations include technical integration challenges and dependency on translation quality. Future directions involve leveraging SNOMED CT’s ontology structure and knowledge graphs, enhancing sustainability through ongoing maintenance and quality assurance, and further advancing new concept authoring with automated Machine Readable Concept Model rule enforcement and inactivation processes to achieve robust and scalable terminology standardization.

## Full-text entities

- **Genes:** HSP90B2P (heat shock protein 90 beta family member 2, pseudogene) [NCBI Gene 7190] {aka GRP94P1, GRP94b, HSP, HSPCP2, TRA1P1, TRAP1}, YES1P1 (YES1 pseudogene 1) [NCBI Gene 7526] {aka D22S670, SYR, YES2, YESP}
- **Diseases:** MS (MESH:D009103), CT (MESH:D000088562), KHMC (MESH:C563594), Disorder (MESH:D009358), AI (MESH:C538142), LLM (MESH:D007806), mitral stenosis (MESH:D008946), MRCM (MESH:D004195), Hamartoma of lung (disorder) (MESH:D008171)
- **Chemicals:** lead (MESH:D007854), DSMC (-), SB (MESH:D000965), S-YS (MESH:D015019), S (MESH:D013455)
- **Species:** Homo sapiens (human, species) [taxon 9606]
- **Cell lines:** CT — Homo sapiens (Human), Embryonic stem cell (CVCL_9T86)

## Figures

6 figures with captions in the complete paper: https://tomesphere.com/paper/PMC13010068/full.md

---
Source: https://tomesphere.com/paper/PMC13010068