# Extended Grammar of Systematized Nomenclature of Medicine – Clinical Terms for Semantic Representation of Clinical Data: Methodological Study

**Authors:** Christophe Gaudet-Blavignac, Julien Ehrsam, Monika Baumann, Adel Bensahla, Mirjam Mattei, Yuanyuan Zheng, Christian Lovis

PMC · DOI: 10.2196/80314 · 2026-01-28

## TL;DR

This paper proposes extending the grammar of SNOMED CT to better represent clinical data, improving semantic interoperability and capturing complex clinical nuances.

## Contribution

A framework for extending SNOMED CT's grammar to address semantic gaps and support richer clinical data representation.

## Key findings

- Extending SNOMED CT's grammar enabled the semantic representation of over 119,000 distinct data elements covering 13 billion instances.
- The extensions addressed limitations in representing negation, scalar values, uncertainty, and temporality, and allowed external terminologies such as Pango to be integrated.
- The method offers a flexible alternative to creating new standards for semantic interoperability.

## Abstract

Interoperability has been a challenge for half a century. Led by an informatics view of the world, the quest for interoperability has evolved from typing and categorizing data to building increasingly complex models. In parallel with the development of these models, the field of terminologies and ontologies emerged to refine granularity and introduce notions of hierarchy. Clinical data models and terminology systems vary in purpose, and their fixed categories shape and constrain representation, which inevitably leads to information loss.

Despite these efforts, semantic interoperability remains imperfect. Achieving it is essential for effective data reuse but requires more than rich terminologies and standardized models. This methodological study explores the extent to which the SNOMED CT (Systematized Nomenclature of Medicine – Clinical Terms) compositional grammar can be leveraged and extended to approximate a formal descriptive grammar, allowing clinical reality to be expressed in coherent, meaningful sentences rather than preconstrained categories.

Building on a decade of semantic representation efforts at the Geneva University Hospitals, we developed a framework to identify recurring semantic gaps in clinical data. We addressed these gaps by systematically modifying the SNOMED CT Machine Readable Concept Model and extending its Augmented Backus-Naur Form syntax to support necessary grammatical structures and external vocabularies.
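To make the idea of a compositional grammar concrete, the following is a minimal sketch of parsing a small subset of the standard SNOMED CT compositional grammar: one or more focus concepts joined by `+`, optionally followed by `:` and ungrouped `attribute = value` refinements. It does not reproduce the authors' extensions, whose syntax is not given in this summary. The names `Concept`, `Expression`, `parse_concept`, and `parse_expression` are hypothetical, and the sketch assumes that terms contain none of the characters `:`, `,`, `=`, or `+`.

```python
import re
from dataclasses import dataclass, field

# A concept reference: a 6-18 digit SCTID, optionally followed by |term|.
CONCEPT_RE = re.compile(r"\s*(\d{6,18})\s*(?:\|([^|]*)\|)?\s*")


@dataclass
class Concept:
    sctid: str
    term: str = ""


@dataclass
class Expression:
    focus: list                                       # list of Concept
    refinements: list = field(default_factory=list)   # list of (attribute, value)


def parse_concept(text: str) -> Concept:
    """Parse 'sctid |term|' into a Concept (hypothetical helper)."""
    match = CONCEPT_RE.fullmatch(text)
    if match is None:
        raise ValueError(f"not a concept reference: {text!r}")
    return Concept(match.group(1), (match.group(2) or "").strip())


def parse_expression(text: str) -> Expression:
    """Parse 'focus (+ focus)* [: attr = value (, attr = value)*]'.

    Simplification: splits on ':', ',', '=' and '+', so it only works
    when terms do not contain those characters. Attribute groups,
    nested expressions, and concrete values are not handled.
    """
    focus_part, _, refinement_part = text.partition(":")
    focus = [parse_concept(chunk) for chunk in focus_part.split("+")]
    refinements = []
    if refinement_part.strip():
        for pair in refinement_part.split(","):
            attribute, separator, value = pair.partition("=")
            if not separator:
                raise ValueError(f"expected 'attribute = value': {pair!r}")
            refinements.append((parse_concept(attribute), parse_concept(value)))
    return Expression(focus, refinements)


expr = parse_expression(
    "64572001 |Disease| : 363698007 |Finding site| = 39607008 |Lung structure|"
)
```

Here `expr.focus` holds the single focus concept and `expr.refinements` holds one attribute-value pair. Extensions of the kind described above would add further grammatical constructs on top of a structure like this one.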

This approach enabled the semantic representation of over 119,000 distinct data elements covering 13 billion instances. By extending the grammar, we successfully addressed critical limitations such as negation, scalar values, uncertainty, temporality, and the integration of external terminologies like Pango. The extensions proved essential for capturing complex clinical nuances that standard precoordinated concepts could not represent.

Rather than creating a new standard from scratch, extending the grammatical capabilities of SNOMED CT offers a viable pathway toward high-fidelity semantic representation. This work serves as a proof of concept that separating the rules of composition from vocabulary allows for a more flexible and robust description of clinical reality, provided that challenges regarding governance and machine readability are addressed.

## Full-text entities

- **Diseases:** MRCM (MESH:D004195), disorder (MESH:D009358), Disorder of lung (MESH:D008171), Pneumonie (MESH:C535937), DL (MESH:C537032), SNOMED CT (MESH:D000088562), SCT (MESH:C535780), OMOP (MESH:D011248), CL (MESH:D002971), hernia (MESH:D006547), COVID-19 (MESH:D000086382), CTCAE (MESH:D064420)
- **Chemicals:** FDG (-)
- **Species:** Severe acute respiratory syndrome coronavirus 2 (no rank) [taxon 2697049], Homo sapiens (human, species) [taxon 9606]

## Figures

7 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12895155/full.md

---
Source: https://tomesphere.com/paper/PMC12895155