# O2 supplementation disambiguation in clinical narratives to support retrospective COVID-19 studies

**Authors:** Akhila Abdulnazar, Amila Kugic, Stefan Schulz, Vanessa Stadlbauer, Markus Kreuzthaler

PMC · DOI: 10.1186/s12911-024-02425-2 · BMC Medical Informatics and Decision Making · 2024-01-31

## TL;DR

This paper presents a machine learning approach that identifies, from clinical narratives, patients who received oxygen supplementation, supporting efficient retrospective analysis of COVID-19 cases.

## Contribution

The study compares classical and deep learning models for classifying oxygen supplementation in discharge summaries and provides model explanations using LIME.

## Key findings

- Classical ML and deep learning models achieved similar classification performance with F-measure between 0.942 and 0.955.
- Classical ML approaches were faster than deep learning models.
- LIME explanations revealed relevant features contributing to model decisions.

## Abstract

Oxygen saturation, a key indicator of COVID-19 severity, poses challenges, especially in cases of silent hypoxemia. Electronic health records (EHRs) often contain supplemental oxygen information within clinical narratives. Streamlining patient identification based on oxygen levels is crucial for COVID-19 research, underscoring the need for automated classifiers in discharge summaries to ease the manual review burden on physicians.

We analysed text lines extracted from anonymised German-language discharge summaries of COVID-19 patients to perform a binary classification task, differentiating between patients who received oxygen supplementation and those who did not. Various machine learning (ML) algorithms, ranging from classical ML to deep learning (DL) models, were compared. Classifier decisions were explained using Local Interpretable Model-agnostic Explanations (LIME), which visualize the model decisions.
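The line-level binary classification and token-level explanation described above can be illustrated with a minimal, standard-library-only sketch. Everything in it is an assumption made for illustration: the toy English lines are invented (the study used German discharge-summary lines), the naive Bayes model is a stand-in rather than one of the classifiers the authors evaluated, and the explanation step is a simple token-occlusion scheme that mimics the idea of per-token relevance, not LIME itself.

```python
import math
from collections import Counter

# Invented toy lines, NOT taken from the study's data (which is German).
# Label 1 = oxygen supplementation given, 0 = not given.
TRAIN = [
    ("patient received 2 l o2 via nasal cannula", 1),
    ("oxygen supplementation with 4 l o2 required", 1),
    ("high flow oxygen therapy was started", 1),
    ("o2 saturation 97 percent on room air", 0),
    ("no oxygen supplementation was needed", 0),
    ("saturation stable on room air at discharge", 0),
]


def tokenize(line):
    """Lowercase whitespace tokenisation (a deliberately naive choice)."""
    return line.lower().split()


def train_nb(samples):
    """Collect per-class token counts for a multinomial naive Bayes model."""
    counts = {0: Counter(), 1: Counter()}
    docs = Counter()
    for line, label in samples:
        counts[label].update(tokenize(line))
        docs[label] += 1
    vocab = set(counts[0]) | set(counts[1])
    return counts, docs, vocab


def log_odds(model, tokens):
    """Log-odds of class 1 versus class 0 with add-one smoothing."""
    counts, docs, vocab = model
    totals = {c: sum(counts[c].values()) + len(vocab) for c in (0, 1)}
    score = math.log(docs[1] / docs[0])
    for tok in tokens:
        if tok not in vocab:
            continue  # out-of-vocabulary tokens carry no evidence
        score += math.log((counts[1][tok] + 1) / totals[1])
        score -= math.log((counts[0][tok] + 1) / totals[0])
    return score


def predict(model, line):
    """Return 1 if the line is classified as oxygen supplementation."""
    return 1 if log_odds(model, tokenize(line)) > 0 else 0


def token_contributions(model, line):
    """Occlusion-style relevance: change in log-odds when one token is removed.

    Positive values push the decision towards class 1.
    """
    tokens = tokenize(line)
    base = log_odds(model, tokens)
    return [
        (tok, base - log_odds(model, tokens[:i] + tokens[i + 1:]))
        for i, tok in enumerate(tokens)
    ]


MODEL = train_nb(TRAIN)
QUERY = "patient received 3 l o2 via nasal cannula"
print(predict(MODEL, QUERY))
for token, weight in token_contributions(MODEL, QUERY):
    print(f"{token:10s} {weight:+.3f}")
```

In practice, a LIME text explainer fits a local surrogate model over many randomly perturbed versions of the input; the single-token occlusion above is only the simplest relative of that idea, chosen here to keep the example dependency-free.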

Classical ML and DL models achieved comparable classification performance, with F-measures between 0.942 and 0.955, while the classical ML approaches were faster. Visualisation of the embedding representations of the input data reveals notable variations in the encoding patterns between classical and DL encoders. Furthermore, LIME explanations provide insights into the most relevant token-level features that contribute to these observed differences.
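For readers unfamiliar with the metric, the F-measure reported above combines precision and recall into a single score. The following is a minimal sketch of the standard F-beta computation for a binary task; the function name, the toy labels, and the choice of `beta=1.0` are assumptions for illustration, and the summary does not state which averaging scheme the authors used.

```python
def f_measure(y_true, y_pred, positive=1, beta=1.0):
    """F-beta score for one positive class; beta=1.0 gives the usual F1."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)

    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0

    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


# Hypothetical labels: 1 = oxygen supplementation, 0 = none.
Y_TRUE = [1, 1, 1, 1, 0, 0, 0, 0]
Y_PRED = [1, 1, 1, 0, 1, 0, 0, 0]
print(f_measure(Y_TRUE, Y_PRED))
```

With three true positives, one false positive, and one false negative, precision and recall are both 0.75 in this toy case, so the F1 score is also 0.75.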

Despite a general tendency towards deep learning, these use cases show that classical approaches yield comparable results at lower computational cost. Model prediction explanations using LIME in textual and visual layouts provided a qualitative explanation for the model performance.

The online version contains supplementary material available at 10.1186/s12911-024-02425-2.

## Linked entities

- **Diseases:** COVID-19 (MONDO:0100096)

## Full-text entities

- **Diseases:** hypoxemia (MESH:D000860), critically ill (MESH:D016638), respiratory infection (MESH:D012141), deaths (MESH:D003643), Coronavirus disease (MESH:D018352), COVID-19 (MESH:D000086382)
- **Chemicals:** O2 (MESH:D010100), FiO2 (-)
- **Species:** Severe acute respiratory syndrome coronavirus 2 (no rank) [taxon 2697049], Homo sapiens (human, species) [taxon 9606]

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/PMC10829265/full.md

## Figures

4 figures with captions in the complete paper: https://tomesphere.com/paper/PMC10829265/full.md

## References

37 references — full list in the complete paper: https://tomesphere.com/paper/PMC10829265/full.md

---
Source: https://tomesphere.com/paper/PMC10829265