# Pattern recognition in SARS cases: insights from t-SNE and k-means clustering applied to COVID-19 symptomatology

**Authors:** Julliana Gonçalves Marques, Bruno Motta de Carvalho, Luiz Affonso Guedes, Márjory Da Costa-Abreu

PMC · DOI: 10.3389/frai.2025.1536486 · Frontiers in Artificial Intelligence · 2025-03-27

## TL;DR

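This study combines t-SNE (using Gower's distance) with k-means clustering to explore symptom patterns in SARS cases from São Paulo State, Brazil, during 2020 and 2021, and finds that unspecified SARS and COVID-19 cases share similar symptom profiles.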

## Contribution

The contribution is the application of t-SNE with Gower's distance, combined with k-means clustering, to categorical symptom data from SARS cases in order to support diagnostic insight.

## Key findings

- Unspecified SARS and COVID-19 cases show similar symptom patterns.
- Intra-cluster analysis identified the common features that define each cluster of individuals.
- Case progression and diagnosis influence identified symptom patterns.

## Abstract

Despite the end of the SARS-CoV-2 pandemic, the medical field continues to address several lasting effects, the most notable being long COVID. However, COVID-19 presents another specific challenge that complicates diagnosis: the similarity of its symptoms to those of other viral diseases, particularly other forms of SARS. This overlap makes it difficult to identify distinct and meaningful symptom patterns as they develop. This study proposes a dimensionality reduction approach combined with a clustering technique to visually analyse structural similarities among SARS-infected individuals, aiming to determine whether aspects such as case progression and diagnosis affect these patterns.

This analysis utilised the t-Distributed Stochastic Neighbour Embedding (t-SNE) algorithm for dimensionality reduction, combined with Gower's distance to handle categorical data, and k-means clustering. The study focused on symptoms, case progression, and diagnoses of SARS-CoV-2 and unspecified SARS cases using data from the Brazilian SARS dataset for São Paulo State during 2020 and 2021. The process began with a visual analysis aimed at identifying structural patterns in the symptom data, highlighting potential similarities between COVID-19 patients and those diagnosed with unspecified SARS. Following this, an intra-cluster analysis was performed to investigate the common features that defined each cluster, providing insights into shared characteristics among grouped individuals.
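As a rough illustration of how Gower's distance copes with categorical symptom data, the sketch below computes it for purely categorical records. It assumes every feature is treated as nominal with equal weight, in which case the distance is the fraction of comparable features on which two records disagree. The symptom columns, the helpers `gower_categorical` and `gower_matrix`, and the three toy patients are hypothetical and are not taken from the paper or its dataset.

```python
from typing import List, Optional, Sequence

# One record = one patient; each position is a categorical feature.
# None marks a missing answer, which Gower's distance skips.
Record = Sequence[Optional[str]]


def gower_categorical(a: Record, b: Record) -> float:
    """Gower's distance between two purely categorical records.

    Each comparable feature contributes 0 when the values match and 1
    when they differ; the result is the mean over comparable features.
    """
    if len(a) != len(b):
        raise ValueError("records must have the same number of features")
    compared = 0
    mismatches = 0
    for x, y in zip(a, b):
        if x is None or y is None:
            continue  # feature missing in at least one record
        compared += 1
        if x != y:
            mismatches += 1
    if compared == 0:
        raise ValueError("records share no comparable features")
    return mismatches / compared


def gower_matrix(records: Sequence[Record]) -> List[List[float]]:
    """Symmetric pairwise distance matrix with a zero diagonal."""
    n = len(records)
    dist = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            dist[i][j] = dist[j][i] = gower_categorical(records[i], records[j])
    return dist


# Hypothetical columns: fever, cough, dyspnoea, loss_of_smell
patients = [
    ("yes", "yes", "no", "yes"),
    ("yes", "no", "no", "yes"),
    ("no", "no", "yes", None),
]
distances = gower_matrix(patients)
```

A matrix like `distances` could then be handed to an embedding routine that accepts precomputed dissimilarities (for example, scikit-learn's `TSNE` with `metric="precomputed"` and `init="random"`), with k-means applied afterwards. The abstract does not state the exact settings used, so those choices remain assumptions.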

The analysis revealed that both diagnoses share substantial similarities, particularly in the presence or absence of COVID-19-related symptoms, even when the majority of individuals were diagnosed with unspecified SARS.
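One way to picture the kind of intra-cluster comparison behind this finding is to summarise, for each cluster, the share of each diagnosis and the prevalence of each symptom. The sketch below does this for a handful of made-up rows; the field names (`cluster`, `diagnosis`, `fever`, `cough`), the helper `cluster_profiles`, and the values are all hypothetical and do not reproduce the paper's data or results.

```python
from collections import defaultdict
from typing import Dict, List, Sequence


def cluster_profiles(rows: Sequence[dict], symptoms: Sequence[str]) -> Dict[int, dict]:
    """Summarise each cluster by size, diagnosis share, and symptom prevalence.

    Each row is a dict with a 'cluster' label, a 'diagnosis' string, and
    one boolean entry per symptom name listed in `symptoms`.
    """
    grouped: Dict[int, List[dict]] = defaultdict(list)
    for row in rows:
        grouped[row["cluster"]].append(row)

    profiles: Dict[int, dict] = {}
    for cluster, members in grouped.items():
        n = len(members)
        diagnosis_counts: Dict[str, int] = defaultdict(int)
        for member in members:
            diagnosis_counts[member["diagnosis"]] += 1
        profiles[cluster] = {
            "size": n,
            # fraction of cluster members carrying each diagnosis
            "diagnosis_share": {d: c / n for d, c in diagnosis_counts.items()},
            # fraction of cluster members reporting each symptom
            "symptom_prevalence": {
                s: sum(bool(m[s]) for m in members) / n for s in symptoms
            },
        }
    return profiles


# Made-up rows for illustration only.
toy_rows = [
    {"cluster": 0, "diagnosis": "COVID-19", "fever": True, "cough": True},
    {"cluster": 0, "diagnosis": "unspecified SARS", "fever": True, "cough": False},
    {"cluster": 1, "diagnosis": "unspecified SARS", "fever": False, "cough": False},
    {"cluster": 1, "diagnosis": "unspecified SARS", "fever": False, "cough": True},
]
profiles = cluster_profiles(toy_rows, ["fever", "cough"])
```

Comparing `symptom_prevalence` between clusters whose `diagnosis_share` differs is one simple way to check whether two diagnoses present with similar symptom profiles.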

This analysis is particularly relevant because Brazil was one of the countries most severely affected by the pandemic, experiencing profound impacts across multiple dimensions.

## Linked entities

- **Diseases:** SARS (MONDO:0005091), COVID-19 (MONDO:0100096)

## Full-text entities

- **Diseases:** viral diseases (MESH:D014777), COVID-19 (MESH:D000086382), SARS (MESH:D045169), long COVID (MESH:D000094024)
- **Species:** Severe acute respiratory syndrome coronavirus 2 (no rank) [taxon 2697049], Homo sapiens (human, species) [taxon 9606]

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/PMC11983552/full.md

## Figures

9 figures with captions in the complete paper: https://tomesphere.com/paper/PMC11983552/full.md

## References

26 references — full list in the complete paper: https://tomesphere.com/paper/PMC11983552/full.md

---
Source: https://tomesphere.com/paper/PMC11983552