# Long COVID incidence across SARS-CoV-2 lineages and identification of conserved spike targets for multivalent vaccines

**Authors:** Grace Jaeyoon Kim, Md Ashad Alam, Judy S. Crabtree, Rebecca Rose, Susanna L. Lamers, San Chu, Ronald Horswell, Daniel Fort, Lucio Miele

PMC · DOI: 10.1017/cts.2025.10226 · Journal of Clinical and Translational Science · 2025-12-19

## TL;DR

This study links SARS-CoV-2 viral sequences to Long COVID cases and identifies conserved regions in the Spike protein that could be used for multivalent vaccines.

## Contribution

The study identifies lineage-specific Long COVID incidence and conserved Spike protein regions for potential vaccine targets.

## Key findings

- Long COVID incidence varied by SARS-CoV-2 lineage, ranging from 14% in AY.13 to 67.8% in B.1.1.7.
- Eight conserved amino acid regions in the Spike protein were identified as potential vaccine targets.
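Lineage-specific incidence is a per-lineage proportion: Long COVID-positive patients divided by all patients assigned to that lineage. The sketch below shows one way to tabulate it. It assumes each patient carries a single PANGO lineage and a binary Long COVID label. The function name and the toy records are hypothetical and are not the study's data or code.

```python
from collections import defaultdict

def incidence_by_lineage(records):
    """Return {lineage: fraction of patients who are Long COVID-positive}.

    `records` is an iterable of (lineage, is_long_covid) pairs, one per patient
    (a hypothetical layout chosen for illustration).
    """
    totals = defaultdict(int)
    positives = defaultdict(int)
    for lineage, is_long_covid in records:
        totals[lineage] += 1
        if is_long_covid:
            positives[lineage] += 1
    return {lin: positives[lin] / totals[lin] for lin in totals}

# Hypothetical toy records (not study data)
toy = [("B.1.1.7", True), ("B.1.1.7", True), ("B.1.1.7", False),
       ("AY.13", False), ("AY.13", False), ("AY.13", True), ("AY.13", False)]
rates = incidence_by_lineage(toy)
for lineage, rate in sorted(rates.items(), key=lambda kv: kv[1]):
    print(f"{lineage}: {rate:.1%}")  # AY.13: 25.0%, then B.1.1.7: 66.7%
```

In real data, lineages with few patients give unstable proportions, so reporting counts alongside the rates is a sensible design choice.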

## Abstract

Long COVID remains poorly characterized at the genomic level. The primary aim of this study was to examine the relationship between viral sequences and the incidence of Long COVID at a tertiary care center in Louisiana between April 2020 and December 2022. A secondary aim was analysis of the Spike protein to identify conserved regions for multivalent vaccine targets.

To estimate Long COVID incidence across variants, we linked 4,789 SARS-CoV-2 sequences to the de-identified electronic health records of 3,090 patients. The base population was defined as any patient with an International Classification of Diseases, Tenth Revision, Clinical Modification (ICD-10-CM) COVID-19 diagnosis code (U07.1). Long COVID presentation was identified using definitions developed by the N3C consortium.
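More sequences (4,789) than patients (3,090) were linked, so some patients contribute several sequences. The join therefore needs a rule that reduces them to one record per patient. The sketch below is a minimal illustration of such a join. It assumes a shared de-identified key (`patient_key`), a boolean `has_u07_1` flag for the U07.1 base population, and an "earliest sequence wins" rule. All three are hypothetical choices. The abstract does not state which rule the study actually used.

```python
from datetime import date

def link_sequences(sequences, patients):
    """Join sequence records to de-identified EHR records on a shared key.

    sequences: list of dicts with 'patient_key', 'lineage', 'collected' (datetime.date)
    patients:  dict mapping patient_key -> dict of EHR fields, including a
               hypothetical boolean 'has_u07_1' (COVID-19 diagnosis code present)
    Returns one linked record per eligible patient, using that patient's
    earliest sequence (a hypothetical rule chosen for illustration).
    """
    earliest = {}
    for seq in sequences:
        key = seq["patient_key"]
        ehr = patients.get(key)
        if ehr is None or not ehr.get("has_u07_1"):
            continue  # no matching record, or outside the U07.1 base population
        if key not in earliest or seq["collected"] < earliest[key]["collected"]:
            earliest[key] = seq
    return [
        {**patients[key], "lineage": seq["lineage"], "collected": seq["collected"]}
        for key, seq in earliest.items()
    ]

# Hypothetical toy data: 5 sequences, 3 EHR records
seqs = [
    {"patient_key": "p1", "lineage": "B.1.1.7", "collected": date(2021, 3, 9)},
    {"patient_key": "p1", "lineage": "B.1.1.7", "collected": date(2021, 3, 2)},
    {"patient_key": "p2", "lineage": "AY.13", "collected": date(2021, 8, 15)},
    {"patient_key": "p3", "lineage": "AY.13", "collected": date(2021, 9, 1)},
    {"patient_key": "p9", "lineage": "BA.1", "collected": date(2022, 1, 4)},
]
ehr = {
    "p1": {"patient_key": "p1", "has_u07_1": True},
    "p2": {"patient_key": "p2", "has_u07_1": True},
    "p3": {"patient_key": "p3", "has_u07_1": False},
}
linked = link_sequences(seqs, ehr)
print(len(linked))  # 2: p1 (earliest sequence kept) and p2
```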

Of the 3,090 linked patients, 1,554 met Long COVID definitions and 1,536 were Long COVID-negative. Of these, 56.3% were female, 36.1% self-reported as African American, and 5.5% self-reported as Hispanic/Latino. In addition, 54.5% had received at least one vaccine dose at least 14 days before SARS-CoV-2 specimen collection. Long COVID-positive patients were older (mean age 43.1 years) than Long COVID-negative patients (35.9 years; p = 0.0054) and were more likely to be female (p = 0.0001). Among Long COVID-positive patients, those who were unvaccinated were significantly younger than their vaccinated counterparts (p < 0.00001). Long COVID incidence varied by PANGO lineage, ranging from 14% in AY.13 to 67.8% in B.1.1.7. Analysis of Spike protein diversity revealed eight conserved amino acid regions (Shannon entropy < 0.43), representing potential targets for vaccine design.
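The conservation criterion can be illustrated with a short sketch. It scores each position of an aligned set of Spike sequences by Shannon entropy, `H = sum(p_i * log(1 / p_i))` over the residue frequencies `p_i` at that position. It then reports runs of consecutive positions with `H < 0.43`. The log base (2 here), ignoring gap characters, and the hypothetical `min_length` run parameter are assumptions for illustration. The abstract gives only the 0.43 cutoff, so the study's exact procedure may differ.

```python
import math
from collections import Counter

def column_entropy(column, base=2.0):
    """Shannon entropy of the residues at one alignment position.

    Gap characters ('-') are ignored (an assumption made for illustration).
    """
    residues = [aa for aa in column if aa != "-"]
    if not residues:
        return 0.0
    n = len(residues)
    # sum of p * log(1/p); equals 0.0 when the position is fully conserved
    return sum((c / n) * math.log(n / c, base) for c in Counter(residues).values())

def conserved_regions(alignment, threshold=0.43, min_length=5, base=2.0):
    """Return 1-based inclusive (start, end) runs whose per-position entropy < threshold.

    `min_length` is a hypothetical minimum run length, not a parameter from the study.
    """
    length = len(alignment[0])
    assert all(len(s) == length for s in alignment), "sequences must be aligned"
    regions, start = [], None
    for i in range(length):
        h = column_entropy([seq[i] for seq in alignment], base)
        if h < threshold:
            if start is None:
                start = i  # open a new low-entropy run
        else:
            if start is not None and i - start >= min_length:
                regions.append((start + 1, i))  # close run covering start..i-1
            start = None
    if start is not None and length - start >= min_length:
        regions.append((start + 1, length))  # run extends to the last position
    return regions

# Hypothetical toy alignment (not real Spike sequences)
aln = ["MFVFLVLLPL", "MFVFLVLLPL", "MFVFAVLLPL", "MFVFLVLLSL"]
print(conserved_regions(aln, min_length=3))  # [(1, 4), (6, 8)]
```

With only four toy sequences, any variable position scores at least about 0.81 bits. The 0.43 cutoff therefore only becomes a meaningful "near-conserved" threshold on alignments with many sequences.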

Analyzing Long COVID rates across thousands of annotated SARS-CoV-2 sequences revealed lineage-specific risk and conserved epitopes for future interventions.

## Linked entities

- **Proteins:** CHMP5 (charged multivesicular body protein 5)
- **Diseases:** COVID-19 (MONDO:0100096)

## Full-text entities

- **Diseases:** COVID-19 (MESH:D000086382), Long COVID (MESH:D000094024)
- **Species:** Homo sapiens (human, species) [taxon 9606], Severe acute respiratory syndrome coronavirus 2 (no rank) [taxon 2697049]

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/PMC12780800/full.md

## Figures

4 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12780800/full.md

## References

34 references — full list in the complete paper: https://tomesphere.com/paper/PMC12780800/full.md

---
Source: https://tomesphere.com/paper/PMC12780800