# SARS-CoV-2 sequencing artifacts associated with targeted PCR enrichment and read mapping

**Authors:** Kirsten Maren Ellegaard, Vithiagaran Gunalan, Raphael Sieber, Sharmin Jamshid Baig, Nicolai Balle Larsen, Marc Bennedbæk, Jonas Bybjerg-Grauholm, Leandro Andrés Escobar-Herrera, Tobias Gress, Theis Hass Thorsen, Anders Krusager, Gitte Nygaard Aasbjerg, Nour Saad Al-Tamimi, Casper Westergaard, Christina Wiid Svarrer, Morten Rasmussen, Marc Stegger, Nihad Al-Rashedi

PMC · DOI: 10.1371/journal.pone.0334009 · PLOS One · 2025-10-16

## TL;DR

This study examines how targeted PCR enrichment and reference-based assembly of SARS-CoV-2 genomes can produce base calling errors and ambiguous base calls, especially when new virus variants emerge.

## Contribution

The study shows how the choice of primer scheme (Artic V3, V4.1 and V5.3.2) and reference genome affects sequencing accuracy and introduces artifacts during SARS-CoV-2 genome assembly.

## Key findings

- The primers used for targeted PCR enrichment can cause recurrent ambiguous base calls, which can accumulate rapidly when a new variant emerges.
- PCR artifacts and amplicon drop-out are associated with consistent base calling errors.
- Misalignments and partially mapped reads on the reference genome produce ambiguous base calls and cause defining mutations to be omitted from the assembly.

## Abstract

Protocols and pipelines for SARS-CoV-2 genome sequencing were rapidly established when the COVID-19 outbreak was declared a pandemic. The most widely used approach for sequencing SARS-CoV-2 includes targeted enrichment by PCR, followed by shotgun sequencing and reference-based genome assembly. As the continued surveillance of SARS-CoV-2 worldwide is transitioning towards a lower level of intensity, it is timely to re-visit the sequencing protocols and pipelines established during the acute phase of the pandemic. In the current study, we have investigated the impact of primer scheme and reference genome choice by sequencing samples with multiple primer schemes (Artic V3, V4.1 and V5.3.2) and re-processing reads with multiple reference genomes. We have also analysed the temporal development in ambiguous base calls during the emergence of the BA.2.86.x variant. We found that the primers used for targeted enrichment can result in recurrent ambiguous base calls, which can accumulate rapidly in response to the emergence of a new variant. We also found examples of consistent base calling errors, associated with PCR artifacts and amplicon drop-out. Similarly, misalignments and partially mapped reads on the reference genome resulted in ambiguous base calls, as well as defining mutations being omitted from the assembly. These findings highlight some key limitations of using targeted enrichment by PCR and reference-based genome assembly for sequencing SARS-CoV-2, and the importance of continuously monitoring and updating primer schemes and bioinformatic pipelines.
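As a rough illustration of what "recurrent ambiguous base calls" means in practice, the sketch below flags positions that are ambiguous in a large share of consensus genomes. It is not the authors' pipeline. The function name `recurrent_ambiguous_positions`, the `min_fraction` threshold, and the toy sequences are all assumptions made for this example, and the input is assumed to be consensus sequences already in the same reference coordinates (equal length).

```python
from collections import Counter

# Characters treated as unambiguous; anything else (N, R, Y, ...) counts as ambiguous.
UNAMBIGUOUS = set("ACGT-")


def recurrent_ambiguous_positions(consensus_seqs, min_fraction=0.5):
    """Return 1-based positions that are ambiguous in at least
    `min_fraction` of the given consensus sequences.

    Hypothetical helper for illustration only; assumes all sequences
    share one coordinate system (equal length).
    """
    if not consensus_seqs:
        return []
    length = len(consensus_seqs[0])
    if any(len(seq) != length for seq in consensus_seqs):
        raise ValueError("sequences must have equal length")

    counts = Counter()
    for seq in consensus_seqs:
        for pos, base in enumerate(seq.upper(), start=1):
            if base not in UNAMBIGUOUS:
                counts[pos] += 1

    n = len(consensus_seqs)
    return sorted(pos for pos, c in counts.items() if c / n >= min_fraction)


# Toy data (made up): position 5 is ambiguous in 3 of 4 genomes,
# position 8 in only 1 of 4.
toy = ["ACGTNACGT", "ACGTNACRT", "ACGTAACGT", "ACGTNACGT"]
hits = recurrent_ambiguous_positions(toy, min_fraction=0.5)
```

Positions flagged this way could then be compared against primer binding sites or amplicon boundaries of the scheme in use, which is the kind of association the abstract describes.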

## Linked entities

- **Diseases:** SARS-CoV-2 (MONDO:0100096), COVID-19 (MONDO:0100096)

## Full-text entities

- **Diseases:** COVID-19 (MESH:D000086382)
- **Species:** Severe acute respiratory syndrome coronavirus 2 (no rank) [taxon 2697049]

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/PMC12530606/full.md

## Figures

3 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12530606/full.md

## References

39 references — full list in the complete paper: https://tomesphere.com/paper/PMC12530606/full.md

---
Source: https://tomesphere.com/paper/PMC12530606