# Defective but promising: evaluating the utility of currently available bioinformatic pipelines for detecting defective viral genomes in RNA-Seq data

**Authors:** Anthony Taylor, Cristina Rosa, Marco Archetti

PMC · DOI: 10.1099/jgv.0.002176 · The Journal of General Virology · 2025-11-17

## TL;DR

This study evaluates five bioinformatic programs (ViReMa, DI-tector, DVGfinder, DG-Seq and VODKA2) for detecting defective viral genomes in plant virus RNA-Seq data and finds low agreement among their outputs.

## Contribution

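The study provides a meta-analysis of eight previously published RNA sequencing datasets, each run through all five programs, and recommends running the same dataset through multiple programs and using the overlap to guide downstream characterization.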

## Key findings

- The five programs show low agreement on identified junction points, including on the most frequently identified junction.
- The most frequently identified junctions typically correspond to large, disruptive deletions.
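- The authors suggest running the same dataset through multiple programs and using the overlap between outputs to inform decisions on downstream characterization.
- There is preliminary support for the hypothesis that DVGs are consistently generated and maintained in a specific virus-host combination, but the authors conclude that a dataset generated expressly to test it is needed.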

## Abstract

Defective viral genomes (DVGs) affect viral dynamics, pathogenicity and evolution, have been found in many in vivo viral infections, and in theory can be detected from sequencing data. We explored the utility of the currently available bioinformatic programs ViReMa, DI-tector, DVGfinder, DG-Seq and VODKA2 for identifying junction points in plant virus high-throughput sequencing data, looking at whether the outputs from these bioinformatic tools generally agree and exploring the possibility of using these tools to help us understand whether DVGs are consistently generated and maintained in a specific virus-host combination. We conducted a meta-analysis of eight previously published RNA sequencing datasets utilizing all five programs and compared the degree of output overlap, the most common junctions present in each output and whether these junctions match previously reported junctions for that virus. Our results demonstrate a low degree of agreement regarding identified junctions between programs, including the most frequently identified one, although the most frequently identified junctions typically corresponded to large, disruptive deletions. We found preliminary support for our prevalence hypothesis, although we ultimately conclude that a more robust dataset generated expressly for testing this hypothesis will be required for a convincing answer. Finally, we suggest that when using bioinformatic programs to search for DVGs, it is best to run the same dataset through multiple programs and look at the overlap to inform decisions on downstream characterization.
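The overlap comparison recommended in the abstract can be illustrated with a minimal sketch. It assumes each program's output has already been reduced to a set of junctions, represented here as hypothetical `(break, rejoin)` coordinate pairs. The function names, the Jaccard index as the overlap measure, and the example coordinates are all illustrative assumptions, not the authors' method or any program's real output format.

```python
from collections import Counter
from itertools import combinations

# Hypothetical representation: a junction is a (break position, rejoin position) pair.
Junction = tuple[int, int]


def pairwise_jaccard(calls: dict[str, set[Junction]]) -> dict[tuple[str, str], float]:
    """Jaccard index (shared / total junctions) for every pair of programs."""
    scores = {}
    for a, b in combinations(sorted(calls), 2):
        union = calls[a] | calls[b]
        # Two empty outputs share nothing, so report 0.0 rather than divide by zero.
        scores[(a, b)] = len(calls[a] & calls[b]) / len(union) if union else 0.0
    return scores


def supported_by(calls: dict[str, set[Junction]], min_tools: int) -> set[Junction]:
    """Junctions reported by at least `min_tools` programs."""
    counts = Counter(j for junctions in calls.values() for j in junctions)
    return {j for j, n in counts.items() if n >= min_tools}


# Made-up coordinates for illustration only; these are not results from the paper.
example_calls = {
    "ViReMa": {(100, 900), (250, 400)},
    "DI-tector": {(100, 900), (300, 700)},
    "DVGfinder": {(100, 900)},
}

overlap = pairwise_jaccard(example_calls)
consensus = supported_by(example_calls, min_tools=2)
```

Exact coordinate matching is the strictest possible criterion. Programs may report the same junction a few nucleotides apart, so a real comparison would probably need a position tolerance, which this sketch omits.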

## Full-text entities

- **Diseases:** viral infections (MESH:D014777)

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/PMC12622790/full.md

## Figures

1 figure with captions in the complete paper: https://tomesphere.com/paper/PMC12622790/full.md

## References

50 references — full list in the complete paper: https://tomesphere.com/paper/PMC12622790/full.md

---
Source: https://tomesphere.com/paper/PMC12622790