XVir: A Transformer-Based Architecture for Identifying Viral Reads from Cancer Samples
Shorya Consul, John Robertson, Haris Vikalo

TL;DR
XVir is a transformer-based deep learning pipeline designed to accurately detect viral DNA in human tumor samples, addressing challenges posed by viral diversity and improving upon existing methods in accuracy and efficiency.
Contribution
The paper introduces XVir, a novel transformer-based architecture for viral read identification in cancer samples, with improved accuracy and computational efficiency over prior methods.
Findings
XVir achieves high detection accuracy on semi-experimental data.
XVir outperforms state-of-the-art methods in viral detection.
XVir is more compact and less computationally demanding.
Abstract
It is estimated that approximately 15% of cancers worldwide can be linked to viral infections. The viruses that can cause or increase the risk of cancer include human papillomavirus, hepatitis B and C viruses, Epstein-Barr virus, and human immunodeficiency virus, to name a few. The computational analysis of the massive amounts of tumor DNA data, whose collection is enabled by recent advancements in sequencing technologies, has allowed studies of the potential association between cancers and viral pathogens. However, the high diversity of oncoviral families makes reliable detection of viral DNA difficult and thus renders such analysis challenging. In this paper, we introduce XVir, a data pipeline that relies on a transformer-based deep learning architecture to reliably identify viral DNA present in human tumors. In particular, XVir is trained on genomic sequencing reads from viral…
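A transformer that classifies sequencing reads needs each read converted into a sequence of integer tokens first. The sketch below shows one common way to do that, overlapping k-mer tokenization. It is an illustration only: the abstract excerpt above does not say how XVir encodes reads, so the choice of k-mers, the value k = 3, and the helper names `build_kmer_vocab` and `tokenize_read` are all assumptions, not the authors' method.

```python
from itertools import product


def build_kmer_vocab(k):
    """Map every k-mer over the alphabet ACGT to an integer ID.

    ID 0 is reserved for k-mers that contain any other character
    (for example the ambiguous base N). Hypothetical helper.
    """
    return {"".join(p): i + 1 for i, p in enumerate(product("ACGT", repeat=k))}


def tokenize_read(read, k, vocab):
    """Split a read into overlapping k-mers (stride 1) and look up each ID.

    Hypothetical helper; unknown k-mers map to 0.
    """
    read = read.upper()
    return [vocab.get(read[i:i + k], 0) for i in range(len(read) - k + 1)]


# Assumed k = 3 for illustration; 4**3 = 64 possible k-mers.
vocab = build_kmer_vocab(3)
tokens = tokenize_read("ACGTNAC", 3, vocab)
print(tokens)  # [7, 28, 0, 0, 0]
```

The resulting ID list is the kind of input an embedding layer followed by transformer encoder blocks could consume, with a final classification head producing a viral versus human label per read.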
Taxonomy
Topics: Cancer Genomics and Diagnostics · Viral-associated cancers and disorders · Genomics and Phylogenetic Studies
