needLR: Long-read structural variant annotation with population-scale frequency estimation
Jonas A. Gustafson, Jiadong Lin, Evan E. Eichler, Danny E. Miller

TL;DR
needLR annotates structural variants (SVs) called from long-read sequencing data with population allele frequencies and genomic context, so that candidate pathogenic variants can be filtered and prioritized.
Contribution
The paper introduces needLR, an SV annotation tool that combines population allele frequencies, genomic-context annotations, and gene-phenotype associations to filter and prioritize candidate pathogenic variants.
Findings
Assigned allele frequencies to over 97.5% of all detected SVs, using population data from 500 presumably healthy individuals
Reduced the average number of novel genic SVs to 121 per case
Retained all known pathogenic SVs across the nine test cases
Abstract
Summary: We present needLR, a structural variant (SV) annotation tool that can be used for filtering and prioritization of candidate pathogenic SVs from long-read sequencing data using population allele frequencies, annotations for genomic context, and gene-phenotype associations. When using population data from 500 presumably healthy individuals to evaluate nine test cases with known pathogenic SVs, needLR assigned allele frequencies to over 97.5% of all detected SVs and reduced the average number of novel genic SVs to 121 per case while retaining all known pathogenic variants.
Availability and Implementation: needLR is implemented in bash with dependencies including Truvari v4.2.2, BEDTools v2.31.1, and BCFtools v1.19. Source code, documentation, and pre-computed population allele frequency data are freely available at https://github.com/jgust1/needLR under an MIT license.
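The frequency-based filtering described in the summary can be illustrated with a minimal Python sketch. This is not needLR's code (needLR is a bash pipeline built on Truvari, BEDTools, and BCFtools). The record layout, the field names, the `allele_frequency` and `prioritize` helpers, the diploid autosomal assumption, and the `max_af` threshold are all hypothetical and chosen only to show the idea: estimate a population allele frequency for each SV, then keep genic SVs that are absent or rare in the population.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class AnnotatedSV:
    """Hypothetical record for one SV after matching against a population call set."""
    sv_id: str
    svtype: str
    het_carriers: int      # population samples carrying one copy of the SV
    hom_carriers: int      # population samples carrying two copies
    genotyped: int         # population samples with a non-missing genotype
    gene: Optional[str]    # overlapping gene, or None if intergenic


def allele_frequency(sv: AnnotatedSV) -> Optional[float]:
    """Alt-allele count over total alleles, assuming diploid autosomal sites."""
    if sv.genotyped == 0:
        return None  # no frequency can be assigned
    return (sv.het_carriers + 2 * sv.hom_carriers) / (2 * sv.genotyped)


def prioritize(svs: List[AnnotatedSV], max_af: float = 0.0) -> List[AnnotatedSV]:
    """Keep genic SVs whose population frequency is unknown or at most max_af.

    max_af=0.0 keeps only SVs never seen in the population ("novel");
    the threshold is an illustrative assumption, not a needLR default.
    """
    kept = []
    for sv in svs:
        if sv.gene is None:
            continue  # drop intergenic SVs
        af = allele_frequency(sv)
        if af is None or af <= max_af:
            kept.append(sv)
    return kept


# Toy data: GENE_A and GENE_B are placeholder names, not real annotations.
svs = [
    AnnotatedSV("sv1", "DEL", het_carriers=10, hom_carriers=5, genotyped=500, gene="GENE_A"),
    AnnotatedSV("sv2", "INS", het_carriers=0, hom_carriers=0, genotyped=500, gene="GENE_B"),
    AnnotatedSV("sv3", "DEL", het_carriers=0, hom_carriers=0, genotyped=500, gene=None),
]
candidates = prioritize(svs)
print([sv.sv_id for sv in candidates])
```

In this toy input, `sv1` is common in the population, `sv3` is intergenic, and only `sv2` survives as a novel genic candidate. A real workflow would also need to decide how two SV calls are judged to be the same variant before counting carriers, which is the part Truvari handles in needLR.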
Taxonomy
Topics: Genomics and Rare Diseases · Genetic Associations and Epidemiology · Biomedical Text Mining and Ontologies
