# LcDel: deletion variation detection based on clustering and long reads

**Authors:** Yanan Yu, Runtian Gao, Junwei Luo

PMC · DOI: 10.3389/fgene.2024.1404415 · Frontiers in Genetics · 2024-05-10

## TL;DR

LcDel is a method for detecting genomic deletions from long reads using two-step clustering. In the authors' benchmarks it detects deletions better than the other structural variation detection tools it was compared against.

## Contribution

LcDel introduces a two-step clustering approach that enhances deletion detection by addressing chimeric variants.

## Key findings

- On multiple datasets, LcDel detects deletions better than the other structural variation detection tools it was benchmarked against.
- The first clustering step uses two methods, sliding window-based and coverage-based, chosen according to deletion length.
- A second step applies hierarchical clustering to determine the location and length of each deletion.
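To make the hierarchical clustering step concrete, the following is a minimal sketch, not the authors' implementation. It assumes each deletion signature is a `(start, length)` pair and merges clusters by average linkage until the closest pair is farther apart than `max_dist`. The distance function, the `max_dist` threshold, and the median-based consensus in `call_deletion` are all illustrative assumptions that do not come from the paper.

```python
from statistics import median


def hierarchical_cluster(signatures, max_dist=100.0):
    """Agglomerative (average-linkage) clustering of (start, length) pairs.

    max_dist is a hypothetical cutoff chosen for illustration only.
    """
    clusters = [[sig] for sig in signatures]

    def dist(a, b):
        # Average pairwise distance: start offset plus length difference.
        total = sum(abs(s1 - s2) + abs(l1 - l2) for s1, l1 in a for s2, l2 in b)
        return total / (len(a) * len(b))

    while len(clusters) > 1:
        # Find the closest pair of clusters.
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = dist(clusters[i], clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        d, i, j = best
        if d > max_dist:
            break  # remaining clusters are too far apart to merge
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters


def call_deletion(cluster):
    """Summarise a cluster as (start, length, supporting signatures)."""
    starts = [s for s, _ in cluster]
    lengths = [l for _, l in cluster]
    return int(median(starts)), int(median(lengths)), len(cluster)


# Five signatures near the same position but with two distinct lengths.
signatures = [(1000, 300), (1005, 310), (998, 295), (1002, 900), (1001, 910)]
calls = [call_deletion(c) for c in hierarchical_cluster(signatures)]
```

In this toy input the signatures share nearly the same start but fall into two length groups, so the sketch reports two separate deletions rather than one averaged call. Position-only grouping would have merged them.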

## Abstract

Motivation: Genomic structural variation refers to chromosomal-level variations such as genome rearrangement or insertion/deletion, which typically involve larger DNA fragments than single nucleotide variations. Deletion is a common type of structural variant in the genome and may lead to many diseases, so detecting deletions can give insight into the pathogenesis of diseases and provide accurate information for disease diagnosis, treatment, and prevention. Many tools exist for deletion variant detection, but they are still inadequate in some aspects, and most of them ignore the presence of chimeric variants in clustering, resulting in less precise clustering results.

Results: In this paper, we present LcDel, which detects deletion variation based on clustering and long reads. LcDel first finds the candidate deletion sites and then performs a first clustering step using one of two clustering methods (sliding window-based or coverage-based), selected according to the length of the deletion. LcDel then performs a second clustering step, using hierarchical clustering, to determine the location and length of the deletion. LcDel is benchmarked against other structural variation detection tools on multiple datasets, and the results show that LcDel has better detection performance for deletions. The source code is available at https://github.com/cyq1314woaini/LcDel.
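As a rough illustration of the first two stages described above (finding candidate deletion sites, then sliding window-based clustering), here is a minimal sketch, not LcDel's actual code. It assumes candidates come only from `D` operations in an alignment's CIGAR string. The function names, the `min_len` cutoff, and the `window` size are hypothetical values chosen for the example, and the coverage-based method is not sketched here.

```python
import re

# CIGAR operations as defined in the SAM format: a count followed by an op code.
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def deletions_from_cigar(ref_start, cigar, min_len=30):
    """Return (ref_pos, length) for each D operation of at least min_len bases.

    min_len is an illustrative cutoff, not a value taken from the paper.
    """
    pos = ref_start
    found = []
    for count, op in CIGAR_OP.findall(cigar):
        n = int(count)
        if op == "D" and n >= min_len:
            found.append((pos, n))
        if op in "MDN=X":  # operations that consume reference bases
            pos += n
    return found


def sliding_window_cluster(signatures, window=200):
    """Group position-sorted signatures whose start lies within `window` bp
    of the previous member's start (window size is an assumption)."""
    clusters = []
    for sig in sorted(signatures):
        if clusters and sig[0] - clusters[-1][-1][0] <= window:
            clusters[-1].append(sig)
        else:
            clusters.append([sig])
    return clusters


# One read aligned at reference position 100 with a 40 bp and a 5 bp deletion.
candidates = deletions_from_cigar(100, "50M40D10M5D20M")
groups = sliding_window_cluster(candidates + [(160, 45), (1000, 50)])
```

Here the 5 bp deletion falls below the cutoff and is discarded, while the 40 bp deletion at position 150 is grouped with the nearby signature at 160. The signature at 1000 forms its own cluster.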

## Full-text entities

- **Diseases:** many diseases (MESH:D004194)

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/PMC11116628/full.md

## Figures

4 figures with captions in the complete paper: https://tomesphere.com/paper/PMC11116628/full.md

## References

23 references — full list in the complete paper: https://tomesphere.com/paper/PMC11116628/full.md

---
Source: https://tomesphere.com/paper/PMC11116628