LCPan: efficient variation graph construction using Locally Consistent Parsing
Akmuhammet Ashyralyyev, Zülal Bingöl, Begüm Filiz Öz, Kaiyuan Zhu, Salem Malikic, Uzi Vishkin, S. Cenk Sahinalp, Can Alkan

TL;DR
LCPan constructs variation graphs using Locally Consistent Parsing, building them over 10x faster than vg and using over 13x less memory than existing tools.
Contribution
The paper presents the first iterative implementation of Locally Consistent Parsing (LCP) and demonstrates its application in creating faster, more memory-efficient variation graphs in genomics.
Findings
LCP produces fewer cores than sketching techniques.
LCPan constructs variation graphs over 10x faster than vg.
LCPan uses over 13x less memory than existing tools.
Abstract
Efficient and consistent string processing is critical in an era of exponentially growing genomic data. Locally Consistent Parsing (LCP) addresses this need by partitioning an input genome string into short, exactly matching substrings (i.e., "cores"), ensuring consistency across partitions. Labeling the cores of an input string consistently not only provides a compact representation of the input but also enables the reapplication of LCP to refine the cores over multiple iterations, providing a progressively longer and more informative set of substrings for downstream analyses. We present the first iterative implementation of LCP with Lcptools and demonstrate its effectiveness in identifying cores with minimal collisions. Experimental results show that the number of cores at the i-th iteration is O(n/c^i) for c ≈ 2.34, while the average length and the average distance between…
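To make the O(n/c^i) bound concrete, here is a minimal Python sketch that evaluates only the growth term n/c^i for the reported c of about 2.34. The function name `core_count_scale` and the example genome length are illustrative assumptions and are not taken from the paper or from Lcptools. The constant factor hidden by the big-O notation is not given in the abstract, so the printed values show how quickly the scale shrinks per iteration, not predicted core counts.

```python
def core_count_scale(n: int, iteration: int, c: float = 2.34) -> float:
    """Return n / c**iteration, the growth term inside O(n / c^i).

    This is only the asymptotic scale. The constant factor hidden by the
    big-O notation is unknown here, so the result is not a predicted
    number of cores.
    """
    if iteration < 0:
        raise ValueError("iteration must be non-negative")
    return n / (c ** iteration)


if __name__ == "__main__":
    # Hypothetical input length, chosen for illustration only.
    genome_length = 3_000_000_000
    for i in range(1, 5):
        print(i, round(core_count_scale(genome_length, i)))
```

Each additional iteration divides the scale by roughly 2.34, which is why repeated application of LCP yields a progressively smaller set of longer cores.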
Taxonomy
Topics: Genomics and Phylogenetic Studies · Algorithms and Data Compression · Genome Rearrangement Algorithms
