FM-index of Alignment with Gaps
Joong Chae Na, Hyunjoon Kim, Seunghwan Min, Heejin Park, Thierry Lecroq, Martine Leonard, Laurent Mouchard, Kunsoo Park

TL;DR
This paper introduces a new FM-index of alignment that efficiently handles gaps in similar strings, enabling better compression and search capabilities for genomic data.
Contribution
It extends the FM-index of alignment to support gaps in string alignments by designing a new suffix array, improving space efficiency and functionality.
Findings
The index is less than one third the size of RLCSA.
Supports pattern search and random access with gaps.
Experiments on 100 genome sequences compare the index against RLCSA.
Abstract
Recently, a compressed index for similar strings, called the FM-index of alignment (FMA), has been proposed with the functionalities of pattern search and random access. The FMA is quite efficient in space requirement and pattern search time, but it is applicable only for an alignment of similar strings without gaps. In this paper we propose the FM-index of alignment with gaps, a realistic index for similar strings, which allows gaps in their alignment. For this, we design a new version of the suffix array of alignment by using alignment transformation and a new definition of the alignment-suffix. The new suffix array of alignment enables us to support the LF-mapping and backward search, the key functionalities of the FM-index, regardless of gap existence in the alignment. We experimentally compared our index with RLCSA due to Makinen et al. on 100 genome sequences from the 1000 Genomes…
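The abstract names LF-mapping and backward search as the key functionalities of the FM-index. As a rough illustration, here is a minimal sketch of those two operations on a classic FM-index over a single string. It is not the paper's FM-index of alignment and does not handle alignments or gaps. The function names (`build_fm_index`, `lf`, `backward_search`) are hypothetical, and the construction assumes a small input whose characters all sort after the `$` terminator (true for DNA letters).

```python
def build_fm_index(text):
    """Build a toy FM-index for one string: suffix array, BWT, C table, Occ table.

    Assumption: '$' does not occur in text and sorts before every character in it.
    The quadratic suffix sort is for illustration only.
    """
    assert "$" not in text
    s = text + "$"
    sa = sorted(range(len(s)), key=lambda i: s[i:])
    # BWT: the character preceding each sorted suffix (s[-1] is '$' when i == 0).
    bwt = "".join(s[i - 1] for i in sa)
    alphabet = sorted(set(s))
    # C[c] = number of characters in s that are strictly smaller than c.
    C, total = {}, 0
    for c in alphabet:
        C[c] = total
        total += s.count(c)
    # occ[c][i] = number of occurrences of c in bwt[:i].
    occ = {c: [0] * (len(s) + 1) for c in alphabet}
    for i, ch in enumerate(bwt):
        for c in alphabet:
            occ[c][i + 1] = occ[c][i] + (1 if ch == c else 0)
    return sa, bwt, C, occ


def lf(i, bwt, C, occ):
    """LF-mapping: row of the suffix that starts one position before row i's suffix."""
    c = bwt[i]
    return C[c] + occ[c][i]


def backward_search(pattern, sa, C, occ):
    """Return the sorted start positions of pattern, scanning it right to left."""
    lo, hi = 0, len(sa)  # half-open range of suffix array rows
    for c in reversed(pattern):
        if c not in C:
            return []
        lo = C[c] + occ[c][lo]
        hi = C[c] + occ[c][hi]
        if lo >= hi:
            return []
    return sorted(sa[lo:hi])


sa, bwt, C, occ = build_fm_index("banana")
print(backward_search("ana", sa, C, occ))
```

Each step of the search narrows a range of suffix array rows using only the C and Occ tables, which is why the text itself need not be stored. Per the abstract, the paper's contribution is a new suffix array of alignment that keeps these two operations valid when the alignment contains gaps.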
