Indexing Finite Language Representation of Population Genotypes
Jouni Sirén, Niko Välimäki, Veli Mäkinen

TL;DR
This paper introduces an indexing method for population genotypes that combines a finite automaton with an extension of the Burrows-Wheeler transform, enabling efficient alignment and variation analysis across complete genomes.
Contribution
It presents a novel index that combines a finite automaton with the Burrows-Wheeler transform (BWT) to handle recombinant genome sequences efficiently, facilitating population-wide genomic studies.
Findings
The index recognizes all plausible recombinant sequences.
Approximately 1.0% of matches were to novel recombinants with exact matching.
Up to 2.4% of matches were to novel recombinants with approximate matching.
Abstract
With the recent advances in DNA sequencing, it is now possible to have complete genomes of individuals sequenced and assembled. This rich and focused genotype information can be used for various population-wide studies, now for the first time directly at the whole-genome level. We propose a way to index population genotype information together with the complete genome sequence, so that one can use the index to efficiently align a given sequence to the genome with all plausible genotype recombinations taken into account. This is achieved by converting a multiple alignment of individual genomes into a finite automaton recognizing all strings that can be read from the alignment by switching the sequence at any time. The finite automaton is indexed with an extension of the Burrows-Wheeler transform to allow pattern search inside the plausible recombinant sequences. The size of the index stays…
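To make the automaton idea in the abstract concrete, here is a minimal Python sketch. It is not the authors' construction and uses no Burrows-Wheeler indexing: it only builds a nondeterministic automaton over column boundaries of a toy alignment and searches it by naive simulation. The function names (`build_transitions`, `occurs`, `is_novel`), the `-` gap symbol, and the rule that a gap is skipped to the next non-gap character in the same row are all assumptions made for illustration.

```python
GAP = "-"  # assumed gap symbol; the source does not specify one


def build_transitions(alignment):
    """Return transitions[i] = {char: set of next column boundaries}.

    State i means "about to read column i". From state i, any row may be
    chosen: its nearest non-gap character at column j >= i is read and the
    automaton moves to state j + 1. Choosing a different row at each step
    models switching the sequence at any time.
    """
    width = len(alignment[0])
    if any(len(row) != width for row in alignment):
        raise ValueError("all alignment rows must have the same length")
    transitions = [dict() for _ in range(width + 1)]
    for row in alignment:
        nearest = None  # column of the nearest non-gap character at or after i
        for i in range(width - 1, -1, -1):
            if row[i] != GAP:
                nearest = i
            if nearest is not None:
                transitions[i].setdefault(row[nearest], set()).add(nearest + 1)
    return transitions


def occurs(transitions, pattern):
    """Naive NFA simulation: does pattern occur in some recombinant?"""
    states = set(range(len(transitions)))  # a match may start at any column
    for ch in pattern:
        states = {t for s in states for t in transitions[s].get(ch, ())}
        if not states:
            return False
    return True


def is_novel(alignment, pattern):
    """True if pattern matches the automaton but no single input sequence."""
    originals = [row.replace(GAP, "") for row in alignment]
    in_original = any(pattern in seq for seq in originals)
    return occurs(build_transitions(alignment), pattern) and not in_original


# Toy alignment of three hypothetical individuals (invented data).
alignment = ["ACGTA", "ACCTA", "AG-TA"]
transitions = build_transitions(alignment)
print(occurs(transitions, "AGTA"))   # third row with its gap skipped
print(occurs(transitions, "AGCT"))   # "AG" from row three, "CT" from row two
print(is_novel(alignment, "AGCT"))   # found in no single row
print(occurs(transitions, "AT"))     # not readable from the alignment
```

In this sketch a pattern such as `AGCT` is accepted even though it appears in none of the three input rows, which is the kind of match the Findings section calls a novel recombinant. The simulation takes time proportional to pattern length times the number of active states, so it says nothing about the efficiency or size of the actual index.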
