PanDelos-plus: A parallel algorithm for computing sequence homology in pangenomic analysis
Simone Colli, Emiliano Maresi, Vincenzo Bonnici

TL;DR
PanDelos-plus is a parallel, alignment-free tool that substantially speeds up bacterial pangenome analysis and reduces its memory usage, enabling large-scale comparative genomics on standard hardware.
Contribution
It introduces a fully parallel, scalable redesign of PanDelos that improves speed and memory efficiency while maintaining accuracy.
Findings
Up to 14x faster execution on synthetic datasets
Memory usage reduced by up to 96%
Maintains accuracy comparable to state-of-the-art methods
Abstract
The identification of homologous gene families across multiple genomes is a central task in bacterial pangenomics, traditionally requiring computationally demanding all-against-all comparisons. PanDelos addresses this challenge with an alignment-free and parameter-free approach based on k-mer profiles, combining high speed, ease of use, and accuracy competitive with state-of-the-art methods. However, the increasing availability of genomic data requires tools that can scale efficiently to larger datasets. To address this need, we present PanDelos-plus, a fully parallel, gene-centric redesign of PanDelos. The algorithm parallelizes the most computationally intensive phases (Best Hit detection and Bidirectional Best Hit extraction) through data decomposition and a thread pool strategy, while employing lightweight data structures to reduce memory usage. Benchmarks on synthetic datasets show…
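The Best Hit and Bidirectional Best Hit phases named in the abstract can be illustrated with a minimal Python sketch. It assumes a generalized Jaccard similarity over k-mer count profiles and a fixed k; the abstract describes PanDelos as parameter-free, so the fixed k is a simplification, and the similarity measure, tie handling, chunking scheme and all function names (kmer_profile, best_hits_for_chunk, parallel_bbh) are hypothetical rather than the authors' implementation. Genes are split into disjoint chunks (data decomposition) that a thread pool processes independently; the BBH step then keeps only gene pairs that are mutual best hits across two genomes.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from functools import partial

Gene = tuple[str, int]  # hypothetical gene id: (genome name, gene index)


def kmer_profile(seq: str, k: int) -> Counter:
    """Multiset of all overlapping k-mers in seq (fixed k is an assumption)."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def generalized_jaccard(a: Counter, b: Counter) -> float:
    """Assumed similarity: sum of per-k-mer min counts over sum of max counts."""
    keys = a.keys() | b.keys()
    num = sum(min(a[x], b[x]) for x in keys)
    den = sum(max(a[x], b[x]) for x in keys)
    return num / den if den else 0.0


def best_hits_for_chunk(chunk, targets, profiles):
    """Best Hit step for one chunk of genes (one unit of work in the pool).

    For each query gene and each *other* genome, keep every target gene that
    reaches the maximal non-zero similarity (ties are all kept, an assumption).
    """
    hits: dict[Gene, set[Gene]] = {}
    for g in chunk:
        for genome, genes in targets.items():
            if genome == g[0] or not genes:
                continue  # skip the query's own genome and empty genomes
            scored = [(generalized_jaccard(profiles[g], profiles[t]), t) for t in genes]
            best = max(s for s, _ in scored)
            if best > 0:
                hits.setdefault(g, set()).update(t for s, t in scored if s == best)
    return hits


def bidirectional_best_hits(hits):
    """BBH step: keep (a, b) only if each gene is a best hit of the other."""
    return {(a, b) for a, bs in hits.items() for b in bs
            if a < b and a in hits.get(b, ())}


def parallel_bbh(genomes, k=3, n_workers=4, chunk_size=2):
    """Hypothetical driver: profile genes, run Best Hit in a pool, then BBH."""
    profiles = {(name, i): kmer_profile(seq, k)
                for name, seqs in genomes.items() for i, seq in enumerate(seqs)}
    targets = {name: [(name, i) for i in range(len(seqs))]
               for name, seqs in genomes.items()}
    # Data decomposition: split the gene list into disjoint fixed-size chunks.
    genes = list(profiles)
    chunks = [genes[i:i + chunk_size] for i in range(0, len(genes), chunk_size)]
    worker = partial(best_hits_for_chunk, targets=targets, profiles=profiles)
    hits: dict[Gene, set[Gene]] = {}
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        for partial_hits in pool.map(worker, chunks):
            hits.update(partial_hits)  # chunks are disjoint, so keys never clash
    return bidirectional_best_hits(hits)


if __name__ == "__main__":
    genomes = {
        "A": ["ATGAAACCCGGG", "ATGTTTTTTAAA"],
        "B": ["ATGAAACCCGGA", "ATGTTTTTTAAC"],
    }
    print(sorted(parallel_bbh(genomes)))
    # [(('A', 0), ('B', 0)), (('A', 1), ('B', 1))]
```

Because the chunks are disjoint, workers never write to shared state and their results can be merged without locking. In standard CPython builds the global interpreter lock prevents threads from speeding up pure-Python CPU-bound loops, so this sketch shows only the shape of the work decomposition; an actual speedup would need compiled code or a process pool.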
