PyamilySeq: A Python Tool for Interpretable Gene (Re)Clustering and Pangenomic Inference Across Species and Genera
Nicholas J. Dimonaco

TL;DR
PyamilySeq is a Python tool that enables interpretable gene clustering and pangenomic inference across species and genera, supporting iterative analysis and integration of new sequences with comprehensive output options.
Contribution
It introduces a novel framework for integrating new sequences into existing clusters and supports both species and genus level pangenomic analyses.
Findings
Supports iterative analysis with new sequences
Enables cross-genera gene family detection
Provides detailed gene presence-absence matrices
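As a rough illustration of what a gene presence-absence matrix records, the sketch below builds one from a mapping of gene families to member sequences. It is not PyamilySeq's own code or output format: the function name `presence_absence`, the `genome|gene` identifier convention, and the example data are all assumptions made for this example.

```python
def presence_absence(clusters, sep="|"):
    """Build a binary presence-absence matrix.

    clusters: dict mapping a family id to a list of sequence ids.
    Each sequence id is ASSUMED to look like 'genome|gene' (hypothetical
    convention for this sketch, not a documented PyamilySeq format).
    Returns (sorted genome names, {family id: [0/1 per genome]}).
    """
    # Collect every genome name that appears in any family.
    genomes = sorted(
        {member.split(sep, 1)[0] for members in clusters.values() for member in members}
    )
    matrix = {}
    for family, members in clusters.items():
        present = {member.split(sep, 1)[0] for member in members}
        # One column per genome: 1 if the family has a member from it.
        matrix[family] = [1 if genome in present else 0 for genome in genomes]
    return genomes, matrix


# Made-up example data: three families across three genomes.
clusters = {
    "family_1": ["genomeA|g1", "genomeB|g7", "genomeC|g3"],
    "family_2": ["genomeA|g2", "genomeA|g5"],
    "family_3": ["genomeB|g9", "genomeC|g4"],
}
genomes, matrix = presence_absence(clusters)
for family, row in matrix.items():
    print(family, dict(zip(genomes, row)))
```

In this toy input, `family_2` contains two sequences from the same genome, so it is marked present in only one column; paralogues do not inflate the presence count.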
Abstract
PyamilySeq is a Python-based tool designed for interpretable gene clustering and pangenomic inference, supporting analyses at both species and genus levels. It clusters gene sequences into families by sequence similarity using CD-HIT, and can also take the output of tried-and-tested sequence clustering tools such as CD-HIT, BLAST, DIAMOND, and MMseqs2. PyamilySeq is distinctive in its ability to integrate new sequences into existing clusters while preserving the original clusters, providing a robust framework for iterative analysis that is useful when reannotating genomes. In addition to the standard Species mode, which, as with other tools, performs core-gene analysis across a species range, PyamilySeq can be run in Genus mode, where it detects gene families shared across multiple genera. These features enhance the tool's applicability for ongoing and past…
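To make the clustering-to-core-gene step concrete, here is a minimal sketch that reads CD-HIT's `.clstr` text format and reports families found in a chosen fraction of genomes. It is an illustration under stated assumptions, not PyamilySeq's implementation: the helper names `parse_clstr` and `core_families`, the `min_fraction` parameter, the `genome|gene` identifier convention, and the embedded example data are all hypothetical.

```python
import re

# In a .clstr file each member line contains '>sequence_id...' after the length.
MEMBER_RE = re.compile(r">(.+?)\.\.\.")


def parse_clstr(text):
    """Parse CD-HIT .clstr text into {cluster name: [sequence ids]}."""
    clusters = {}
    current = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">Cluster"):
            # Header line such as '>Cluster 0' starts a new cluster.
            current = line[1:].replace(" ", "_")
            clusters[current] = []
        elif current is not None:
            match = MEMBER_RE.search(line)
            if match:
                clusters[current].append(match.group(1))
    return clusters


def core_families(clusters, n_genomes, min_fraction=1.0, sep="|"):
    """Return clusters present in at least min_fraction of genomes.

    ASSUMES ids look like 'genome|gene'; min_fraction is a hypothetical
    knob for this sketch, not a documented PyamilySeq option.
    """
    core = []
    for name, members in clusters.items():
        genomes = {member.split(sep, 1)[0] for member in members}
        if len(genomes) / n_genomes >= min_fraction:
            core.append(name)
    return core


# Made-up example in CD-HIT's .clstr layout.
example = """>Cluster 0
0\t120aa, >genomeA|g1... *
1\t118aa, >genomeB|g7... at 95.00%
2\t119aa, >genomeC|g3... at 97.50%
>Cluster 1
0\t200aa, >genomeA|g2... *
1\t198aa, >genomeA|g5... at 99.00%
"""

clusters = parse_clstr(example)
print(core_families(clusters, n_genomes=3))
```

The same counting logic would apply in a genus-level analysis if the prefix before the separator named a genus rather than a genome, though how PyamilySeq actually assigns sequences to genera is not described in the text above.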
Taxonomy
Topics: Genetic and phenotypic traits in livestock · Genomics and Phylogenetic Studies
