TL;DR
PhenotypeToGeneDownloaderR is an automated, multi-source R/Python pipeline that retrieves, harmonizes, and validates phenotype-associated genes, producing candidate gene lists for downstream genetic analysis.
Contribution
It introduces an integrated pipeline that automates gene retrieval from multiple databases, standardizes per-source outputs, and validates gene symbols against the NCBI human gene reference, improving the efficiency and comprehensiveness of candidate gene retrieval.
Findings
Generated 136,487 raw gene retrievals across 13 phenotypes and 13 databases.
Achieved an 87.6% validation rate of gene symbols against the NCBI human gene reference.
Recovered 98.4% of known phenotype-associated genes in validation.
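The two percentages above are ratios over gene-symbol sets. The summary does not state the exact denominators, so the following is only a minimal sketch under assumed definitions: validation rate as the share of retrieved symbols found in the reference, and recovery rate as the share of known phenotype-associated genes that were retrieved. The function names and toy symbols are hypothetical and are not taken from the pipeline.

```python
def validation_rate(retrieved: set[str], reference: set[str]) -> float:
    """Share of retrieved symbols that appear in the reference set (assumed definition)."""
    if not retrieved:
        return 0.0
    return len(retrieved & reference) / len(retrieved)


def recovery_rate(retrieved: set[str], known: set[str]) -> float:
    """Share of known phenotype-associated genes that were retrieved (assumed definition)."""
    if not known:
        return 0.0
    return len(retrieved & known) / len(known)


# Toy data for illustration only; these are not results from the paper.
retrieved = {"BRCA1", "BRCA2", "TP53", "FAKE1"}
reference = {"BRCA1", "BRCA2", "TP53", "ATM"}
known = {"BRCA1", "ATM"}

val_rate = validation_rate(retrieved, reference)  # 3 of 4 retrieved symbols are in the reference
rec_rate = recovery_rate(retrieved, known)        # 1 of 2 known genes was retrieved
```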
Abstract
Identifying phenotype-associated genes is a common first step in polygenic risk score construction, enrichment testing, target prioritisation and variant interpretation, but relevant evidence is distributed across heterogeneous databases with different interfaces, formats and evidence models. Here, we present PhenotypeToGeneDownloaderR, a phenotype-guided R/Python pipeline for automated gene retrieval, harmonisation, symbol validation and cross-source summary analysis. Given a phenotype term, the pipeline queries integrated biological databases, standardises per-source outputs, combines gene lists, validates retrieved symbols against the NCBI human gene reference and generates summary tables and visualisations. Across 13 clinically relevant phenotypes and 13 databases, PhenotypeToGeneDownloaderR generated 136,487 raw gene retrievals, with at least one source returning genes for every…
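The standardise, combine and validate steps described in the abstract can be pictured with a small sketch. This is not the pipeline's own code: the function names, the source labels and the tiny reference set are hypothetical, and case-insensitive matching against the reference is an assumption made for the example.

```python
from collections import defaultdict


def harmonise(symbol: str) -> str:
    """Build a comparison key: trim whitespace and upper-case the symbol."""
    return symbol.strip().upper()


def combine_sources(per_source: dict[str, list[str]]) -> dict[str, set[str]]:
    """Map each harmonised symbol key to the set of sources that returned it."""
    support: dict[str, set[str]] = defaultdict(set)
    for source, symbols in per_source.items():
        for raw in symbols:
            key = harmonise(raw)
            if key:  # skip empty entries
                support[key].add(source)
    return dict(support)


def validate(support: dict[str, set[str]], reference: set[str]):
    """Split combined symbols into validated (keyed by official symbol) and unmatched."""
    # Index the reference by upper-cased key so mixed-case official symbols still match.
    index = {official.upper(): official for official in reference}
    valid = {index[k]: srcs for k, srcs in support.items() if k in index}
    invalid = sorted(k for k in support if k not in index)
    return valid, invalid


# Toy inputs; source names and symbols are illustrative only.
per_source = {
    "source_a": ["BRCA1", " tp53 ", "FAKE1"],
    "source_b": ["TP53", "brca2", ""],
}
reference = {"BRCA1", "BRCA2", "TP53"}

support = combine_sources(per_source)
valid, invalid = validate(support, reference)
```

Keeping the set of supporting sources per gene, rather than a flat merged list, is one way to support the cross-source summaries the abstract mentions, since the number of sources per gene can be read directly from `valid`.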
