StarBASE-GP: Biologically-Guided Automated Machine Learning for Genotype-to-Phenotype Association Analysis
Jose Guadalupe Hernandez, Attri Ghosh, Philip J. Freda, Yufei Meng, Nicholas Matsumoto, Jason H. Moore

TL;DR
StarBASE-GP is an automated machine learning framework that integrates biological knowledge and genetic programming to identify genetic variants associated with phenotypic traits in large genomic datasets.
Contribution
It introduces a biologically-guided, multi-objective genetic programming approach for automated genotype-phenotype association analysis, improving variant discovery accuracy.
Findings
Outperforms baseline methods in identifying true and novel loci.
Consistently evolves Pareto-optimal solutions with higher explanatory power.
Demonstrates robustness in complex trait analysis using rat data.
Abstract
We present the Star-Based Automated Single-locus and Epistasis analysis tool - Genetic Programming (StarBASE-GP), an automated framework for discovering meaningful genetic variants associated with phenotypic variation in large-scale genomic datasets. StarBASE-GP uses a genetic programming-based multi-objective optimization strategy to evolve machine learning pipelines that simultaneously maximize explanatory power (r2) and minimize pipeline complexity. Biological domain knowledge is integrated at multiple stages, including the use of nine inheritance encoding strategies to model deviations from additivity, a custom linkage disequilibrium pruning node that minimizes redundancy among features, and a dynamic variant recommendation system that prioritizes informative candidates for pipeline inclusion. We evaluate StarBASE-GP on a cohort of Rattus norvegicus (brown rat) to identify variants…
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Code & Models
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
Taxonomy
TopicsGenetic Associations and Epidemiology · Genetic Mapping and Diversity in Plants and Animals · Genetic and phenotypic traits in livestock
MethodsPruning
