Algebraic Model Selection and Experimental Design in Biological Data Science
Anyu Zhang, Jingzhen Hu, Qingzhong Liang, Elena S. Dimitrova, Brandilyn Stigler

TL;DR
This paper introduces a unified algebraic framework that integrates experimental design and model selection for biological data, improving the identification of interactions and reducing bias in modeling biological networks.
Contribution
It proposes a novel algebraic approach using affine transformations to jointly optimize experimental design and model selection for discrete biological data.
Findings
The framework identifies known biological interactions.
Applying it to an EGFR signaling model demonstrates its practical utility.
It has the potential to reduce bias and improve feature detection in biological modeling.
Abstract
Design of experiments and model selection, though essential steps in data science, are usually viewed as unrelated processes in the study and analysis of biological networks. Not accounting for their inter-relatedness has the potential to introduce bias and increase the risk of missing salient features in the modeling process. We propose a data-driven computational framework to unify experimental design and model selection for discrete data sets and minimal polynomial models. We use a special affine transformation, called a linear shift, to provide both the data sets and the polynomial terms that form a basis for a model. This framework enables us to address two important questions that arise in biological data science research: finding the data which identify a set of known interactions and finding identifiable interactions given a set of data. We present the theoretical foundation for…
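The second question in the abstract (which interactions a given data set can identify) can be illustrated with a small sketch. It is not the authors' algorithm. It assumes that the data are points over a prime field F_p, that a candidate model is a set of monomials given as exponent tuples, that "identifiable" means the monomials are linearly independent as functions on the data, and that a linear shift is a translation x -> x + c (mod p). All function names are hypothetical.

```python
# Illustrative sketch only; every name here is hypothetical and the
# definitions of "linear shift" and "identifiable" are assumptions.

def evaluate_monomial(exponents, point, p):
    """Evaluate x1^e1 * ... * xn^en at a point, modulo the prime p."""
    value = 1
    for e, x in zip(exponents, point):
        value = (value * pow(x, e, p)) % p  # pow(0, 0, p) == 1 for p > 1
    return value

def rank_mod_p(matrix, p):
    """Rank of an integer matrix over F_p (p prime), by Gaussian elimination."""
    rows = [[v % p for v in r] for r in matrix]
    n_cols = len(rows[0]) if rows else 0
    rank = 0
    for col in range(n_cols):
        pivot = next((r for r in range(rank, len(rows)) if rows[r][col]), None)
        if pivot is None:
            continue
        rows[rank], rows[pivot] = rows[pivot], rows[rank]
        inv = pow(rows[rank][col], p - 2, p)  # inverse via Fermat's little theorem
        rows[rank] = [(v * inv) % p for v in rows[rank]]
        for r in range(len(rows)):
            if r != rank and rows[r][col]:
                factor = rows[r][col]
                rows[r] = [(a - factor * b) % p
                           for a, b in zip(rows[r], rows[rank])]
        rank += 1
    return rank

def identifiable(points, monomials, p):
    """True if the monomials are linearly independent as functions on the points."""
    matrix = [[evaluate_monomial(m, pt, p) for m in monomials] for pt in points]
    return rank_mod_p(matrix, p) == len(monomials)

def shift(points, c, p):
    """Assumed form of a linear shift: translate every point by c, modulo p."""
    return [tuple((x + ci) % p for x, ci in zip(pt, c)) for pt in points]

# Boolean data (p = 2) in two variables.
data = [(0, 0), (1, 0), (0, 1)]
additive = [(0, 0), (1, 0), (0, 1)]      # monomials 1, x1, x2
interaction = [(0, 0), (1, 0), (1, 1)]   # monomials 1, x1, x1*x2

print(identifiable(data, additive, 2))                        # True
print(identifiable(data, interaction, 2))                     # False
print(identifiable(shift(data, (1, 1), 2), interaction, 2))   # True
```

In this toy case the interaction term x1*x2 cannot be identified from the original three points but can be from the shifted ones, which hints at why the choice of data and the choice of model terms are linked.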
Taxonomy
Topics: Gene Regulatory Network Analysis · Bioinformatics and Genomic Networks · Gene expression and cancer classification
