A Topology-Based Machine Learning Model Decisively Outperforms Flux Balance Analysis in Predicting Metabolic Gene Essentiality
Justin Boone

TL;DR
This study introduces a topology-based machine learning model that outperforms traditional Flux Balance Analysis (FBA) at predicting essential genes in the E. coli core metabolic model (F1-score of 0.400 versus 0.000) by using network-structure features.
Contribution
The paper presents a graph-theoretic machine learning approach that surpasses FBA in F1-score for gene essentiality prediction, emphasizing the importance of network topology.
Findings
The ML model achieved an F1-score of 0.400, compared with 0.000 for FBA
Topological features like betweenness centrality were effective predictors
Structure-based models can overcome limitations of simulation-based methods
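For readers unfamiliar with the metric, the sketch below shows how an F1-score is computed from binary essentiality labels, and why a score of 0.000 means a method produced no true positives. The labels and predictions are hypothetical toy values, not the paper's data, and the helper name `f1_score` is an illustration only.

```python
def f1_score(y_true, y_pred):
    """F1 = 2*TP / (2*TP + FP + FN); returns 0.0 when there are no true positives."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # essential, predicted essential
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # non-essential, predicted essential
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # essential, missed
    if tp == 0:
        return 0.0
    return 2 * tp / (2 * tp + fp + fn)


# Hypothetical labels: 1 = essential gene, 0 = non-essential gene.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]

# A method that never flags a gene as essential has TP = 0, so F1 = 0.
no_hits = [0] * 10

# A method with 2 true positives, 1 false positive, 2 false negatives.
some_hits = [1, 1, 0, 0, 1, 0, 0, 0, 0, 0]

f1_no_hits = f1_score(y_true, no_hits)
f1_some_hits = f1_score(y_true, some_hits)
```

Because F1 ignores true negatives, a method can label most genes correctly as non-essential and still score 0.000 if it recovers none of the essential ones.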
Abstract
Background: The rational identification of essential genes is a cornerstone of drug discovery, yet standard computational methods like Flux Balance Analysis (FBA) often struggle to produce accurate predictions in complex, redundant metabolic networks. Hypothesis: We hypothesized that the topological structure of a metabolic network contains a more robust predictive signal for essentiality than functional simulations alone. Methodology: To test this hypothesis, we developed a machine learning pipeline by first constructing a reaction-reaction graph from the e_coli_core metabolic model. Graph-theoretic features, including betweenness centrality and PageRank, were engineered to describe the topological role of each gene. A RandomForestClassifier was trained on these features, and its performance was rigorously benchmarked against a standard FBA single-gene deletion analysis using a curated…
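The graph-construction and feature-engineering steps named in the abstract can be illustrated with a minimal, dependency-free sketch. Everything here is an assumption made for illustration: the toy reactions, metabolites, and gene mapping are hypothetical; the edge rule (link reaction A to reaction B when a product of A is a substrate of B) is one common way to build a reaction-reaction graph and may differ from the paper's; and taking the maximum reaction score per gene is just one possible aggregation. PageRank is computed by plain power iteration; betweenness centrality and the RandomForestClassifier step are omitted.

```python
# Hypothetical toy network: reaction -> (substrates, products).
reactions = {
    "R1": ({"A"}, {"B"}),
    "R2": ({"B"}, {"C"}),
    "R3": ({"B"}, {"D"}),
    "R4": ({"C", "D"}, {"E"}),
}
# Hypothetical gene-to-reaction mapping.
gene_to_reactions = {"g1": ["R1"], "g2": ["R2", "R3"], "g3": ["R4"]}


def build_reaction_graph(reactions):
    """Directed edge src -> dst when a product of src is a substrate of dst (assumed rule)."""
    graph = {r: set() for r in reactions}
    for src, (_, products) in reactions.items():
        for dst, (substrates, _) in reactions.items():
            if src != dst and products & substrates:
                graph[src].add(dst)
    return graph


def pagerank(graph, damping=0.85, iterations=100):
    """PageRank by power iteration; rank held by sink nodes is spread uniformly."""
    nodes = list(graph)
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(iterations):
        dangling = sum(rank[v] for v in nodes if not graph[v])
        new_rank = {v: (1 - damping) / n + damping * dangling / n for v in nodes}
        for v in nodes:
            for w in graph[v]:
                new_rank[w] += damping * rank[v] / len(graph[v])
        rank = new_rank
    return rank


def gene_features(gene_to_reactions, reaction_scores):
    """Assumed aggregation: a gene takes the maximum score over its reactions."""
    return {
        gene: max(reaction_scores[r] for r in rxns)
        for gene, rxns in gene_to_reactions.items()
    }


graph = build_reaction_graph(reactions)
scores = pagerank(graph)
features = gene_features(gene_to_reactions, scores)
```

In this toy network, R4 receives flow from both R2 and R3 and therefore ends up with the highest PageRank. Per-gene values like these would form one column of the feature matrix passed to a classifier.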
