Human Limits in Machine Learning: Prediction of Plant Phenotypes Using Soil Microbiome Data
Rosa Aghdam, Xudong Tang, Shan Shan, Richard Lankau, Claudia Solís-Lemus

TL;DR
This study investigates the potential of machine learning models, specifically random forest and Bayesian neural networks, to predict plant phenotypes from soil microbiome and environmental data, highlighting the importance of data preprocessing and human decision-making.
Contribution
It is the first comprehensive analysis of soil microbiome data for plant phenotype prediction, emphasizing the impact of data normalization and human choices on model performance.
Findings
Incorporating environmental features improves prediction accuracy.
Naive normalization methods like total sum scaling are suboptimal.
Accurate label definition is more critical than normalization or model choice.
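To make the normalization finding concrete, here is a minimal sketch of total sum scaling (TSS) next to the centered log-ratio (CLR) transform. CLR is used here only as a commonly cited alternative for compositional count data; this summary does not say which alternatives the authors compared. The function names, the pseudocount of 0.5, and the toy counts are all assumptions made for illustration.

```python
import math


def total_sum_scaling(counts):
    """Convert one sample's raw taxon counts to relative abundances (TSS)."""
    total = sum(counts)
    if total == 0:
        raise ValueError("sample has no reads")
    return [c / total for c in counts]


def centered_log_ratio(counts, pseudocount=0.5):
    """CLR transform of one sample.

    The pseudocount is an illustrative assumption to avoid log(0);
    it is not a value taken from the paper.
    """
    logs = [math.log(c + pseudocount) for c in counts]
    mean_log = sum(logs) / len(logs)
    return [v - mean_log for v in logs]


# Toy counts for four hypothetical taxa in a single soil sample.
sample = [120, 30, 0, 50]
tss = total_sum_scaling(sample)   # relative abundances that sum to 1
clr = centered_log_ratio(sample)  # log-ratios centered on 0
```

TSS forces every sample to sum to 1, so an increase in one taxon necessarily lowers the relative abundance of the others. CLR values instead sum to 0 and express each taxon relative to the sample's geometric mean, which is one reason it is often preferred for compositional data.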
Abstract
The preservation of soil health is a critical challenge in the 21st century due to its significant impact on agriculture, human health, and biodiversity. We provide the first deep investigation of the predictive potential of machine learning models to understand the connections between soil and biological phenotypes. We investigate an integrative framework performing accurate machine learning-based prediction of plant phenotypes from biological, chemical, and physical properties of the soil via two models: random forest and Bayesian neural network. We show that prediction is improved when incorporating environmental features like soil physicochemical properties and microbial population density into the models, in addition to the microbiome information. Exploring various data preprocessing strategies confirms the significant impact of human decisions on predictive performance. We show…
Taxonomy
Topics: Gene expression and cancer classification
