Constructing Effective Machine Learning Models for the Sciences: A Multidisciplinary Perspective
Alice E. A. Allen, Alexandre Tkatchenko

TL;DR
This paper discusses the potential and limitations of machine learning models in scientific research, emphasizing the importance of interpretability and the conditions under which complex models outperform simpler linear models.
Contribution
It provides a multidisciplinary perspective on constructing interpretable machine learning models and offers guidance on recognizing when complex models are beneficial in scientific data analysis.
Findings
Non-linear models do not always outperform linear models with manual feature engineering.
Interpretable models are crucial for scientific understanding.
Complex models can improve results in certain scientific applications.
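The first finding can be illustrated with a minimal sketch on synthetic data. Nothing below comes from the paper: the data-generating function, the coefficients, and the helper names (`fit_ols`, `add_manual_features`, `r2`) are assumptions chosen for illustration. The target depends on an interaction and a log transform, so a linear model given those two hand-built features should fit much better than the same model on the raw inputs, without resorting to a flexible non-linear learner.

```python
import numpy as np


def fit_ols(X, y):
    """Ordinary least squares with an intercept; returns the coefficient vector."""
    A = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef


def predict(coef, X):
    """Apply fitted coefficients (intercept first) to a feature matrix."""
    return np.column_stack([np.ones(len(X)), X]) @ coef


def r2(y_true, y_pred):
    """Coefficient of determination."""
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return 1.0 - ss_res / ss_tot


def add_manual_features(X):
    """Hypothetical hand-engineered features: an interaction and a log transform."""
    x1, x2 = X[:, 0], X[:, 1]
    return np.column_stack([x1, x2, x1 * x2, np.log(x2)])


# Synthetic data (an assumption, not a dataset from the paper).
rng = np.random.default_rng(0)
X = rng.uniform(0.5, 3.0, size=(400, 2))
y = 1.5 * X[:, 0] * X[:, 1] + 2.0 * np.log(X[:, 1]) + rng.normal(0, 0.1, 400)

X_train, X_test = X[:300], X[300:]
y_train, y_test = y[:300], y[300:]

# Linear model on the raw variables only.
coef_raw = fit_ols(X_train, y_train)
r2_raw = r2(y_test, predict(coef_raw, X_test))

# Same linear model, with the manual transform and interaction added.
coef_eng = fit_ols(add_manual_features(X_train), y_train)
r2_eng = r2(y_test, predict(coef_eng, add_manual_features(X_test)))

print(f"raw linear R^2:        {r2_raw:.3f}")
print(f"engineered linear R^2: {r2_eng:.3f}")
```

The engineered model stays intrinsically interpretable: each coefficient maps to a named term. This toy case is constructed so that the right features are known in advance, which is rarely guaranteed on real scientific data.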
Abstract
Learning from data has led to substantial advances in a multitude of disciplines, including text and multimedia search, speech recognition, and autonomous-vehicle navigation. Can machine learning enable similar leaps in the natural and social sciences? This is certainly the expectation in many scientific fields and recent years have seen a plethora of applications of non-linear models to a wide range of datasets. However, flexible non-linear solutions will not always improve upon manually adding transforms and interactions between variables to linear regression models. We discuss how to recognize this before constructing a data-driven model and how such analysis can help us move to intrinsically interpretable regression models. Furthermore, for a variety of applications in the natural and social sciences we demonstrate why improvements may be seen with more complex regression models and…
Taxonomy
Topics: Data Analysis with R · Neural Networks and Applications · Computational and Text Analysis Methods
Methods: Linear Regression
