Theory-guided Data Science: A New Paradigm for Scientific Discovery from Data
Anuj Karpatne, Gowtham Atluri, James Faghmous, Michael Steinbach, Arindam Banerjee, Auroop Ganguly, Shashi Shekhar, Nagiza Samatova, and Vipin Kumar

TL;DR
Theory-guided data science (TGDS) integrates scientific knowledge into data science models to improve their applicability to scientific problems, their interpretability, and their ability to support scientific discovery across disciplines.
Contribution
This paper formalizes the TGDS paradigm, presents a taxonomy of research themes, and discusses approaches and future directions for integrating domain knowledge in data science.
Findings
TGDS aims to make models scientifically consistent and interpretable, so that they generalize better and yield new domain insights.
It has started to gain prominence in disciplines such as turbulence modeling, material discovery, and climate science.
Promising research avenues include developing new integration methods and expanding TGDS applications.
Abstract
Data science models, although successful in a number of commercial domains, have had limited applicability in scientific problems involving complex physical phenomena. Theory-guided data science (TGDS) is an emerging paradigm that aims to leverage the wealth of scientific knowledge for improving the effectiveness of data science models in enabling scientific discovery. The overarching vision of TGDS is to introduce scientific consistency as an essential component for learning generalizable models. Further, by producing scientifically interpretable models, TGDS aims to advance our scientific understanding by discovering novel domain insights. Indeed, the paradigm of TGDS has started to gain prominence in a number of scientific disciplines such as turbulence modeling, material discovery, quantum chemistry, bio-medical science, bio-marker discovery, climate science, and hydrology. In this…
