A Gaussian Process Model for Ordinal Data with Applications to Chemoinformatics
Arron Gosnell, Evangelos Evangelou

TL;DR
This paper introduces a Gaussian process model tailored for ordinal chemical data, incorporating chemical space correlations via Tanimoto distance, and demonstrates improved prediction and feature identification for chemical discovery.
Contribution
The paper develops a Gaussian process model for correlated ordinal data in chemoinformatics whose kernel includes a scaling parameter controlling the strength of correlation across the chemical space, with the aim of improving prediction accuracy and feature selection.
Findings
Correlation-aware models outperform uncorrelated ones in prediction.
Incorporating chemical space improves model performance.
Genetic algorithms aid chemical discovery and the assessment of feature importance.
Abstract
With the proliferation of screening tools for chemical testing, it is now possible to create vast databases of chemicals easily. However, rigorous statistical methodologies employed to analyse these databases are in their infancy, and further development to facilitate chemical discovery is imperative. In this paper, we present conditional Gaussian process models to predict ordinal outcomes from chemical experiments, where the inputs are chemical compounds. We implement the Tanimoto distance, a metric on the chemical space, within the covariance of the Gaussian processes to capture correlated effects in the chemical space. A novel aspect of our model is that the kernel contains a scaling parameter, a feature not previously examined in the literature, that controls the strength of the correlation between elements of the chemical space. Using molecular fingerprints, a numerical…
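As a rough illustration of the idea in the abstract, the sketch below computes pairwise Tanimoto distances between binary molecular fingerprints and turns them into a covariance matrix with a scaling parameter. The exponential form of the kernel, the function names `tanimoto_distance` and `scaled_kernel`, the parameters `phi` and `sigma2`, and the toy fingerprints are all assumptions made for illustration; the abstract does not state the exact kernel the authors use.

```python
import numpy as np

def tanimoto_distance(X):
    """Pairwise Tanimoto (Jaccard) distance between rows of a binary matrix."""
    X = np.asarray(X, dtype=float)
    inter = X @ X.T                       # bits set in both fingerprints
    counts = X.sum(axis=1)                # bits set in each fingerprint
    union = counts[:, None] + counts[None, :] - inter
    # Two all-zero fingerprints are treated as identical (similarity 1).
    safe_union = np.where(union > 0, union, 1.0)
    sim = np.where(union > 0, inter / safe_union, 1.0)
    return 1.0 - sim

def scaled_kernel(D, phi=1.0, sigma2=1.0):
    """Hypothetical kernel: sigma2 * exp(-D / phi).

    phi plays the role of a scaling parameter: a larger phi gives stronger
    correlation between compounds at a given Tanimoto distance.
    """
    return sigma2 * np.exp(-D / phi)

# Toy fingerprints (assumed data, not from the paper).
fps = np.array([
    [1, 1, 0, 1],
    [1, 0, 0, 1],
    [0, 0, 1, 0],
])

D = tanimoto_distance(fps)
K = scaled_kernel(D, phi=0.5)
print(np.round(D, 3))
print(np.round(K, 3))
```

Under this assumed form, compounds that share most of their fingerprint bits receive a covariance close to `sigma2`, while compounds with no bits in common receive `sigma2 * exp(-1 / phi)`.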
Taxonomy
Topics: Computational Drug Discovery Methods · Statistical and Computational Modeling · Metabolomics and Mass Spectrometry Studies
Methods: Gaussian Process
