Linear-scaling kernels for protein sequences and small molecules outperform deep learning while providing uncertainty quantitation and improved interpretability
Jonathan Parkinson, Wei Wang

TL;DR
This paper introduces xGPR, a scalable Gaussian process framework that outperforms deep learning models in predicting properties of proteins and small molecules, while also offering uncertainty quantification and interpretability.
Contribution
The authors develop linear-scaling kernels and an open-source Python library, xGPR, which enables efficient Gaussian process modeling for sequence and graph data and outperforms deep learning on multiple benchmarks.
Findings
xGPR achieves competitive or superior performance compared to deep learning models.
xGPR provides uncertainty quantification unavailable in typical deep learning.
xGPR offers data representations useful for clustering and visualization.
Abstract
A Gaussian process (GP) is a Bayesian model that provides several advantages for regression tasks in machine learning, such as reliable quantitation of uncertainty and improved interpretability. The adoption of GPs has been precluded by their excessive computational cost and by the difficulty of adapting them for analyzing sequences (e.g. amino acid and nucleotide sequences) and graphs (e.g. ones representing small molecules). In this study, we develop efficient and scalable approaches for fitting GP models as well as fast convolution kernels which scale linearly with graph or sequence size. We implement these improvements by building an open-source Python library called xGPR. We compare the performance of xGPR with the reported performance of various deep learning models on 20 benchmarks, including small molecule, protein sequence and tabular data. We show that xGPR achieves highly…
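The abstract excerpt does not say how the GP fitting is made scalable. As a hedged illustration only, one standard way to avoid the cubic cost of an exact GP is to replace the kernel with an explicit random feature map, so that fitting reduces to linear regression in feature space and cost grows linearly with the number of training points. The sketch below approximates an RBF kernel with random Fourier features using only the Python standard library. The function names (`rbf_kernel`, `make_random_features`) and all parameter values are hypothetical and are not taken from xGPR; the sketch also does not cover the paper's convolution kernels for sequences and graphs.

```python
import math
import random


def rbf_kernel(x, y, lengthscale=1.0):
    """Exact RBF kernel k(x, y) = exp(-||x - y||^2 / (2 * lengthscale^2))."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_dist / (2.0 * lengthscale ** 2))


def make_random_features(dim, n_freqs, lengthscale=1.0, seed=0):
    """Return a map z such that dot(z(x), z(y)) approximates rbf_kernel(x, y).

    Illustrative helper (an assumption, not part of xGPR).
    """
    rng = random.Random(seed)
    # Frequencies are sampled from the RBF kernel's spectral density,
    # a zero-mean Gaussian with standard deviation 1 / lengthscale.
    freqs = [[rng.gauss(0.0, 1.0 / lengthscale) for _ in range(dim)]
             for _ in range(n_freqs)]
    scale = 1.0 / math.sqrt(n_freqs)

    def feature_map(x):
        projections = [sum(w_i * x_i for w_i, x_i in zip(w, x)) for w in freqs]
        # Paired cosine and sine features: their dot product averages
        # cos(w . (x - y)) over the sampled frequencies.
        return ([scale * math.cos(p) for p in projections]
                + [scale * math.sin(p) for p in projections])

    return feature_map


feature_map = make_random_features(dim=3, n_freqs=4000)
x = [0.2, -0.5, 1.0]
y = [0.0, 0.3, 0.7]

exact = rbf_kernel(x, y)
approx = sum(a * b for a, b in zip(feature_map(x), feature_map(y)))
print(f"exact={exact:.4f} approx={approx:.4f}")
```

Once every input is mapped to a fixed-length feature vector, a Bayesian linear model on those features gives both predictions and predictive variances, which is the sense in which this kind of approximation retains uncertainty estimates. The number of frequencies trades accuracy against cost.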
Taxonomy
Topics: Metabolomics and Mass Spectrometry Studies · Mass Spectrometry Techniques and Applications · Protein Structure and Dynamics
Methods: Lib · Convolution
