Scalable Gaussian Processes on Discrete Domains
Vincent Fortuin, Gideon Dresdner, Heiko Strathmann, Gunnar Rätsch

TL;DR
This paper compares inducing point selection techniques for scalable Gaussian processes on discrete domains and reports that simulated annealing improves both scalability and predictive performance on biological sequence data.
Contribution
It evaluates several inducing point selection methods for Gaussian processes on discrete data (greedy selection, determinantal point processes, and simulated annealing) and highlights the effectiveness of simulated annealing.
Findings
Sparse GPs with inducing points selected by simulated annealing perform competitively with SVMs and full GPs.
The choice of inducing point selection technique affects both scalability and accuracy.
GPs with optimized inducing points can handle real-world DNA sequence data effectively.
Abstract
Kernel methods on discrete domains have shown great promise for many challenging data types, for instance, biological sequence data and molecular structure data. Scalable kernel methods like Support Vector Machines may offer good predictive performance but do not intrinsically provide uncertainty estimates. In contrast, probabilistic kernel methods like Gaussian processes offer uncertainty estimates in addition to good predictive performance but fall short in terms of scalability. While the scalability of Gaussian processes can be improved using sparse inducing point approximations, the selection of these inducing points remains challenging. We explore different techniques for selecting inducing points on discrete domains, including greedy selection, determinantal point processes, and simulated annealing. We find that simulated annealing, which can select inducing points that are not…
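To make the idea of simulated annealing over a discrete domain concrete, here is a minimal Python sketch. It is not the authors' implementation. Everything specific in it is an assumption made for illustration: the exponentiated Hamming kernel, the Nyström trace error used as the cost (the paper may optimize a different objective), the single-character mutation proposal, the geometric cooling schedule, and all function names (`encode`, `hamming_kernel`, `nystrom_error`, `anneal_inducing_points`).

```python
import numpy as np

ALPHABET = "ACGT"  # assumed DNA alphabet, for illustration only


def encode(seqs):
    """Map equal-length strings over ALPHABET to an (n, L) integer array."""
    lookup = {c: i for i, c in enumerate(ALPHABET)}
    return np.array([[lookup[c] for c in s] for s in seqs])


def hamming_kernel(A, B, gamma=0.5):
    """k(x, y) = exp(-gamma * Hamming(x, y)); an assumed kernel choice."""
    dist = (A[:, None, :] != B[None, :, :]).sum(axis=2)
    return np.exp(-gamma * dist)


def nystrom_error(X, Z, gamma=0.5, jitter=1e-6):
    """trace(K_xx - K_xz K_zz^{-1} K_zx), an assumed stand-in objective."""
    K_zz = hamming_kernel(Z, Z, gamma) + jitter * np.eye(len(Z))
    K_xz = hamming_kernel(X, Z, gamma)
    q_diag = np.einsum("ij,ji->i", K_xz, np.linalg.solve(K_zz, K_xz.T))
    # k(x, x) = 1 for this kernel, so trace(K_xx) equals the number of points.
    return float(len(X) - q_diag.sum())


def anneal_inducing_points(X, m, steps=500, t0=1.0, cooling=0.99, seed=0):
    """Simulated annealing over m inducing sequences; returns (best_Z, best_cost)."""
    rng = np.random.default_rng(seed)
    # Start from a random subset of the data.
    Z = X[rng.choice(len(X), size=m, replace=False)].copy()
    cost = nystrom_error(X, Z)
    best_Z, best_cost = Z.copy(), cost
    temp = t0
    for _ in range(steps):
        # Proposal: change one character of one inducing sequence.
        cand = Z.copy()
        i, j = rng.integers(m), rng.integers(X.shape[1])
        shift = rng.integers(1, len(ALPHABET))  # nonzero, so the symbol changes
        cand[i, j] = (cand[i, j] + shift) % len(ALPHABET)
        cand_cost = nystrom_error(X, cand)
        # Metropolis rule: always accept improvements, sometimes accept worse.
        if cand_cost < cost or rng.random() < np.exp((cost - cand_cost) / temp):
            Z, cost = cand, cand_cost
            if cost < best_cost:
                best_Z, best_cost = Z.copy(), cost
        temp *= cooling  # geometric cooling schedule
    return best_Z, best_cost


if __name__ == "__main__":
    data_rng = np.random.default_rng(1)
    seqs = ["".join(data_rng.choice(list(ALPHABET), size=8)) for _ in range(40)]
    Z, err = anneal_inducing_points(encode(seqs), m=5)
    print(Z.shape, err)
```

Because the proposal mutates characters rather than swapping in other data points, the sequences this sketch returns need not appear in the data set it was given. Evaluating the cost from scratch at every step costs O(n m^2 + m^3) for n data points and m inducing points, which keeps the sketch short but would need incremental updates for larger problems.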
