Tanimoto Random Features for Scalable Molecular Machine Learning

Austin Tripp; Sergio Bacallado; Sukriti Singh; José Miguel Hernández-Lobato

arXiv:2306.14809 · cs.LG · November 15, 2023 · 2 cites


TL;DR

This paper introduces novel random feature methods to efficiently approximate the Tanimoto kernel, enabling scalable molecular machine learning on large datasets and extending the kernel to real-valued vectors.
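For orientation, the exact Tanimoto kernel on binary fingerprints can be computed as in the minimal NumPy sketch below. The function name `tanimoto_gram` is our own, and treating two all-zero fingerprints as having similarity 1 is an assumed convention, not something stated on this page. The sketch materializes all n × n pairwise similarities, which is the quadratic cost that random feature approximations aim to avoid.

```python
import numpy as np

def tanimoto_gram(X: np.ndarray, Y: np.ndarray) -> np.ndarray:
    """Exact Tanimoto kernel between the rows of two 0/1 fingerprint matrices.

    For binary vectors, T(x, y) = |x AND y| / |x OR y| = <x, y> / (|x| + |y| - <x, y>).
    """
    X = X.astype(np.float64)
    Y = Y.astype(np.float64)
    inner = X @ Y.T                      # |x AND y| for every pair of rows
    nx = X.sum(axis=1)[:, None]          # |x|: number of on-bits in each row of X
    ny = Y.sum(axis=1)[None, :]          # |y|: number of on-bits in each row of Y
    union = nx + ny - inner              # |x OR y|
    # Assumed convention: two all-zero fingerprints get similarity 1 (avoids 0/0).
    safe_union = np.where(union > 0, union, 1.0)
    return np.where(union > 0, inner / safe_union, 1.0)

# Toy fingerprints with 4 bits each (illustrative data only).
X = np.array([[1, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 0, 0, 1]])
K = tanimoto_gram(X, X)
print(np.round(K, 3))
```

Here the first two fingerprints share one on-bit out of three distinct on-bits, so `K[0, 1]` is 1/3, while the third fingerprint shares nothing with the others.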

Contribution

The paper proposes the first random feature approximations for the Tanimoto kernel, together with an extension of the kernel to real-valued vectors, and supports them with theoretical analysis and experimental validation.
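Two standard ways of writing the Tanimoto coefficient agree on binary vectors but differ on real-valued ones, which is why "extending the kernel to real-valued vectors" is not a single obvious choice. The sketch below shows both forms; the function names are hypothetical, and neither is claimed to be the specific novel extension the paper derives. The MinMax form is intended for non-negative vectors, while the dot-product form is defined for any real vectors that are not both zero, since its denominator equals half of the squared distance between x and y plus half of the sum of their squared norms.

```python
import numpy as np

def minmax_kernel(x: np.ndarray, y: np.ndarray) -> float:
    """MinMax form: sum_i min(x_i, y_i) / sum_i max(x_i, y_i), for non-negative vectors."""
    denom = np.maximum(x, y).sum()
    # Assumed convention: two all-zero vectors get similarity 1.
    return float(np.minimum(x, y).sum() / denom) if denom > 0 else 1.0

def dot_product_tanimoto(x: np.ndarray, y: np.ndarray) -> float:
    """Dot-product form: <x, y> / (||x||^2 + ||y||^2 - <x, y>)."""
    xy = float(np.dot(x, y))
    denom = float(np.dot(x, x) + np.dot(y, y)) - xy
    # The denominator is zero only when x and y are both the zero vector.
    return xy / denom if denom > 0 else 1.0

# On binary vectors both forms reduce to |x AND y| / |x OR y|.
a = np.array([1, 1, 0, 0])
b = np.array([1, 0, 1, 0])
print(minmax_kernel(a, b), dot_product_tanimoto(a, b))

# On real-valued vectors they generally disagree.
u = np.array([0.5, 2.0, 0.0])
v = np.array([1.0, 1.0, 3.0])
print(minmax_kernel(u, v), dot_product_tanimoto(u, v))
```

Both forms give 1/3 for `a` and `b`; for `u` and `v` the MinMax form gives 1.5 / 6 = 0.25, while the dot-product form gives 2.5 / 12.75 ≈ 0.196.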

Findings

01. Effective approximation of the Tanimoto coefficient on real-world datasets

02. Improved scalability for molecular property prediction and optimization

03. Theoretical error bounds on the spectral norm of the Gram matrix
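To make the third finding concrete, the quantity being bounded can be written as follows. The notation (n data points, M random features, feature map \phi, feature matrix \Phi) is ours, and the actual bounds and their constants are given in the paper, not reproduced here. Because K and \hat{K} are both symmetric, the spectral norm of their difference is its largest absolute eigenvalue.

```latex
\begin{aligned}
K_{ij} &= T(x_i, x_j), \qquad \hat{K}_{ij} = \phi(x_i)^\top \phi(x_j), \qquad i, j = 1, \dots, n \\
\hat{K} &= \Phi \Phi^\top, \qquad \Phi = [\phi(x_1), \dots, \phi(x_n)]^\top \in \mathbb{R}^{n \times M} \\
\| K - \hat{K} \|_2 &= \max_{\|v\|_2 = 1} \left| v^\top (K - \hat{K})\, v \right| = \max_k \left| \lambda_k(K - \hat{K}) \right|
\end{aligned}
```

Storing \Phi costs O(nM) memory rather than the O(n^2) needed for K, which is what lets kernel methods scale when M is much smaller than n.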

Abstract

The Tanimoto coefficient is commonly used to measure the similarity between molecules represented as discrete fingerprints, either as a distance metric or a positive definite kernel. While many kernel methods can be accelerated using random feature approximations, at present there is a lack of such approximations for the Tanimoto kernel. In this paper we propose two kinds of novel random features to allow this kernel to scale to large datasets, and in the process discover a novel extension of the kernel to real-valued vectors. We theoretically characterize these random features, and provide error bounds on the spectral norm of the Gram matrix. Experimentally, we show that these random features are effective at approximating the Tanimoto coefficient of real-world datasets and are useful for molecular property prediction and optimization tasks.
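As a rough illustration of how a random feature map can reproduce the Tanimoto coefficient in expectation, the sketch below uses a generic MinHash-plus-random-sign construction for binary fingerprints. For a random permutation of the bit positions, two non-empty fingerprints have the same minimum rank with probability equal to their Tanimoto (Jaccard) similarity; attaching an independent random sign to each rank value makes the expected feature inner product equal to that probability. This is our illustrative choice, not necessarily either of the two feature families the paper proposes (see the official repository for those), and `minhash_sign_features` is a hypothetical name.

```python
import numpy as np

def minhash_sign_features(X: np.ndarray, num_features: int, seed: int = 0) -> np.ndarray:
    """Features phi with E[phi(x) . phi(y)] = Tanimoto(x, y) for 0/1 rows x, y.

    Assumes every row of X has at least one on-bit.
    """
    X = X.astype(bool)
    if not X.any(axis=1).all():
        raise ValueError("every fingerprint needs at least one on-bit")
    rng = np.random.default_rng(seed)
    n, d = X.shape
    feats = np.empty((n, num_features))
    for m in range(num_features):
        perm = rng.permutation(d)                  # random rank pi_m(i) for each bit position i
        signs = rng.choice([-1.0, 1.0], size=d)    # independent random sign for each rank value
        ranks = np.where(X, perm[None, :], d)      # off-bits get rank d, larger than any real rank
        h = ranks.min(axis=1)                      # MinHash value: smallest rank among on-bits
        feats[:, m] = signs[h]                     # equal hashes give equal signs
    return feats / np.sqrt(num_features)           # scale so E[phi(x) . phi(y)] = Tanimoto(x, y)

# Toy random fingerprints (illustrative data only).
rng = np.random.default_rng(1)
X = (rng.random((50, 256)) < 0.1).astype(int)
X[:, 0] = 1  # guarantee each toy fingerprint has at least one on-bit

Phi = minhash_sign_features(X, num_features=4000, seed=2)
K_hat = Phi @ Phi.T

# Exact Tanimoto Gram matrix for comparison.
Xf = X.astype(float)
inner = Xf @ Xf.T
cnt = Xf.sum(axis=1)
K = inner / (cnt[:, None] + cnt[None, :] - inner)

print("max entrywise error:", np.abs(K - K_hat).max())
print("spectral norm error:", np.linalg.norm(K - K_hat, 2))
```

Each off-diagonal entry of `K_hat` averages M independent ±1 terms whose mean is the exact Tanimoto value, so its standard deviation is at most 1/sqrt(M), while the diagonal is exactly 1. Explicit permutations are used only for clarity; a practical implementation would typically replace them with hash functions.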


Code & Models

Repositories

austint/tanimoto-random-features-neurips23 (official)

Videos

Tanimoto Random Features for Scalable Molecular Machine Learning (SlidesLive)

Taxonomy

Topics: Molecular spectroscopy and chirality · Face and Expression Recognition · Neural Networks and Applications