Optimizing FPGA-based Accelerator Design for Large-Scale Molecular Similarity Search

Hongwu Peng, Shiyang Chen, Zhepeng Wang, Junhuan Yang, Scott A. Weitze, Tong Geng, Ang Li, Jinbo Bi, Minghu Song, Weiwen Jiang, Hang Liu, and Caiwen Ding

arXiv:2109.06355 · cs.AR · September 15, 2021 · 1 citation

TL;DR

This paper presents an FPGA-based accelerator for large-scale molecular similarity search that speeds up both exhaustive and approximate search algorithms relative to CPU implementations.

Contribution

The authors present it as the first FPGA implementation of molecular similarity search algorithms; it optimizes both exhaustive and approximate methods for high throughput and accuracy.

Findings

1. 450 million compounds per second throughput for exhaustive search
2. 103,385 queries per second (QPS) on the ChEMBL database with 0.92 recall for approximate search
3. 35x speedup over CPU implementations
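The 0.92 recall in the second finding is not defined on this page. Approximate nearest-neighbour benchmarks usually report recall@k: the fraction of the exact top-k neighbours that the approximate search also returns, averaged over queries. Assuming that convention (the paper's k and evaluation protocol are not given here), the metric can be computed as below; `recall_at_k` is a hypothetical helper, not code from the paper.

```python
def recall_at_k(approx: list[list[int]], exact: list[list[int]], k: int) -> float:
    """Mean over queries of |approximate top-k ∩ exact top-k| / k."""
    if len(approx) != len(exact) or not exact:
        raise ValueError("need one approximate and one exact result list per query")
    total = 0.0
    for found, truth in zip(approx, exact):
        if len(truth) < k:
            raise ValueError("each exact list must hold at least k neighbours")
        # Count how many true top-k IDs the approximate search recovered.
        total += len(set(found[:k]) & set(truth[:k])) / k
    return total / len(exact)

# Hypothetical example: two queries, k = 4; the first query misses one true neighbour.
print(recall_at_k([[1, 2, 3, 9], [5, 6, 7, 8]], [[1, 2, 3, 4], [5, 6, 7, 8]], k=4))  # 0.875
```

Under this reading, a recall of 0.92 means that, on average, 92% of each query's true top-k compounds appear in the approximate result.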

Abstract

Molecular similarity search has been widely used in drug discovery to identify structurally similar compounds from large molecular databases rapidly. With the increasing size of chemical libraries, there is growing interest in the efficient acceleration of large-scale similarity search. Existing works mainly focus on CPU and GPU to accelerate the computation of the Tanimoto coefficient in measuring the pairwise similarity between different molecular fingerprints. In this paper, we propose and optimize an FPGA-based accelerator design for exhaustive and approximate search algorithms. For exhaustive search using BitBound & folding, we analyze how the similarity cutoff and folding level relate to search speedup and accuracy, and propose a scalable on-the-fly query engine on FPGAs to reduce resource utilization and the pipeline interval. We achieve a 450 million compounds-per-second…
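A minimal software sketch of the exhaustive-search ideas named in the abstract, under stated assumptions: fingerprints are modeled as Python integers used as bit vectors; the Tanimoto coefficient is |A AND B| / |A OR B|; and BitBound is taken to be the popcount bound T(A, B) ≤ min(|A|, |B|) / max(|A|, |B|), which limits candidates for a cutoff t to popcounts in [t·|A|, |A|/t]. Folding is shown as OR-ing the two halves of a fingerprint. All function names (`build_index`, `bitbound_search`, `fold`) are hypothetical, and this is a CPU illustration, not the paper's FPGA query engine.

```python
from bisect import bisect_left, bisect_right

# Illustrative assumption: a fingerprint is a Python int used as a bit vector.

def popcount(fp: int) -> int:
    return bin(fp).count("1")

def tanimoto(a: int, b: int) -> float:
    """Tanimoto coefficient |A AND B| / |A OR B| of two bit fingerprints."""
    union = popcount(a | b)
    return popcount(a & b) / union if union else 0.0  # two empty fps -> 0.0 (a convention)

def build_index(db: list[int]) -> tuple[list[int], list[int]]:
    """Sort database positions by popcount so each query's candidates are contiguous."""
    order = sorted(range(len(db)), key=lambda i: popcount(db[i]))
    counts = [popcount(db[i]) for i in order]
    return order, counts

def bitbound_search(query: int, db: list[int], index: tuple[list[int], list[int]],
                    cutoff: float) -> list[tuple[int, float]]:
    """Exact search for all fingerprints with Tanimoto >= cutoff (0 < cutoff <= 1).

    T(A, B) <= min(|A|, |B|) / max(|A|, |B|), so only popcounts in
    [cutoff * |A|, |A| / cutoff] can reach the cutoff; the rest are never scored.
    """
    order, counts = index
    qc = popcount(query)
    eps = 1e-9  # widen the window slightly so float rounding never prunes a true hit
    lo = bisect_left(counts, cutoff * qc - eps)
    hi = bisect_right(counts, qc / cutoff + eps)
    hits = []
    for pos in range(lo, hi):
        i = order[pos]
        score = tanimoto(query, db[i])
        if score >= cutoff:
            hits.append((i, score))
    # Highest similarity first; ties broken by database position.
    return sorted(hits, key=lambda h: (-h[1], h[0]))

def fold(fp: int, nbits: int, times: int) -> int:
    """Fold an nbits-long fingerprint `times` times by OR-ing its two halves.

    Assumes nbits is divisible by 2**times and fp < 2**nbits.
    """
    for _ in range(times):
        nbits //= 2
        fp = (fp & ((1 << nbits) - 1)) | (fp >> nbits)
    return fp
```

Because the database is sorted by popcount once, each query scores only a contiguous slice of it, and raising the cutoff narrows that slice; this is one way to see why the cutoff governs speedup. Folded fingerprints are shorter and cheaper to compare, but bit collisions can move a folded score above or below the full-length score, so screening on folded scores can miss true hits, which is the accuracy side of the trade-off.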

Taxonomy

Topics: Computational Drug Discovery Methods · Analytical Chemistry and Chromatography · Machine Learning in Bioinformatics