Optimizing FPGA-based Accelerator Design for Large-Scale Molecular Similarity Search
Hongwu Peng, Shiyang Chen, Zhepeng Wang, Junhuan Yang, Scott A. Weitze, Tong Geng, Ang Li, Jinbo Bi, Minghu Song, Weiwen Jiang, Hang Liu, and Caiwen Ding

TL;DR
This paper presents a novel FPGA-based accelerator for large-scale molecular similarity search, achieving significant speedups over CPU implementations by optimizing exhaustive and approximate algorithms.
Contribution
It introduces the first FPGA implementation for molecular similarity search algorithms, optimizing both exhaustive and approximate methods for high throughput and accuracy.
Findings
450 million compounds per second throughput for exhaustive search
103,385 queries per second (QPS) on the ChEMBL database with 0.92 recall for approximate search
35x speedup over CPU implementations
Abstract
Molecular similarity search has been widely used in drug discovery to identify structurally similar compounds from large molecular databases rapidly. With the increasing size of chemical libraries, there is growing interest in the efficient acceleration of large-scale similarity search. Existing works mainly focus on CPU and GPU to accelerate the computation of the Tanimoto coefficient in measuring the pairwise similarity between different molecular fingerprints. In this paper, we propose and optimize an FPGA-based accelerator design on exhaustive and approximate search algorithms. On exhaustive search using BitBound & folding, we analyze the similarity cutoff and folding level relationship with search speedup and accuracy, and propose a scalable on-the-fly query engine on FPGAs to reduce the resource utilization and pipeline interval. We achieve a 450 million compounds-per-second…
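To make the terms in the abstract concrete, the following is a minimal software sketch of the Tanimoto coefficient, BitBound pruning, and fingerprint folding. It is not the paper's FPGA design. It assumes fingerprints are stored as Python integers used as bitsets, and every function name (`tanimoto`, `bitbound_range`, `fold`, `search`) is hypothetical. The pruning rule relies on the standard bound that a candidate can reach similarity `cutoff` against a query with `q` set bits only if its own set-bit count lies between `q * cutoff` and `q / cutoff`. Folding here ORs the upper half of the fingerprint onto the lower half, which is one common scheme and may differ from the one used in the paper.

```python
def popcount(x: int) -> int:
    """Number of set bits in a fingerprint stored as an integer."""
    return bin(x).count("1")


def tanimoto(a: int, b: int) -> float:
    """Tanimoto coefficient: |a AND b| / |a OR b| (0.0 if both are empty)."""
    union = popcount(a | b)
    return popcount(a & b) / union if union else 0.0


def bitbound_range(query_bits: int, cutoff: float) -> tuple[float, float]:
    """Set-bit counts a candidate must fall within to possibly reach `cutoff`.

    Assumes 0 < cutoff <= 1.
    """
    return query_bits * cutoff, query_bits / cutoff


def fold(fp: int, n_bits: int, level: int) -> int:
    """Halve the fingerprint length `level` times by OR-ing the two halves.

    Assumes n_bits is divisible by 2**level. Folding shortens the
    fingerprint, so similarities computed on folded data are approximate.
    """
    for _ in range(level):
        n_bits //= 2
        fp = (fp >> n_bits) | (fp & ((1 << n_bits) - 1))
    return fp


def search(query: int, database: list[int], cutoff: float) -> list[tuple[int, float]]:
    """Exhaustive search with BitBound pruning; returns (index, score) pairs."""
    lo, hi = bitbound_range(popcount(query), cutoff)
    hits = []
    for idx, fp in enumerate(database):
        # Skip candidates whose bit count rules out reaching the cutoff.
        if not (lo <= popcount(fp) <= hi):
            continue
        score = tanimoto(query, fp)
        if score >= cutoff:
            hits.append((idx, score))
    return sorted(hits, key=lambda h: -h[1])


if __name__ == "__main__":
    db = [0b11110000, 0b11100000, 0b00001111, 0b11111111, 0b10000000]
    print(search(0b11110000, db, cutoff=0.7))
```

In this toy database the last two entries are discarded by the bit-count bound alone, without computing a Tanimoto score. A higher cutoff narrows the admissible bit-count range and prunes more candidates, which is the kind of cutoff-versus-speedup relationship the abstract says the paper analyzes.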
Taxonomy
Topics: Computational Drug Discovery Methods · Analytical Chemistry and Chromatography · Machine Learning in Bioinformatics
