FPScreen: A Rapid Similarity Search Tool for Massive Molecular Library Based on Molecular Fingerprint Comparison
Lijun Wang, Jianbing Gong, Yingxia Zhang, Tianmou Liu, Junhui Gao

TL;DR
FPScreen is a rapid similarity search tool capable of processing 100 million molecular entries within an hour, leveraging parallel processing and MACCS fingerprint comparison for large-scale chemical library analysis.
Contribution
We developed FPScreen, a fast, web-based similarity search engine for massive molecular libraries using MACCS fingerprints and parallel processing techniques.
Findings
Completed similarity search for 100 million molecules within one hour.
Utilized MACCS fingerprint comparison for efficient similarity assessment.
Implemented parallel processing to enhance speed and scalability.
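The fingerprint comparison named in the findings can be illustrated with a minimal sketch. The summary does not say which similarity metric FPScreen uses, so the Tanimoto coefficient here is an assumption (it is a common choice for bit fingerprints). The function names, the threshold value, and the toy 8-bit fingerprints are all hypothetical stand-ins for the 166-bit MACCS strings.

```python
def parse_fingerprint(bits: str) -> int:
    """Turn a '0101...' text line into an int so AND/OR act on all bits at once."""
    return int(bits, 2)


def tanimoto(a: int, b: int) -> float:
    """Tanimoto coefficient: shared on-bits divided by on-bits in either fingerprint."""
    union = bin(a | b).count("1")
    if union == 0:
        return 0.0  # two empty fingerprints: treat as dissimilar (assumed convention)
    return bin(a & b).count("1") / union


def search(query_bits: str, library_lines, threshold: float = 0.8):
    """Scan the library line by line and yield (line_number, score) for each hit."""
    query = parse_fingerprint(query_bits)
    for line_number, line in enumerate(library_lines):
        line = line.strip()
        if not line:
            continue  # skip blank lines
        score = tanimoto(query, parse_fingerprint(line))
        if score >= threshold:
            yield line_number, score


# Toy 8-bit fingerprints stand in for 166-bit MACCS strings.
library = ["11110000", "11100000", "00001111", "11110001"]
hits = list(search("11110000", library, threshold=0.75))
```

Because `search` accepts any iterable of lines, an open file handle can be passed in place of the list, so the library never has to be loaded into memory in full.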
Abstract
We designed FPScreen, a fast similarity search engine for large molecular libraries. We downloaded the structure files of 100 million molecules from PubChem in SDF format, then used the computational chemistry toolkit RDKit to convert each structure into one line of text in MACCS fingerprint format, and stored the lines in a text file that serves as our molecule library. The search engine traverses the library file line by line and compares each 166-bit string against the query. FPScreen can complete a similarity search across all 100 million entries in the library within one hour, which is fast for a computational biology tool. Additionally, we divided the library into several strides for parallel processing. FPScreen was developed as a web application.
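The stride-based parallel scan described in the abstract might look roughly like the following sketch. It assumes that a "stride" is a contiguous slice of the library and that similarity is the Tanimoto coefficient; the abstract confirms neither. All function names are hypothetical. A thread pool is used only to keep the example small and portable; for a CPU-bound scan in Python, `ProcessPoolExecutor` would be the more likely choice for real speedup.

```python
from concurrent.futures import ThreadPoolExecutor


def tanimoto(a: int, b: int) -> float:
    """Tanimoto coefficient of two fingerprints stored as ints (assumed metric)."""
    union = bin(a | b).count("1")
    return bin(a & b).count("1") / union if union else 0.0


def split_into_strides(n_entries: int, n_strides: int):
    """Return (start, stop) pairs that cover range(n_entries) exactly once."""
    base, extra = divmod(n_entries, n_strides)
    bounds, start = [], 0
    for i in range(n_strides):
        stop = start + base + (1 if i < extra else 0)  # spread the remainder
        bounds.append((start, stop))
        start = stop
    return bounds


def scan_stride(query: int, library, start: int, stop: int, threshold: float):
    """Scan one stride and return its hits as (index, score) pairs."""
    hits = []
    for i in range(start, stop):
        score = tanimoto(query, library[i])
        if score >= threshold:
            hits.append((i, score))
    return hits


def parallel_search(query: int, library, n_strides: int = 4, threshold: float = 0.8):
    """Scan all strides concurrently and merge the hits in library order."""
    bounds = split_into_strides(len(library), n_strides)
    with ThreadPoolExecutor(max_workers=n_strides) as pool:
        futures = [
            pool.submit(scan_stride, query, library, start, stop, threshold)
            for start, stop in bounds
        ]
        # Collect in stride order so the merged result matches a sequential scan.
        return [hit for future in futures for hit in future.result()]


# Toy 8-bit fingerprints stand in for 166-bit MACCS strings.
library = [int(bits, 2) for bits in
           ["11110000", "11100000", "00001111", "11110001", "11111111", "11110000"]]
results = parallel_search(int("11110000", 2), library, n_strides=3, threshold=0.75)
```

Each stride is independent of the others, so the work could equally be spread across processes or machines, with the per-stride hit lists merged at the end.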
Taxonomy
Topics: Computational Drug Discovery Methods · Chemical Synthesis and Analysis · Protein Structure and Dynamics
