# FPScreen: A Rapid Similarity Search Tool for Massive Molecular Library Based on Molecular Fingerprint Comparison

**Authors:** Lijun Wang, Jianbing Gong, Yingxia Zhang, Tianmou Liu, Junhui Gao

arXiv: 1906.06170 · 2019-06-17

## TL;DR

FPScreen is a rapid similarity search tool capable of processing 100 million molecular entries within an hour, leveraging parallel processing and MACCS fingerprint comparison for large-scale chemical library analysis.
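As a rough check on the headline figure, searching $10^8$ entries in one hour implies the aggregate rate below. How that rate is divided among cores depends on the number of parallel strides, which this summary does not state.

$$
\frac{10^{8}\ \text{entries}}{3600\ \text{s}} \approx 2.8 \times 10^{4}\ \text{entries/s}
$$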

## Contribution

We developed FPScreen, a fast, web-based similarity search engine for massive molecular libraries using MACCS fingerprints and parallel processing techniques.

## Key findings

- Completed similarity search for 100 million molecules within one hour.
- Compared 166-bit MACCS fingerprints, precomputed with RDKit, instead of full structures for efficient similarity assessment.
- Divided the library into several strides that are processed in parallel to improve speed and scalability.
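The summary does not say which similarity coefficient FPScreen computes. The sketch below assumes the Tanimoto (Jaccard) coefficient, the usual default for binary fingerprints, and assumes each fingerprint is stored as a string of 166 `'0'`/`'1'` characters. The names `popcount` and `tanimoto` are hypothetical. Converting each string to a Python integer lets the intersection and union be computed with bitwise AND/OR plus a bit count.

```python
MACCS_BITS = 166  # MACCS key length stated in the abstract


def popcount(x: int) -> int:
    """Number of set bits in a non-negative integer."""
    return bin(x).count("1")


def tanimoto(fp_a: str, fp_b: str) -> float:
    """Tanimoto similarity |A & B| / |A | B| of two '0'/'1' fingerprint strings.

    Assumption: FPScreen's actual metric is not stated; Tanimoto is used here
    as the conventional choice for binary fingerprints.
    """
    if len(fp_a) != MACCS_BITS or len(fp_b) != MACCS_BITS:
        raise ValueError(f"expected {MACCS_BITS}-character bit strings")
    a, b = int(fp_a, 2), int(fp_b, 2)
    union = popcount(a | b)
    if union == 0:
        return 0.0  # two all-zero fingerprints: scored 0.0 in this sketch
    return popcount(a & b) / union
```

For example, a query with its first 83 bits set and a target with its first 40 bits set share 40 bits out of a union of 83, giving 40/83 ≈ 0.48. If the strings are produced with RDKit's `MACCSkeys.GenMACCSKeys`, note that RDKit returns 167-bit vectors with bit 0 unused, so the length check would need adjusting.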

## Abstract

We designed FPScreen, a fast similarity search engine for large molecular libraries. We downloaded the structure files of 100 million molecules from PubChem in SDF format, then used the cheminformatics toolkit RDKit to convert each structure file into one line of text in MACCS format, and stored these lines in a text file that serves as our molecule library. The search engine computes similarity while traversing the 166-bit strings in the library file line by line. FPScreen can complete a similarity search through all 100 million entries in our molecule library within one hour, which is very fast for a biological computation tool. Additionally, we divided the library into several strides for parallel processing. FPScreen was developed as a web application.
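The abstract describes two steps: a line-by-line scan of the fingerprint file and a division of the library into strides that are processed in parallel. The sketch below combines them under assumptions not taken from the paper: each line is exactly 166 `'0'`/`'1'` characters followed by `\n` (so record *i* starts at byte *i* × 167), the line index serves as the molecule ID, similarity is Tanimoto, and strides are contiguous index ranges handed to a `ProcessPoolExecutor`. The names `make_strides`, `scan_stride`, and `parallel_search` and the 0.8 default threshold are hypothetical, not FPScreen's API.

```python
import os
from concurrent.futures import ProcessPoolExecutor
from typing import List, Tuple

N_BITS = 166               # MACCS key length
RECORD_BYTES = N_BITS + 1  # assumed layout: 166 '0'/'1' chars + b"\n" per line


def popcount(x: int) -> int:
    """Number of set bits in a non-negative integer."""
    return bin(x).count("1")


def make_strides(n_records: int, n_strides: int) -> List[Tuple[int, int]]:
    """Split record indices [0, n_records) into at most n_strides contiguous ranges."""
    if n_records <= 0:
        return []
    step = -(-n_records // max(1, n_strides))  # ceiling division
    return [(s, min(s + step, n_records)) for s in range(0, n_records, step)]


def scan_stride(path: str, start: int, end: int, query: str,
                threshold: float) -> List[Tuple[int, float]]:
    """Scan records [start, end) line by line; keep (index, Tanimoto) >= threshold."""
    q = int(query, 2)
    q_bits = popcount(q)
    hits = []
    with open(path, "rb") as fh:
        fh.seek(start * RECORD_BYTES)  # fixed-width records: jump straight to the stride
        for idx in range(start, end):
            record = fh.read(RECORD_BYTES)
            fp = int(record[:N_BITS], 2)            # drop the trailing newline
            common = popcount(q & fp)
            union = q_bits + popcount(fp) - common  # |A| + |B| - |A & B|
            score = common / union if union else 0.0
            if score >= threshold:
                hits.append((idx, score))
    return hits


def parallel_search(path: str, query: str, threshold: float = 0.8,
                    n_strides: int = 4) -> List[Tuple[int, float]]:
    """Scan each stride in its own worker process and merge hits, best first."""
    n_records = os.path.getsize(path) // RECORD_BYTES
    strides = make_strides(n_records, n_strides)
    with ProcessPoolExecutor(max_workers=max(1, len(strides))) as pool:
        futures = [pool.submit(scan_stride, path, s, e, query, threshold)
                   for s, e in strides]
        hits = [hit for f in futures for hit in f.result()]
    return sorted(hits, key=lambda h: (-h[1], h[0]))
```

Fixed-width records let each worker `seek` directly to the start of its stride instead of reading past all earlier lines. Because `ProcessPoolExecutor` may start workers by re-importing the main module (the `spawn` start method), `parallel_search` should be called from under an `if __name__ == "__main__":` guard.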

---
Source: https://tomesphere.com/paper/1906.06170