Optimizing substructure search: a novel approach for efficient querying in large chemical databases
Vsevolod Vaskin, Dmitri Jakovlev, Fedor Bakharev

TL;DR
This paper presents a tree-based index, inspired by binary Ball-Trees, that speeds up the initial filtering stage of substructure search in large chemical databases and outperforms exhaustive search.
Contribution
A novel tree-based indexing method that accelerates substructure search processes, reducing false positives and enhancing scalability in chemical database querying.
Findings
Significant speed-up in initial filtering compared to exhaustive search.
Outperforms existing algorithms like Bingo in efficiency.
Potential to reduce false positive rates in substructure verification.
Abstract
Substructure search in chemical compound databases is a fundamental task in cheminformatics with critical implications for fields such as drug discovery, materials science, and toxicology. However, the increasing size and complexity of chemical databases have rendered traditional search algorithms ineffective, exacerbating the need for scalable solutions. We introduce a novel approach to enhance the efficiency of substructure search, moving beyond the traditional full-enumeration methods. Our strategy employs a unique index structure: a tree that segments the molecular data set into clusters based on the presence or absence of certain features. This innovative indexing mechanism is inspired by the binary Ball-Tree concept and demonstrates superior performance over exhaustive search methods, leading to significant acceleration in the initial filtering process. Comparative analysis with…
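The indexing idea in the abstract, a tree that splits the data set by the presence or absence of features, can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes each molecule is already encoded as a bit fingerprint stored in a Python integer, with the screening property that every bit set for a substructure is also set for any molecule containing it. The names `Node`, `build`, `candidates`, and `leaf_size`, the even-split heuristic, and the per-node union mask are all hypothetical choices made for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Node:
    union: int                           # bitwise OR of every fingerprint below this node
    items: Optional[List[int]] = None    # molecule indices (leaves only)
    absent: Optional["Node"] = None      # subtree where the split feature is absent
    present: Optional["Node"] = None     # subtree where the split feature is present


def build(fps: List[int], ids: Optional[List[int]] = None, leaf_size: int = 2) -> Node:
    """Recursively split molecules on the presence/absence of one feature bit."""
    if ids is None:
        ids = list(range(len(fps)))
    union = 0
    for i in ids:
        union |= fps[i]
    if len(ids) <= leaf_size:
        return Node(union, items=ids)

    # Assumed heuristic: pick the bit that divides the molecules most evenly.
    best_bit, best_gap = None, None
    for bit in range(union.bit_length()):
        count = sum((fps[i] >> bit) & 1 for i in ids)
        if count == 0 or count == len(ids):
            continue  # this bit does not separate anything
        gap = abs(2 * count - len(ids))
        if best_gap is None or gap < best_gap:
            best_bit, best_gap = bit, gap
    if best_bit is None:  # all fingerprints identical: cannot split further
        return Node(union, items=ids)

    absent = [i for i in ids if not (fps[i] >> best_bit) & 1]
    present = [i for i in ids if (fps[i] >> best_bit) & 1]
    return Node(union,
                absent=build(fps, absent, leaf_size),
                present=build(fps, present, leaf_size))


def candidates(node: Node, fps: List[int], query: int) -> List[int]:
    """Return indices of molecules whose fingerprint contains every query bit."""
    if query & ~node.union:
        return []  # the query needs a feature that no molecule in this subtree has
    if node.items is not None:
        return [i for i in node.items if query & ~fps[i] == 0]
    return (candidates(node.absent, fps, query)
            + candidates(node.present, fps, query))


# Toy 4-bit fingerprints; real fingerprints are much longer.
fps = [0b0011, 0b0111, 0b1000, 0b1011, 0b0101]
tree = build(fps)
print(sorted(candidates(tree, fps, 0b0011)))  # → [0, 1, 3]
```

A subtree is skipped as soon as the query requires a feature that is absent from every molecule under it, which is where the saving over a full scan comes from. The surviving candidates are only a filter result and would still need an atom-by-atom substructure check, since fingerprint screening can produce false positives.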
Taxonomy
Topics: Analytical Chemistry and Chromatography · Computational Drug Discovery Methods · Mass Spectrometry Techniques and Applications
