Learned Indexing in Proteins: Extended Work on Substituting Complex Distance Calculations with Embedding and Clustering Techniques
Jaroslav Oľha, Terézia Slanináková, Martin Gendiar, Matej Antol, Vlastislav Dohnal

TL;DR
This paper proposes a lightweight machine learning-based approach for 3D protein structure search, transforming complex structures into compact vectors and using clustering and filtering to improve search efficiency.
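The summary does not say how a 3D structure becomes a compact vector. The sketch below shows one plausible scheme, assumed for illustration and not taken from the paper. It splits the chain's C-alpha coordinates into a fixed number of contiguous segments, takes each segment's centroid, and keeps the pairwise distances between centroids. The names `segment_centroids` and `embed_structure` and the default of 10 segments are hypothetical. The vector uses only inter-centroid distances, so rotating or translating the structure leaves it unchanged.

```python
import numpy as np


def segment_centroids(ca_coords: np.ndarray, n_segments: int) -> np.ndarray:
    """Split an (L, 3) array of C-alpha coordinates into n_segments contiguous,
    near-equal pieces (residue order preserved) and return their centroids."""
    if ca_coords.ndim != 2 or ca_coords.shape[1] != 3:
        raise ValueError("expected an (L, 3) array of coordinates")
    if len(ca_coords) < n_segments:
        raise ValueError("chain is shorter than the number of segments")
    pieces = np.array_split(ca_coords, n_segments, axis=0)
    return np.stack([piece.mean(axis=0) for piece in pieces])


def embed_structure(ca_coords, n_segments: int = 10) -> np.ndarray:
    """Hypothetical compact embedding (an assumption, not the authors' method):
    distances between all pairs of segment centroids, flattened to
    n_segments * (n_segments - 1) / 2 values."""
    centroids = segment_centroids(np.asarray(ca_coords, dtype=float), n_segments)
    diff = centroids[:, None, :] - centroids[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))   # symmetric (n_segments, n_segments)
    upper = np.triu_indices(n_segments, k=1)  # each unordered pair exactly once
    return dist[upper]


# Toy "backbone": a 3D random walk standing in for real C-alpha coordinates.
rng = np.random.default_rng(0)
chain = np.cumsum(rng.normal(size=(120, 3)), axis=0)
vector = embed_structure(chain)
print(vector.shape)  # (45,)
```

With 10 segments the vector always has 10 · 9 / 2 = 45 entries, whatever the chain length. This fixed, small size is what makes vectors like these cheap to cluster and compare.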
Contribution
It introduces a novel three-step method combining vector transformation, probabilistic clustering, and filtering to enhance protein structure similarity search.
Findings
Effective vectorization of 3D protein structures
Improved search speed over traditional methods
Acceptable accuracy maintained despite simplified computations
Abstract
Despite the constant evolution of similarity searching research, it continues to face the same challenges stemming from the complexity of the data, such as the curse of dimensionality and computationally expensive distance functions. Various machine learning techniques have proven capable of replacing elaborate mathematical models with combinations of simple linear functions, often gaining speed and simplicity at the cost of formal guarantees of accuracy and correctness of querying. The authors explore the potential of this research trend by presenting a lightweight solution for the complex problem of 3D protein structure search. The solution consists of three steps -- (i) transformation of 3D protein structural information into very compact vectors, (ii) use of a probabilistic model to group these vectors and respond to queries by returning a given number of similar objects, and…
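The grouping, querying, and filtering steps can be sketched in the same hedged spirit. This summary does not give the paper's probabilistic model or filter, so the sketch swaps in a simple stand-in. It fits a spherical Gaussian mixture with one shared variance using k-means (farthest-point initialisation, then Lloyd iterations). A query visits clusters in order of posterior probability until it has collected at least `n_candidates` objects. A filtering step then keeps the best `k`; here it is assumed to re-rank candidates by Euclidean distance between their vectors. All function and parameter names are hypothetical.

```python
import numpy as np


def fit_clusters(X, n_clusters, n_iter=50):
    """Stand-in probabilistic model: k-means centres plus one shared spherical
    variance and empirical cluster weights (a simple Gaussian mixture)."""
    X = np.asarray(X, dtype=float)
    # Farthest-point initialisation: deterministic, but sensitive to outliers.
    centers = [X[0]]
    for _ in range(1, n_clusters):
        d2 = np.min([((X - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(X[d2.argmax()])
    centers = np.array(centers)
    labels = None
    for _ in range(n_iter):  # Lloyd iterations
        d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=-1)
        new_labels = d2.argmin(axis=1)
        if labels is not None and np.array_equal(new_labels, labels):
            break  # assignments stable: converged
        labels = new_labels
        for c in range(n_clusters):
            members = X[labels == c]
            if len(members):  # keep the old centre if a cluster empties
                centers[c] = members.mean(axis=0)
    # Shared spherical variance: mean squared residual per dimension.
    resid = ((X - centers[labels]) ** 2).sum(axis=1)
    var = max(resid.mean() / X.shape[1], 1e-12)
    weights = np.bincount(labels, minlength=n_clusters) / len(X)
    return centers, var, weights, labels


def cluster_posteriors(q, centers, var, weights):
    """P(cluster | q) under the shared-variance spherical Gaussian mixture."""
    d2 = ((centers - q) ** 2).sum(axis=1)
    logp = np.log(np.maximum(weights, 1e-300)) - d2 / (2.0 * var)
    logp -= logp.max()  # stabilise before exponentiating
    p = np.exp(logp)
    return p / p.sum()


def search(q, X, model, n_candidates, k):
    """Visit clusters by decreasing posterior until at least n_candidates
    objects are collected, then filter: keep the k closest by vector distance."""
    X = np.asarray(X, dtype=float)
    centers, var, weights, labels = model
    order = np.argsort(-cluster_posteriors(q, centers, var, weights))
    candidates = []
    for c in order:
        candidates.extend(np.flatnonzero(labels == c).tolist())
        if len(candidates) >= n_candidates:
            break
    cand = np.array(candidates)
    dist = np.linalg.norm(X[cand] - q, axis=1)
    return cand[np.argsort(dist)[:k]]


rng = np.random.default_rng(0)
# Three well-separated toy groups of 45-dimensional vectors.
X = np.vstack([rng.normal(loc=m, scale=0.3, size=(100, 45)) for m in (0.0, 5.0, 10.0)])
model = fit_clusters(X, n_clusters=3)
hits = search(X[150], X, model, n_candidates=50, k=5)
print(hits)  # 5 candidate indices; the query itself (150) is first
```

Most of the speed-up in a scheme like this comes from stopping after the first few probable clusters. The filter then works on a small candidate set, which also makes it a natural place for a more expensive structural comparison. Farthest-point initialisation is used here only to keep the toy example deterministic.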
Taxonomy
Topics: Machine Learning in Bioinformatics · Biomedical Text Mining and Ontologies · Algorithms and Data Compression
