Learned Indexing in Proteins: Extended Work on Substituting Complex Distance Calculations with Embedding and Clustering Techniques

Jaroslav Oľha; Terézia Slanináková; Martin Gendiar; Matej Antol; Vlastislav Dohnal

arXiv:2208.08910·cs.IR·October 6, 2022

TL;DR

This paper proposes a lightweight machine learning-based approach for 3D protein structure search, transforming complex structures into compact vectors and using clustering and filtering to improve search efficiency.

Contribution

It introduces a novel three-step method combining vector transformation, probabilistic clustering, and filtering to enhance protein structure similarity search.

Findings

01. Effective vectorization of 3D protein structures
02. Improved search speed over traditional methods
03. Maintains acceptable accuracy with simplified computations

Abstract

Despite the constant evolution of similarity searching research, it continues to face the same challenges stemming from the complexity of the data, such as the curse of dimensionality and computationally expensive distance functions. Various machine learning techniques have proven capable of replacing elaborate mathematical models with combinations of simple linear functions, often gaining speed and simplicity at the cost of formal guarantees of accuracy and correctness of querying. The authors explore the potential of this research trend by presenting a lightweight solution for the complex problem of 3D protein structure search. The solution consists of three steps -- (i) transformation of 3D protein structural information into very compact vectors, (ii) use of a probabilistic model to group these vectors and respond to queries by returning a given number of similar objects, and…
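The three steps can be illustrated with a minimal sketch. The abstract is truncated before the third step, so the filtering step here follows the Contribution summary above. Everything concrete in the code is an assumption made for illustration, not the authors' implementation: the `embed` function (pairwise distances between section centroids), plain k-means as a stand-in for the probabilistic grouping model, the softmax-style bucket probabilities, and all function and parameter names are hypothetical.

```python
import math
import random


def embed(coords, sections=4):
    """Step (i), hypothetical: split the chain into equal sections, take each
    section's centroid, and keep the pairwise centroid distances as the vector."""
    n = len(coords)
    centroids = []
    for s in range(sections):
        seg = coords[s * n // sections:(s + 1) * n // sections]
        centroids.append([sum(p[d] for p in seg) / len(seg) for d in range(3)])
    return [math.dist(centroids[i], centroids[j])
            for i in range(sections) for j in range(i + 1, sections)]


def assign(vectors, centers):
    """Put each vector id into the bucket of its nearest center."""
    buckets = [[] for _ in centers]
    for idx, v in enumerate(vectors):
        nearest = min(range(len(centers)), key=lambda c: math.dist(v, centers[c]))
        buckets[nearest].append(idx)
    return buckets


def build_index(vectors, k=4, iters=10, seed=0):
    """Step (ii), build side: plain k-means as a stand-in for the grouping model."""
    rng = random.Random(seed)
    centers = [list(v) for v in rng.sample(vectors, k)]
    for _ in range(iters):
        for c, members in enumerate(assign(vectors, centers)):
            if members:  # keep the old center if a bucket is empty
                centers[c] = [sum(vectors[m][d] for m in members) / len(members)
                              for d in range(len(vectors[0]))]
    return centers, assign(vectors, centers)


def search(query, vectors, centers, buckets, k=5, candidates=20):
    """Step (ii) query side plus step (iii): visit buckets in order of
    decreasing probability, then filter candidates by exact vector distance."""
    dists = [math.dist(query, c) for c in centers]
    nearest = min(dists)
    weights = [math.exp(nearest - d) for d in dists]  # shifted for stability
    probs = [w / sum(weights) for w in weights]
    pool = []
    for c in sorted(range(len(centers)), key=lambda c: -probs[c]):
        if len(pool) >= candidates:
            break
        pool.extend(buckets[c])
    # Filtering: rank the candidate pool by cheap distance on the compact vectors.
    ranked = sorted((math.dist(query, vectors[i]), i) for i in pool)
    return ranked[:k]
```

In this sketch the `candidates` parameter controls how many objects are collected before filtering: a smaller value visits fewer buckets and runs faster, but may miss true neighbours that fell into a less probable bucket. This mirrors the trade-off the abstract describes between speed and formal guarantees of accuracy.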

Taxonomy

Topics: Machine Learning in Bioinformatics · Biomedical Text Mining and Ontologies · Algorithms and Data Compression

Methods: SPEED: Separable Pyramidal Pooling EncodEr-Decoder for Real-Time Monocular Depth Estimation on Low-Resource Settings