Learning to SMILE(S)

Stanisław Jastrzębski; Damian Leśniak; Wojciech Marian Czarnecki

arXiv:1602.06289 · cs.CL · March 9, 2018 · 21 cites

Open Access

TL;DR

This paper demonstrates how NLP techniques applied to SMILES representations can improve activity prediction in cheminformatics, surpassing traditional methods and providing structural insights.

Contribution

It introduces a novel approach of applying NLP models directly to SMILES strings for activity prediction, outperforming handcrafted features.

Findings

01. NLP methods outperform traditional cheminformatics representations.

02. The approach provides structural insights into decision-making.

03. Results show improved prediction accuracy.

Abstract

This paper shows how natural language processing (NLP) methods can be applied directly to classification problems in cheminformatics. The connection between these seemingly separate fields is made through SMILES, the standard textual representation of a chemical compound. The problem considered is activity prediction against a target protein, a crucial part of the computer-aided drug design process. The experiments show that this approach not only outranks state-of-the-art results obtained with hand-crafted representations but also gives direct structural insight into how decisions are made.
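To make the idea of treating SMILES as text concrete, here is a minimal sketch that turns a SMILES string into a bag of character n-grams, the same kind of feature a simple text classifier might consume. It is an illustration only and not the model used in the paper; the function names `char_ngrams` and `bag_of_ngrams` and the choice of n-grams up to length 3 are assumptions made for this example.

```python
from collections import Counter


def char_ngrams(smiles: str, n: int) -> list[str]:
    """Return all overlapping character n-grams of a SMILES string."""
    return [smiles[i:i + n] for i in range(len(smiles) - n + 1)]


def bag_of_ngrams(smiles: str, max_n: int = 3) -> Counter:
    """Count character n-grams for n = 1..max_n, like a text bag-of-words.

    Assumption for illustration: plain character-level splitting, so
    two-letter atoms such as "Cl" or "Br" are split into two characters.
    """
    counts = Counter()
    for n in range(1, max_n + 1):
        counts.update(char_ngrams(smiles, n))
    return counts


# Ethanol and acetic acid written in SMILES notation.
ethanol = bag_of_ngrams("CCO")
acetic_acid = bag_of_ngrams("CC(=O)O")
```

The resulting counts could be fed to any standard text classifier in place of a hand-crafted fingerprint. Because each feature is a literal substring of the SMILES string, a feature with a large weight points back to a fragment of the written structure, which is one plausible reading of the "structural insight" the abstract mentions.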

Taxonomy

Topics: Computational Drug Discovery Methods · Machine Learning in Bioinformatics · Advanced Text Analysis Techniques