Learning to SMILE(S)
Stanisław Jastrzębski, Damian Leśniak, Wojciech Marian Czarnecki

TL;DR
This paper demonstrates how NLP techniques applied to SMILES representations can improve activity prediction in cheminformatics, surpassing traditional methods and providing structural insights.
Contribution
It introduces a novel approach of applying NLP models directly to SMILES strings for activity prediction, outperforming handcrafted features.
Findings
NLP methods outperform traditional cheminformatics representations.
The approach provides structural insights into decision-making.
Results show improved prediction accuracy.
Abstract
This paper shows how natural language processing (NLP) methods can be applied directly to classification problems in cheminformatics. The connection between these seemingly separate fields is made through SMILES, the standard textual representation of a compound. The problem considered is activity prediction against a target protein, a crucial part of the computer-aided drug design process. The experiments show that this approach not only outperforms state-of-the-art results obtained with hand-crafted representations but also gives direct structural insight into how decisions are made.
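To make the idea of treating SMILES as text concrete, here is a minimal sketch of the preprocessing step that would precede a sequence model. It is not the authors' pipeline: the tokenization rule and the names `tokenize`, `build_vocab`, `encode`, `PAD` and `UNK` are assumptions made for illustration, and the summary above does not specify which models or encodings the paper uses.

```python
import re

# Assumed tokenization rule (hypothetical, not taken from the paper):
# bracket atoms such as [NH4+] and the two-letter halogens Br and Cl
# are kept as single tokens; every other character is its own token.
SMILES_TOKEN = re.compile(r"\[[^\]]+\]|Br|Cl|.")

PAD, UNK = 0, 1  # reserved ids for padding and unseen tokens


def tokenize(smiles):
    """Split a SMILES string into a list of tokens."""
    return SMILES_TOKEN.findall(smiles)


def build_vocab(smiles_list):
    """Map each token seen in the training strings to an integer id >= 2."""
    tokens = sorted({tok for s in smiles_list for tok in tokenize(s)})
    return {tok: i + 2 for i, tok in enumerate(tokens)}


def encode(smiles, vocab, max_len):
    """Turn a SMILES string into a fixed-length list of token ids."""
    ids = [vocab.get(tok, UNK) for tok in tokenize(smiles)][:max_len]
    return ids + [PAD] * (max_len - len(ids))


if __name__ == "__main__":
    train = ["CCO", "CC(=O)Cl"]  # ethanol, acetyl chloride
    vocab = build_vocab(train)
    print(tokenize("CC(=O)Cl"))
    print(encode("CCO", vocab, max_len=5))
```

The resulting integer sequences can be fed to any text classifier (for example an n-gram model or a neural sequence model) with the compound's activity label as the target, in the same way as tokenized sentences.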
Taxonomy
Topics: Computational Drug Discovery Methods · Machine Learning in Bioinformatics · Advanced Text Analysis Techniques
