SELFIES and the future of molecular string representations
Mario Krenn, Qianxiang Ai, Senja Barthel, Nessa Carson, Angelo Frei, Nathan C. Frey, Pascal Friederich, Théophile Gaudin, Alberto Alexander Gayle, Kevin Maik Jablonka, Rafael F. Lameiro, Dominik Lemm, Alston Lo, Seyed Mohamad Moosavi, José Manuel Nápoles-Duarte

TL;DR
This paper reviews molecular string representations, highlights the advantages of SELFIES over SMILES, and proposes 16 future projects to advance robust molecular representations for AI applications in chemistry.
Contribution
It introduces 16 concrete future projects aimed at extending and improving molecular string representations like SELFIES for AI-driven chemistry research.
Findings
SELFIES guarantees 100% validity in molecular encoding
SELFIES has enabled new applications in chemistry
The paper proposes 16 future projects to enhance molecular representations
Abstract
Artificial intelligence (AI) and machine learning (ML) are expanding in popularity for broad applications to challenging tasks in chemistry and materials science. Examples include the prediction of properties, the discovery of new reaction pathways, or the design of new molecules. The machine needs to read and write fluently in a chemical language for each of these tasks. Strings are a common tool to represent molecular graphs, and the most popular molecular string representation, SMILES, has powered cheminformatics since the late 1980s. However, in the context of AI and ML in chemistry, SMILES has several shortcomings: most pertinently, most combinations of symbols lead to invalid results with no valid chemical interpretation. To overcome this issue, a new language for molecules was introduced in 2020 that guarantees 100% robustness: SELFIES (SELF-referencIng Embedded Strings).…
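The abstract's claim that most SMILES symbol combinations are invalid can be illustrated with a rough sketch. The code below is not a SMILES parser and is not taken from the paper: it checks only two syntax rules (balanced branch parentheses and paired ring-closure digits) and ignores valence, aromaticity, bracket atoms, and everything else. The function names, the eight-symbol alphabet, and the sampling parameters are all hypothetical choices made for illustration.

```python
import random


def is_syntactically_plausible(smiles: str) -> bool:
    """Toy check covering only two SMILES syntax rules.

    Rule 1: branch parentheses are balanced, non-empty, and never open the string.
    Rule 2: every ring-closure digit appears an even number of times (opened, then closed).
    Passing this check does NOT mean the string is a valid molecule.
    """
    if not smiles:
        return False
    depth = 0
    open_rings = set()
    prev = ""
    for ch in smiles:
        if ch == "(":
            # A branch must follow an atom, so it cannot start the string
            # or directly follow another opening parenthesis.
            if prev in ("", "("):
                return False
            depth += 1
        elif ch == ")":
            depth -= 1
            # Reject a close without a matching open, and empty branches "()".
            if depth < 0 or prev == "(":
                return False
        elif ch.isdigit():
            # First occurrence opens a ring, second occurrence closes it.
            if ch in open_rings:
                open_rings.remove(ch)
            else:
                open_rings.add(ch)
        prev = ch
    return depth == 0 and not open_rings


def fraction_plausible(n_samples: int = 10_000, length: int = 8,
                       alphabet: str = "CNO=()12", seed: int = 0) -> float:
    """Fraction of uniformly random strings that pass the toy check."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_samples):
        candidate = "".join(rng.choice(alphabet) for _ in range(length))
        hits += is_syntactically_plausible(candidate)
    return hits / n_samples


if __name__ == "__main__":
    print(is_syntactically_plausible("c1ccccc1"))  # benzene: ring digit 1 is paired
    print(is_syntactically_plausible("C1CC("))     # unclosed ring and unclosed branch
    print(fraction_plausible())
```

Even this lenient check rejects a large share of random strings, and a full SMILES grammar plus valence rules would reject more. SELFIES, as the abstract describes, is designed so that this failure mode does not arise.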
Taxonomy
Topics: Machine Learning in Materials Science · Computational Drug Discovery Methods · History and advancements in chemistry
