Representation of Molecules by Sequences of Instructions
Karl Thurnhofer-Hemsi, Iván García-Aguilar, José David Fernández-Rodriguez, Ezequiel López-Rubio

TL;DR
This paper introduces a representation of molecules as sequences of instructions, designed so that every sequence decodes to a valid molecule and small edits to a sequence produce small structural changes.
Contribution
A novel chemical nomenclature based on instruction sequences, in which every string represents a valid molecule and small changes to a string correspond to small structural modifications.
Findings
A reduced instruction set generates valid molecular representations.
Small changes in instruction sequences correspond to small molecular modifications.
The approach is suitable for computational intelligence methods such as deep learning.
Abstract
The processing of chemical information by computational intelligence methods faces the challenge of the structural complexity of molecular graphs. These graphs are not amenable to being represented in a suitable way for such methods. The most popular representation is the SMILES notation standard. However, it comes with some limitations, such as the abundance of nonvalid strings and the fact that similar strings often represent very different molecules. In this work, a completely different approach to chemical nomenclature is presented. A reduced instruction set is defined, and the language of all strings that are sequences of such instructions is considered. Instructions provide the means to incrementally add atoms and modify the connectivity of the chemical bonds of atoms to be inserted. Instructions are carefully crafted to guarantee that all strings of this language are valid, i.e.,…
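To make the idea of "every string is valid" concrete, here is a minimal sketch of an instruction interpreter. The instruction set below is a toy invented for illustration; it is not the one defined in the paper, and the names run, is_valid, and MAX_VALENCE are hypothetical. It assumes three elements with fixed maximum valences, builds only acyclic molecules, and treats hydrogens as implicit. Any instruction that cannot be applied is skipped, so no input string can produce an over-bonded atom.

```python
# Toy, hypothetical instruction set (NOT the one defined in the paper):
#   "C", "N", "O"  add an atom bonded to the atom under the cursor
#   "="            raise the order of the next bond by one (capped at 3)
#   "<"            move the cursor back to the parent atom
# Any other character is ignored.
MAX_VALENCE = {"C": 4, "N": 3, "O": 2}  # assumed fixed valences


def run(program):
    """Decode an instruction string into (atoms, bonds).

    atoms is a list of element symbols; bonds maps (i, j) with i < j
    to a bond order. Infeasible instructions are skipped, so the
    result never exceeds MAX_VALENCE for any atom.
    """
    atoms, used, bonds, parent = [], [], {}, {}
    cursor = None   # index of the atom new atoms attach to
    order = 1       # requested order of the next bond
    for ins in program:
        if ins in MAX_VALENCE:
            if cursor is None:
                # First atom: nothing to bond to yet.
                atoms.append(ins)
                used.append(0)
                cursor = 0
            else:
                # Clamp the bond order to what both atoms can accept.
                free = MAX_VALENCE[atoms[cursor]] - used[cursor]
                k = min(order, free, MAX_VALENCE[ins])
                if k >= 1:
                    new = len(atoms)
                    atoms.append(ins)
                    used.append(k)
                    used[cursor] += k
                    bonds[(cursor, new)] = k
                    parent[new] = cursor
                    cursor = new
            order = 1
        elif ins == "=":
            order = min(order + 1, 3)
        elif ins == "<" and cursor in parent:
            cursor = parent[cursor]
    return atoms, bonds


def is_valid(atoms, bonds):
    """Check that no atom exceeds its maximum valence."""
    used = [0] * len(atoms)
    for (i, j), k in bonds.items():
        used[i] += k
        used[j] += k
    return all(u <= MAX_VALENCE[a] for a, u in zip(atoms, used))


if __name__ == "__main__":
    print(run("CC=O"))  # heavy-atom skeleton of acetaldehyde
    print(run("CC=N"))  # one instruction changed, one atom changed
```

In this toy, changing the last instruction of "CC=O" to "N" swaps a single atom and leaves the rest of the graph untouched, which mirrors the locality property described in the Findings. Clamping infeasible instructions instead of rejecting them is the design choice that keeps every string decodable.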
Taxonomy
Topics: Computational Drug Discovery Methods · Machine Learning in Materials Science · History and advancements in chemistry
