Crossover operators for molecular graphs with an application to virtual drug screening
Nico Domschke, Bruno J. Schmidt, Thomas Gatter, Richard Golnik, Paul Eisenhuth, Fabian Liessmann, Jens Meiler, Peter F. Stadler

TL;DR
This paper introduces a new genetic algorithm method for drug design by recombining molecular structures, improving diversity and effectiveness of candidate molecules.
Contribution
The paper introduces cut-and-join crossover operators for molecular graphs, a novel and effective approach for recombination in drug design.
Findings
Cut-and-join crossover preserves molecular properties like valency and planarity while generating plausible molecules.
The method increases diversity of candidate molecules compared to initial libraries and maintains synthesizability indices.
Using the method in REvoLd led to discovering molecules with better binding constants than existing ones.
Abstract
Genetic algorithms are a powerful method to solve optimization problems with complex cost functions over vast search spaces that rely in particular on recombining parts of previous solutions. Crossover operators play a crucial role in this context. Here, we describe a large class of these operators designed for searching over spaces of graphs. These operators are based on introducing small cuts into graphs and rejoining the resulting induced subgraphs of two parents. This form of cut-and-join crossover can be restricted in a consistent way to preserve local properties such as vertex-degrees (valency), or bond-orders, as well as global properties such as graph-theoretic planarity. In contrast to crossover on strings, cut-and-join crossover on graphs is powerful enough to ergodically explore chemical space even in the absence of mutation operators. Extensive benchmarking shows that the…
Genes, proteins, chemicals, diseases, species, mutations and cell lines named across the full text — each resolved to its canonical identifier and authoritative record.
Click any figure to enlarge with its caption.
Figure 10
Figure 11
Figure 12
Figure 13
Figure 14
Figure 15
Figure 16
Figure 1
Figure 2
Figure 3
Figure 4
Figure 5
Figure 6
Figure 7
Figure 8
Figure 9
Figure 17- —http://dx.doi.org/10.13039/501100001659Deutsche Forschungsgemeinschaft
- —Novo Nordisk Foundation
- —Center of Excellence for AI-research Center for Scalable Data Analytics and Artificial Intelligence Dresden/Leipzig
- —http://dx.doi.org/10.13039/100015330Max Kade Foundation
- —http://dx.doi.org/10.13039/501100002347Bundesministerium für Bildung und Forschung
- —NIH
- —Alexander von Humboldt Foundation
- —de.NBI
- —Universität Leipzig (1039)
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
Taxonomy
TopicsComputational Drug Discovery Methods · Click Chemistry and Applications · Chemical Synthesis and Analysis
Introduction
Graphs are by the far the most common mathematical formalism employed to model objects for which discrete components and their relation are an important aspect. Molecules, characterized by chemical bonds connecting atoms, were one of the earliest areas of applications already in the 19th century [1]. Nowadays, graphs are nearly ubiquitous in most areas of science and engineering, laying at the heart of the field of Network Theory [2].
Genetic algorithms (GA) are a widely used class of population-based heuristic optimization algorithms inspired by evolutionary adaptation in nature. At the core of a GA is a population of individuals that represent potential solutions to the problem at hand. In the most general setting, the population is used to produce a set of offspring by means of “genetic operators” that typically take one or two parents as input. The population of the next generation is then selected from the current population and their offspring using a selection rule that favors fitter solutions. The effectiveness of these algorithms depends on the representation of solutions, the definition of the genetic operators that act on these representations, and the selection scheme [3]. Following the paradigm of biological evolution, the overwhelming majority of GA implementations uses fixed length strings to represent solutions and employs mutation (i.e., relabeling) as well as crossover to produce variants to select from in each generation. Occasionally, other data structures are used, such as permutations as encodings of tours in the traveling salesman problem, that require specialized crossover operators [4].
Here, we aim to apply GAs to tasks that require the exploration of complex chemical spaces. Virtual Drug Screening [5] can be understood as an optimization problem over a chemical space, i.e., a set of molecules that may be subject to some restriction such as compliance with Lipinski’s Rule of Five [6]. The objective function \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$f:X\rightarrow {\mathbb {R}}$$\end{document} estimates e.g. a binding affinity of a candidate \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$x\in X$$\end{document} to a given drug target. In this setting, very little is known about the properties of the fitness function f, which is either computed from explicit molecular simulations or on the basis of a machine learning model. GAs thus are a very natural choice for the computational methodology to find candidates \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$x\in X$$\end{document} that at least approximately optimize f(x). Evolutionary algorithms, including GAs, indeed have been used for decades in computer-aided drug discovery (CADD) [7], see [8] for a survey.
Make-on-demand libraries [9, 10] are populated in silico by means of fragment recombination that corresponds to easily accessible chemical reactions. Since the size of such chemical search spaces makes its enumeration impractical and in most cases impossible, GAs are a very natural approach for such problems. A simple example of such an approach is REvold [11], which is designed to optimize over the ENAMINE REAL Space [12]. In particular in the field for Virtual Drug Screening, it is of interest not only to optimize f but also to access molecules that are sufficiently different from well-known molecular shapes, thus are likely free from patents. This reinforces recombination of molecular fragments as an appealing search strategy.
In order to run GAs on more complex chemical spaces, the challenge is to implement efficient genetic operators that act directly on molecule graphs. As in most GA applications, we consider a mutation operator \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$M: X \rightarrow X$$\end{document} that transforms a single candidate into a closely related variant, thus supporting local fine-tuning. Complementarily, crossover operators \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$C: X\times X\rightarrow X$$\end{document} are meant to “interpolate” between a pair of parents, allowing the recombination of favorable features of both parents in a large “jump” in chemical space. The use of GAs in chemical space exploration, therefore, requires mutation and recombination operators that ensure that offspring are again plausible molecules. The use of exclusively mutation or recombination may also be suitable depending on definitions. A wide variety of representations have been used to represent molecules in evolutionary algorithms, including binary strings, fragments, SMILES, and of course molecular graphs.
Molecular graphs, albeit not formally defined in strict mathematical sense, are a very restricted class compared to labeled graphs in general. This is, of course, a direct consequence of the valence rules of chemistry and the fact that molecules are embedded in 3D space such that all the bond lengths are determined by bond order and the incident atoms. In particular, the structural formulae of real molecules are graph-theoretic planar graphs, i.e. they can be embedded in a plane even though the actual compound has a 3D structure, except for a small set of exceptional cases that have been synthesized mostly as curiosities [13–15].
Two classes of graph-based crossover are described in [16]: multipoint crossover starts from a cut that is chosen to minimize topological disruption and a reconnection procedure that ensures connectedness of the children [17]. Subgraph crossover, in contrast, identifies a subgraph in each parent and then proceeds by reconnecting them. In MEGA [18], crossover is guided by retro-synthetic information obtained from RECAP [19] or affects single bonds or rings [17], see also [20]. The graph-based GA described in [21] uses reaction SMILES to implement its crossover operator. EvoMD [22], finally, is defined on the binary adjacency matrices of the parent graphs. Even though graph-based GAs are not at all uncommon, very little is known about the mathematical properties of these crossover operators. Other methods have been proposed that are based solely on string manipulation. FASMIFRA [23] operates directly on SMILES, but disallows the crossover of rings. MolFinder [24] follows a similar strategy but allows crossover on rings with heuristic joining of fragments. In STONED [25] crossover and mutation based on string edit distance was proposed.
The main objective of this contribution is to define and characterize a large class of crossover operators for graphs. The key idea is to use cuts in graphs as means of (bi-)partitioning the parents and to recombine the resulting parts by means of a join operation. This very general scheme of cut-and-join crossover subsumes most of the procedures in graph-based GAs applied to chemical space exploration. Instead of implementing specific restrictions motivated by chemical intuition, we start from a very general construction that is not related to chemistry. We proceed to demonstrate that this leads to a class of operators that can be tuned in a very natural fashion to preserve both local properties, such as vertex degrees, as well as certain global properties, such as graph-theoretic planarity, that are of direct relevance for applications in chemistry. In other words, the proposed method is agnostic to chemical properties, but conserves them naturally without artificial restrictions. Our method was conceived as a well defined mathematical framework. Contrasting to previous publications, this framework yields simple to proof guarantees on which compounds can be generated, including synthesis paths. It has the potential to proof convergence for specific applications of genetic algorithms on chemical graphs.
This contribution is organized as follows: After introducing some basic graph-theoretic notation and discussing properties of chemical graphs, we give a formal account of the class of cut-and-join crossover operations. In particular, we show under which conditions degree-preserving and bond-order-preserving joins exist. We proceed by showing that under these conditions it is also always possible to construct crossover offspring that preserve graph-theoretical planarity of the structural formula. We then consider the problem of computing all recombinant offspring for a pair of parents for degree- or bond-order preserving operations. Continuing the analysis of the operators, we proceed to show that already in the degree-preserving case, cut-and-join crossover is sufficient to generate any molecular graph in a small population starting from a small molecular base set. In the empirical part of this work, we demonstrate that cut-and-join crossover produced highly diverse offspring that are nevertheless chemically plausible, synthesizable, and even drug-like with a sufficiently high probability that unrealistic proposal can be easily removed by well-established filtering steps. As a showcase application for cut-and-join crossover we describe its integration with REvoLd to search for ligands binding to four different target proteins. Finally, we close with a discussion of the main results and some open questions.
Preliminaries
An undirected graph \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$G=(V,E)$$\end{document} consists of a set of vertices V and a set of edges E consisting of unordered pairs of vertices. In a directed graph, E is considered as a set of ordered vertex pairs. Below, it will be convenient for simplicity of the presentation to focus on undirected graphs. Undirected graphs are equivalent to symmetric directed graphs, i.e., \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$xy\in E(G)\iff yx\in E(G)$$\end{document} , where xy and yx denotes directed edges between the vertices \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$x,y\in V$$\end{document} . Where necessary, we will write V(G) and E(G) to refer to the vertex and edge sets of G. We define the edge neighborhood of a vertex \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$N^G_e(v)=\{vw \mid w\in V(G), vw \in E(G\}$$\end{document} . In general we consider graphs with vertex and edge labels \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\ell _V:V(G)\rightarrow L_V, \ell _E:E(G)\rightarrow L_E$$\end{document} , where \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$L_V$$\end{document} and \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$L_E$$\end{document} are finite sets of labels. In applications to chemistry, vertex labels designate atom types (most commonly chemical element, isotope, charge) and edge labels identify bond types.
A graph H is a subgraph of G, in symbols \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$H\subseteq G$$\end{document} , if H is a graph, \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$V(H)\subseteq V(G)$$\end{document} , and \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$E(H)\subseteq E(G)$$\end{document} . The subgraph H is induced by a vertex set \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$W{:=}V(H)\subseteq V(G)$$\end{document} if \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$xy\in E(G)$$\end{document} and \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$x,y\in W$$\end{document} implies \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$xy\in E(H)$$\end{document} . For labeled graphs, furthermore, vertex and edge labels are inherited. A path in G is a sequence of pairwise distinct vertices \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$x_0,x_1,\dots ,x_n$$\end{document} such that \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$x_{i-1}x_i\in E(G)$$\end{document} . A cycle \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$(x_1,\dots ,x_n)$$\end{document} consists of the path \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$x_1,\dots , x_n$$\end{document} together with the edge \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$x_nx_1\in E(G)$$\end{document} . A connected component is defined as a maximal connected subgraph \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$H = (V',E')$$\end{document} of a graph G, where \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$V' \subseteq V$$\end{document} and \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$E' \subseteq E$$\end{document} , such that there exists a path in H between u and v, for every pair of vertices u, v in \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$V'$$\end{document} .
Consider a bipartition \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$V=V'\cup V''$$\end{document} where \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$V'\ne \emptyset$$\end{document} and \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$V''\ne \emptyset$$\end{document} and \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$V'\cap V''=\emptyset$$\end{document} and denote by \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$H'=G[V']$$\end{document} and \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$H''=G[V'']$$\end{document} the two subgraphs of G induced by \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$V'$$\end{document} and \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$V''$$\end{document} . The edge set \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$${{\,\textrm{C}\,}}(V',V''){:=}\{xy\in E(G)|x\in V', y\in V''\}$$\end{document} is called the cut set of the cut \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$(V',V'')$$\end{document} . Then \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$E(H')$$\end{document} , \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$E(H'')$$\end{document} , \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$${{\,\textrm{C}\,}}(V',V'')$$\end{document} , and \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$${{\,\textrm{C}\,}}(V'',V')$$\end{document} are pairwise disjoint and form a partition of E(G). Note that in the undirected case, \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$${{\,\textrm{C}\,}}(V',V'')={{\,\textrm{C}\,}}(V'',V')$$\end{document} since symmetric edges xy and yx are identified in undirected graphs. A cut-set that is inclusion minimal, i.e., that does not contain any other cut-set as a proper cut-set is also called a bond in the graph-theory literature. We will not use this term here to avoid confusions with chemical bonds. By slight abuse of notation we write \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$V(E) {:=}\{x \mid xy\in E\} \cup \{y \mid xy\in E \}$$\end{document} for the set of nodes incident with an edge-set E, such as a cut. The removal of the set of edges in a cut \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$${{\,\textrm{C}\,}}(V',V'')$$\end{document} leads to the formation of two or more connected components. We say a cut is a k-cut, where k is the number of connected components induced by the cut.
In order to accommodate bond orders we allow multi-edges. The degree of a vertex \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$x\in V(G)$$\end{document} , denoted by \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\deg _G(x)$$\end{document} , is then simply the number of incident edges, i.e., the multiplicity of edges is accounted for in the degree. For ease of presentation we will consider undirected, i.e., symmetric graphs throughout the main text. The introduced notation is illustrated in Fig. 1. Extensions to the directed setting are outlined briefly in the Appendix.Fig. 1. An example of a 2-cut on naphthalene. The application of the cut set \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$C(V',V'')$$\end{document} leads to the formation of the shown fragments, i.e the induced subgraphs \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$G[V']$$\end{document} and \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$G[V'']$$\end{document}
Chemical graphs
The term chemical graphs is often used for labeled undirected graphs that represent chemical structural formulas. Here, we will use it more specifically to refer to the multigraphs that are equivalent to Lewis formulas [26] describing neutral compounds without unpaired electrons. In this setting, each bonding electron pair is represented as an edge, and each non-bonding electron pair as a loop. The bond order is therefore determined by the number of parallel edges. The electron pair of a bond is considered to be divided up between the two atoms that it connects, while a non-bonding pair is localized at a single atom. The number of electrons in the outer shell is usually preserved (at least for the atoms typically appearing in organic chemistry). The atom type therefore defines the degree of a vertex in the multigraph, which is given as the number of incident edges (counting multiplicities) plus twice the number of loops, see Fig. 2. This standard notion of degree in multigraphs agrees with Frankland’s “atomicity” and conforms to the IUPAC term valency [27]. We note in passing that the formalism can be extended in principle to accommodate unpaired electrons and charges e.g. by considering graphs with semi-edges [28]. Here, we will not make use of such extensions, however.Fig. 2. Lewis structure (left) and molecule multigraph (right) of allyl alcohol
The oxidation state of an atom is implicitly determined by the different number of loops and bonds and thus can be changed by converting loops to bonds or vice versa. For example sulfur may have degree two like oxygen, in which case there are two loops. These may be exchanged for bonding electron pairs, as e.g. in \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\hbox {SO}_3$$\end{document} . For our purposes we disregard oxidation states and thus assume that the number of loops, i.e., non-bonding electron pairs, is also fixed for a given atom type. This amounts to considering loop-free graphs where the degrees are fixed by the atom types.
It is not difficult to accommodate also changes in oxidation states by allowing loops to be removed or two cut-edges to be combined to a loop. For ease of presentation, we will not consider this issue further.
Instead of considering multigraphs explicitly, one can also use simple graphs with integer edge weights representing the multiplicity of bonds. We shall see below that these two representation give rise to different crossover operators, both of which are arguable useful for searching chemical spaces. From an algorithmic point of view, it is typically easier to work with edge-weights on simple graphs.
Even within the setting of edge-weighted labeled graphs that are equivalent to Lewis formulas there is no formal definition which graphs constitute viable molecules. 1 H-azirene, for instance, is a perfectly ordinary graph, and the molecule it represents even has a chemical name. Nevertheless, 1 H-azirene does not seem to exist. We therefore have to be consistent with operating on a space of labeled graphs that have an interpretation in chemistry. Moreover, the relationship of chemical graphs and molecules is not injective in general: tautomerism and mesomerism may lead to distinct graphs that represent the same molecule. These ambiguities are well known in the chemistry literature [29, 30]. They have very little impact on GA-based search, since the number of redundant representations usually is very small and the fitness evaluation can be performed for each of them independently.
Additional graph-theoretic constraints may be of interest to further restrict the admissible class of graphs. One such property is planarity. Since almost all molecules have planar structural formulas, it may be of interest to restrict the search space in this manner to rule out molecules with too many geometric constraints. At least some of these can be captured at least approximately in terms of graph-theoretic properties, such as the number of small rings or excessively high connectivity. Many topological filter criteria have been proposed in an effort to generate a large database of potential organic molecules [31].
Mutation operations on graph genetic algorithms
A diverse collection of edit operations on graphs have been used in the literature, see e.g. [32, 33]. They share the property that they introduce small, local changes into a graph. Thus they can immediately serve as mutation operators in a GA. A non-exhaustive lists can be found in the Appendix. There are, however, several issues with these standard operations when applied to molecular graphs: Some of these operators (including vertex addition, node splitting, or edge deletion) do not preserve connectedness. It is of course possible to restrict these operations such that only connected graphs are produced. For example, node-splitting could be restricted to vertices that are not cut-vertices. Nevertheless, these operations are not readily applicable to chemical graphs, because in general they are not degree preserving. The only exception is node splitting of a vertex with even degree, where one could use operations of the form R-O-R to R-O-O-R and C to C=C such that each of the two carbons inherits two of the original carbons’ substituents.
A natural degree preserving mutation operator consists in a choice of two edges \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$e=xy$$\end{document} and \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$f=uv$$\end{document} , cut both edges, and re-attaches the vertices as \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$e'=xu$$\end{document} and \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$f'=yv$$\end{document} . Clearly, vertex degree is preserved. The operator can also be used as a way to model simple elimination reactions, since \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\hbox {CH-CCl}$$\end{document} can yield \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\hbox {C}=\hbox {C}$$\end{document} plus \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\text {HCl}$$\end{document} . However, in cases where the resulting molecule is disconnected, the reattachment of edges necessarily produces isomers, i.e., does not change the chemical composition. This kind of edge-swap operation was recently used in models for the evolution of population structures [34].
Cut-and-Join crossover operations on graphs
Joins from cuts
Throughout this section we assume that we are given two labeled, connected, undirected graphs \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$G=(V,E)$$\end{document} and \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$H=(W,F)$$\end{document} as well as proper subsets \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\emptyset \subsetneq V'\subsetneq V$$\end{document} and \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\emptyset \subsetneq W'\subsetneq W$$\end{document} of their vertex sets. Writing \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$V''{:=}V{\setminus } V'$$\end{document} and \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$W''{:=}W{\setminus } W'$$\end{document} , moreover it seems natural to associate the newly inserted edges with the edge cuts \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$${{\,\textrm{C}\,}}(V',V'')$$\end{document} , \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$${{\,\textrm{C}\,}}(V'',V')$$\end{document} , \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$${{\,\textrm{C}\,}}(W',W'')$$\end{document} , and \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$${{\,\textrm{C}\,}}(W'',W')$$\end{document} that separate \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$G[V']$$\end{document} from \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$G[V'']$$\end{document} and \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$H[W']$$\end{document} from \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$H[W'']$$\end{document} , respectively.
We say that a cut \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$D={{\,\textrm{C}\,}}(V',V'')$$\end{document} in a connected graph G is a k-cut if removal of the edge set D leaves k connected components. In particular, it is a 2-cut if both \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$G[V']$$\end{document} and \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$G[V'']$$\end{document} are connected (Fig. 3).Fig. 3. Increasingly restrictive join operators. While a join allows any nodes of two subgraphs to be linked, for a cut node preserving join only connections between nodes involved in the cut sets of their respective parental graphs are permitted. A degree preserving join further requires all nodes of the offspring graph to adhere to the degree of their respective parental vertices. The most restrictive join presented is the natural join, which introduces edge weights, e.g. bond orders, that must be conserved
A natural construction for crossover is then to consider an “offspring graph” to be obtained by joining the induced subgraphs \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$G[V']$$\end{document} and \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$H[W']$$\end{document} by a set of edges connecting \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$V'$$\end{document} and \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$W'$$\end{document} . More formally:
Definition 1
A graph K is a the result of a join of \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$G[V']$$\end{document} and \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$H[W']$$\end{document} if \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$V(K)=V'\cup W'$$\end{document} , \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$K[V']=G[V']$$\end{document} , \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$K[W']=G[W']$$\end{document} and \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$E(K)=E(G[V'])\cup E(H[W'])\cup B_{\circ }$$\end{document} where \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$B_{\circ }\subseteq V'\times W'$$\end{document} .
In its full generality, these joins encompass the multipoint graph crossover [16, 18, 20] as well as crossovers based on induced subgraphs. The subgraph crossover described in [16], however, is in general not of this type because there edges within a connected component can also be removed. This is not allowed in the cut-based operators described here. Depending on the application scenario, different constraints can be applied on how \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$G[V']$$\end{document} and \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$H[W']$$\end{document} are joined.
Definition 2
A join \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$K=G[V']\circ H[W']$$\end{document} is called cut-node preserving if for each edge \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$xy\in B_{\circ }$$\end{document} we have \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$x\in V'$$\end{document} , \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$y\in W'$$\end{document} , and there are edges \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$xx'\in {{\,\textrm{C}\,}}(V',V'')$$\end{document} and \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$yy'\in {{\,\textrm{C}\,}}(W',W'')$$\end{document} .
In a cut-node preserving join, new edges are inserted only at vertices adjacent to the cut-edges in G and H, yielding a more stringent condition.
Definition 3
A join \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$K=G[V']\circ H[W']$$\end{document} is degree-preserving if \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\deg _K(x)=\deg _{G}(x)$$\end{document} for all \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$x\in V'$$\end{document} and \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\deg _K(y)=\deg _{H}(y)$$\end{document} for all \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$y\in W'$$\end{document} .
As an immediate consequence, degree-preserving joins can be constructed only from cuts of the same size. Conversely, cuts in G and H with the same number of edges can always be used to produce a degree-preserving join:
Lemma 1
If a join is degree-preserving is its also cut-node preserving.
Proof
If the join is not cut-node preserving, then there is a vertex x in \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$V(G)\cap V(K)$$\end{document} or \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$V(H)\cap V(K)$$\end{document} that is the endpoint of a join-edge while it is not incident to a cut-edge. Then \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\deg _K(x)>\deg (x)$$\end{document} in G or H, and thus the join is not degree-preserving. \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\square$$\end{document}
Lemma 2
There is a degree-preserving join if and only if \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$|{{\,\textrm{C}\,}}(V',V'')] = |{{\,\textrm{C}\,}}(W',W'')|$$\end{document} .
Proof
Let \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$A{:=}{{\,\textrm{C}\,}}(V',V'')$$\end{document} and \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$B{:=}{{\,\textrm{C}\,}}(V',V'')$$\end{document} and \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$|A|=|B|$$\end{document} . Then there exists a bijection \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\phi :A\rightarrow B$$\end{document} . Denote the cut edges in G by \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$xx'\in A$$\end{document} , and those in H by \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$yy'=\phi (xx')\in B$$\end{document} . Then the edges xy with \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$x\in V(G)$$\end{document} and \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$y\in V(H)$$\end{document} define a join. The \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$xx'\in {{\,\textrm{C}\,}}(V',V'')$$\end{document} in G is replaced by the uniquely defined edge \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$xy\in E(K)$$\end{document} , we have \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\deg _G(x)=\deg _K(x)$$\end{document} . Analogously, the edge \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$yy'\in {{\,\textrm{C}\,}}(W',W'')$$\end{document} in H is replaced by the uniquely defined edge \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$yx\in E(K)$$\end{document} . Thus \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\deg _G(x)=\deg _K(x)$$\end{document} and \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\deg _H(y)=\deg _K(y)$$\end{document} . Hence, the join is degree preserving. Now suppose \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$|A|\ne |B|$$\end{document} . Thus there is a vertex in \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$G[V']$$\end{document} or \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$H[W']$$\end{document} such that the degrees in G or H, and K respectively differ since there is a non-compensate join edge. \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\square$$\end{document}
Lemma 2 yields a simple condition for the existence of a degree-preserving join: it is necessary and sufficient that the total weight of the cuts in G and H are the same.
Instead of considering multigraphs, one may also use integer edge-weights to describe bond orders. This gives rise to an even more restricted version of joins:
Definition 4
A join is natural if the number of edges with weight h is the same in G and K for all \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$x\in V'$$\end{document} as well as in H and K for all \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$y\in W'$$\end{document} .
In other words, a join is natural if the entire distribution of edge weights remains unchanged at each vertex. From a chemical perspective, this means that, for instance, a double bond must be “compensated” by a double bond in the recombinant graph K and cannot be “split” into two single bonds, or vice versa. Consider the example in Fig. 4:Fig. 4. Choosing \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$V'$$\end{document} and \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$W'$$\end{document} as the top vertices of the two triangles yields two cuts consisting of two edges, each with weight 1. The only possible join in this case is a multigraph. This join is degree-preserving, as each of the two vertices still has two incident edges, but is not natural since it converted (parts of) a simple graph into a multigraph
The join of two nodes from triangles as shown in Fig. 4 is degree preserving but not natural since it changes two edges of weight 1 to a single edge of weight 2, and thus alters the distribution of edge weights at some vertices.
In contrast to the construction of degree-preserving joins, it is difficult in general both to determine whether a natural join exists and to construct and enumerate them. Essentially equivalent computational problems have been studied in the context of discrete tomography, a field concerned with the reconstruction of images from multiple projections. A short survey of pertinent results and references can be found in the Appendix. In brief, a linear-time algorithm exists to construct a natural join if there is only a single bond-type. In contrast, the generalization to two or more bond types becomes NP-hard even with various restrictions [35].
In the applications in later sections of this paper we are mainly interest in finding all (exponentially many) unique natural joins, which amounts to enumerating all possible recombinations of the parent fragments. In general, the number of unique solutions is exponential in the number k of bonds in the two cuts, as it can be well demonstrated on instances where each cut node is connected to a respective single cut edge of the same weight. With multiple bond types, the total number of candidate solutions is bounded by \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\prod _{b} k_b!$$\end{document} , where \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$k_b!$$\end{document} is the number of bonds of type b (e.g. single, double, triple, or aromatic). In principle one could simply generate all combinations of all permutations of the \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$k_b$$\end{document} bonds of type b for the second partner of the join and check (in linear time) if the corresponding candidate join is indeed a valid natural join. In order to avoid the overhead of enumerating possibly large numbers of invalid candidates, we opted for an implementation as an integer linear programming (ILP) instance, which naturally lends itself for this purpose and yields a straightforward implementation. A detailed description of the ILP formulation, which is efficient and precise, including a proof of its correctness, can be found in the Appendix.Fig. 5. Cut-and-join crossover of two planar graphs can be restricted to yield again a planar graph. Since every minimal cut of a planar graph is cycle in its complement, there is a circular order to the severed edges and thus of the vertices incident with the cutset. It sufficies to preserve the circular order of these vertices (in arbitrary orientation). The example here is in addition degree-preserving
The appeal of cut-and-join crossover is reinforced by the following result on planar graphs.
Theorem 1
Let \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$G=(V,E)$$\end{document} and \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$H=(W,F)$$\end{document} be planar graphs and \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$A={{\,\textrm{C}\,}}(V',V'')$$\end{document} and \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$B={{\,\textrm{C}\,}}(W',W'')$$\end{document} be inclusion-minimal cuts in G and H, respectively, with \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$|A|=|B|$$\end{document} . Then there exists a planar join \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$G[V']\circ H[W']$$\end{document} . In particular, if \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$|A|=|B|$$\end{document} , then there is a planar degree-preserving join.
Proof
Since the edges of an inclusion-minimal cut in G correspond to a simple cycle in the planar dual \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$G^*$$\end{document} of G [36], there is a planar embedding \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$G[V']$$\end{document} such that all cut edges of \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$A={{\,\textrm{C}\,}}(V',V')$$\end{document} point into “outside” of this cycle. This defines a circular order not only on A but also on the set \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$V^*$$\end{document} of vertices incident to A. The same holds for \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$H[W']$$\end{document} , the cut edges \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$${{\,\textrm{C}\,}}(W',W'')$$\end{document} , and the vertices \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$W^*$$\end{document} incident to this cut set. Thus there is an embedding of \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$G[V']$$\end{document} and \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$H[W']$$\end{document} such that \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$V^*$$\end{document} lies on a geometric cycle \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$c_1$$\end{document} and all other vertices in \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$V'$$\end{document} are placed inside \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$c_1$$\end{document} , and similarly, \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$W^*$$\end{document} lies on a cycle \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$c_2$$\end{document} and all other vertices of \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$W'$$\end{document} lie outside of \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$c_2$$\end{document} . Now, one can place \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$c_1$$\end{document} inside \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$c_2$$\end{document} and connect vertices in \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$V^*$$\end{document} with vertices in \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$W^*$$\end{document} such that these edges do not cross. Clearly the resulting join is planar. If \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$|A|=|B|$$\end{document} , there is a bijection of cut-edges that can be chosen to preserve the circular orders (in opposite directions). In particular, the \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$|A|=|B|$$\end{document} new edges can be inserted to be degree-preserving: to this end, insert an edge between \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$x\in V^*$$\end{document} and \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$y\in W^*$$\end{document} and proceed clockwise along \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$c_1$$\end{document} and \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$c_2$$\end{document} to insert the desired number of (multi)-edges for x to y and then the clockwise neighbors of y. Then proceed to the clockwise neighbor of x and continue until all \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$|A|=|B|$$\end{document} are inserted. By construction, no two edges between \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$V^*$$\end{document} and \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$W^*$$\end{document} cross each other, and thus the resulting embedding of the join is planar. \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\square$$\end{document}
The construction of a planar degree-preserving cut-and-join crossover is illustrated in Fig. 5.
Since cuts of the same cardinality from planar structural formulas can always be recombined to yield a degree-preserving join that is again planar. The resulting graph has a high likelihood to be again a plausible structural formula. Moreover, planarity reduces the search space for finding a natural join as it restricts the density of bonds that can occur between multiple atoms.
Finally, we note that every cut-and-join crossover deriving from a 2-cut is reversible in following sense:
Lemma 3
Let \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$G=(V,E)$$\end{document} and \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$H=(W,F)$$\end{document} be connected graphs and \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$A={{\,\textrm{C}\,}}(V',V'')$$\end{document} and \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$B={{\,\textrm{C}\,}}(W',W'')$$\end{document} be inclusion-minimal cuts in G and H, respectively, with \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$|A|=|B|$$\end{document} , and consider the joins \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$G'=G[V']\circ H[W']$$\end{document} and \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$G''=G[V'']\circ H[W'']$$\end{document} . Then (i) \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$G'$$\end{document} and \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$G''$$\end{document} are again connected and (ii) there is a cut-and-join crossover that produces G and H as cut-and-join crossover of \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$G'$$\end{document} and \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$G''$$\end{document} .
Proof
Since we consider inclusion-minimal cuts, \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$G[V']$$\end{document} , \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$G[V'']$$\end{document} , \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$H[W']$$\end{document} and \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$H[W'']$$\end{document} are connected, und thus the joins are also connected. To see the reversibility consider the cuts \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$${{\,\textrm{C}\,}}(V',W')$$\end{document} and \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$${{\,\textrm{C}\,}}(V'',W'')$$\end{document} of \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$G'$$\end{document} and \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$G''$$\end{document} , which recover \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$G[V']$$\end{document} , \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$H[W']$$\end{document} , \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$G[V'']$$\end{document} and \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$H[W'']$$\end{document} and note that \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$${{\,\textrm{C}\,}}(V',V'')$$\end{document} in G and \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$${{\,\textrm{C}\,}}(W',W'')$$\end{document} in H are legitimate joins for \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$G[V']$$\end{document} and \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$G[V'']$$\end{document} , as well as \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$H[W']$$\end{document} and \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$H[W'']$$\end{document} . \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\square$$\end{document}
Clearly, this remains true for degree-preserving and natural 2-cuts.
The generation of recombinant graphs from two parent graphs G and H consists of two parts: (i) enumeration of a suitable set of cuts and (ii) construction of a set of recombinants. In the following two subsection we will consider these tasks in detail.
Enumeration of cuts
A cut-set is redundant if it contains a strictly smaller cut \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$D'\subsetneq D$$\end{document} as a subset.
Lemma 4
Suppose removal of the cut \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$D\subseteq E(G)$$\end{document} decomposes G into three or more connected components. Then D is redundant.
Proof
Consider a cut \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$D\subseteq E(G)$$\end{document} that partitions V(G) into three mutually disconnected vertex sets \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$V_1$$\end{document} , \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$V_2$$\end{document} , \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$V_3$$\end{document} . Note that \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$G[V_i]$$\end{document} may itself be a disjoint union of connected components. For \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\{i,j,k\}=\{1,2,3\}$$\end{document} write \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$D_i{:=}\{xy| x\in V_j, y\in V_k\}$$\end{document} . Thus the edge sets \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$D_1$$\end{document} , \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$D_2$$\end{document} and \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$D_3$$\end{document} form a partition of D. If, say, \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$D_1=D_2=\emptyset$$\end{document} then there is no edge between \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$V_3$$\end{document} and \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$V_1\cup V_2$$\end{document} , and thus G was not connected to begin with. Thus at most one \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$D_i$$\end{document} may be empty. If \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$D_1=\emptyset$$\end{document} , then \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$D=D_2\cup D_3$$\end{document} comprises only the edges in \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$D_2$$\end{document} between \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$V_1$$\end{document} and \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$V_3$$\end{document} , and the edges in \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$D_3$$\end{document} between \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$V_1$$\end{document} and \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$V_2$$\end{document} , respectively. In particular, therefore, \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$D_2\subsetneq D$$\end{document} and \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$D_3\subsetneq D$$\end{document} are also cuts for G. If all three edge sets \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$D_i$$\end{document} are non-empty, then each of \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$D_1\cup D_2$$\end{document} , \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$D_2\cup D_3$$\end{document} , and \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$D_1\cup D_3$$\end{document} is a cut of G and a proper subset of D. Thus D is redundant because it can be obtained as a union of strictly smaller cuts. \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\square$$\end{document}
Suppose we restrict ourselves to 2-cuts and sets of graphs without cycles, i.e. trees. Then every cut-set contains only a single edge and thus all possible degree-preserving joins are again trees. Similarly, if all graphs are trees or cacti, i.e., graphs in which every cycle forms its own block (2-connected component), then all 2-cuts either contain individual edges or cut a unique cycle in exactly 2 points. Any edge-preserving join therefore is again a cactus. A restriction to 2-cuts thus severely limits the generative power of degree-preserving crossover operators. We will have a close look at this issue below.
The problem of enumerating cut sets in a graph is a problem with a long history, see e.g. [37, 38]. It is well known that graphs may have an exponential number of minimum-weight cuts separating a given pair of vertices [39]. Algorithms for enumerating cuts thus are evaluated w.r.t. to the effort per cut. In [40] a deterministic algorithm with \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$O(|V|+|E|)$$\end{document} delay between two output cuts has been described. Recently, an algorithm listing all inclusion minimal s, t-cuts in \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\tilde{O}(|V|)$$\end{document} amortized time (i.e., with an average running time linear up to logarithmic terms in |V| per cut) has been reported [41]. Surprisingly, computing the size of largest inclusion-minimal cut is NP complete [42].Fig. 6. Left: Search-tree of the cut enumeration algorithm. Right: The different steps of the cut enumeration algorithm shown on camphor. Enumeration of a subtree can be halted, if a cut results in > 2 connected components or the cut node set has already been visited (colored red)
Since crossover at cut-sets of size 1 is rather trivial, we first extract the 2-edge connected components. These non-trivial scaffolds are then processed independently. For chemical graphs the simple algorithm outlined in Fig. 6 is used. For each initial vertex \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$x\in V(G)$$\end{document} it starts from \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$V'=\{x\}$$\end{document} and \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$V''=V\setminus V'$$\end{document} and proceeds by moving a vertex \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$y\in V''$$\end{document} that is adjacent to some vertex in \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$V'$$\end{document} from \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$V''$$\end{document} to \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$V'$$\end{document} in each step. A cut-set is accepted if both \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$G[V']$$\end{document} and \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$G[V'']$$\end{document} are both connected. A branch in the search tree is abandoned if \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$G[V']$$\end{document} or \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$G[V'']$$\end{document} are not connected, or if the cut-set has been encountered already in another branch. In this manner we obtain all 2-cuts without being constrained to the inclusion-minimal cuts of G.
An interesting alternative to the enumeration of all cuts is to use a method of ordered generation that produces all cuts in the order of non-decreasing weights. The first method of this type was proposed in [43]. Later an improved version based on a different solutions of maximum flow problems was published [44].
Natural joins as an ILP
Although it is possible to enumerate degree-preserving or natural joins directly, it is more convenient and elegant to rephrase these join operation as integer linear programming (ILP) problems. Standard ILP solvers (reviewed e.g. in [45]) can then be used to enumerate these joins. Let \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$C_G:= {{\,\textrm{C}\,}}(U',U'')$$\end{document} for some graph G with weighting function \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\ell _G$$\end{document} and \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$C_H:= {{\,\textrm{C}\,}}(W',W'')$$\end{document} for some graph H with weighting function \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\ell _H$$\end{document} such that \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$|C_G|=|C_H|$$\end{document} . Moreover, let the number of edges with weight h be the same for every \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$h\in {\mathbb {N}}$$\end{document} in \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$C_G$$\end{document} and \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$C_H$$\end{document} , hence there is a natural join. The goal is to generate all natural join edge sets \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$B_\circ$$\end{document} to reconnect the connected components \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$U'$$\end{document} and \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$W'$$\end{document} derived from a cut. Let \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$e\in C_G$$\end{document} and \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$f\in C_H$$\end{document} and \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$u\in U',w\in W'$$\end{document} such that u is incident with e and w is incident with f. We track whether to join u and w by an edge with weight \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\ell (f)$$\end{document} by variable \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$a_{ef}$$\end{document} such that \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$a_{ef} = 1$$\end{document} if u and w are joined by an edge, and \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$a_{ef}=0$$\end{document} otherwise. This yields for the set of join edges \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$B_{\circ }=\{uw\ |\ a_{ef}=1\}$$\end{document} . For the edge-weights in the join graph \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$K=(U'\cup W', E')$$\end{document} with \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$E':=E(G[U'])\cup E(H[W']) \cup B_\circ$$\end{document} we have the labeling function \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\ell _K(e)=\ell _G(e)$$\end{document} if \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$e\in E(G)$$\end{document} and \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\ell _K(e)=\ell _H(e)$$\end{document} else. For every cut-edge in \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$C_G$$\end{document} and \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$C_H$$\end{document} , respectively, there is exactly one “joining edge” in \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$B_\circ$$\end{document} , which imposes the constraints
\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\begin{aligned} \sum _{e\in C_G} a_{ef} = 1 \text { for all } f\in C_H \quad {and}\quad \sum _{f\in C_H} a_{ef} = 1 \text { for all } e\in C_G \end{aligned}$$\end{document}The faithful preservation of edge-weights in K can be modeled by a simple constraint
\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\begin{aligned} \sum _{f\in C_H} a_{ef}\cdot (\ell _G(e)-\ell _H(f)) = 0, \text { for all } e\in C_G. \end{aligned}$$\end{document}Finally, we need to ensure that we do not introduce new multi-edges into the join graph K. For a vertex \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$u\in U'$$\end{document} , let \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$E_u$$\end{document} annotate the edges of u that are incident to u in \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$C_G$$\end{document} (analogously for \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$w\in W'$$\end{document} and \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$E_w$$\end{document} ). Since the introduction of multi-edges to K can only occur for vertices where \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$E_u$$\end{document} and \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$E_w$$\end{document} , respectively, contain more than one edge, we obtain the following constraint:
\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\begin{aligned} \sum _{e\in E_u} \sum _{f\in E_w} a_{ef} \le 1 \quad \text {for all } u\in U', w\in W' \text { with } |E_u|,|E_w|>1. \end{aligned}$$\end{document}This constraint can be dropped whenever \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$E_u$$\end{document} and \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$E_w$$\end{document} do not contain any edges with the same weight, since then joining u and w in K is already ruled out by equation (3). Figure 16 provides a graphical representation of the relation between the constraints (2), (3), (4) and the different join operations. The natural joins are then obtained by maximizing the objective function
\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\begin{aligned} \sum _{e\in C_G}\sum _{f\in G_H} a_{ef} \rightarrow \max \end{aligned}$$\end{document}Theorem 2
The set \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$B_{\circ }=\{uw\ |\ a_{ef}=1\}$$\end{document} is the set of join edges of a natural join if and only if the \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$a_{ef}$$\end{document} are a solution of the ILP specified by equations (2) to (4).
The proof can be found in the appendix.
A major advantage of the ILP is that it can be modified easily to incorporate additional chemical constraints or optimizations to obtain natural joins between two molecular fragments. This can be achieved by either adding penalties or by excluding structure such as allenes \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$-\hbox {C}=\hbox {C}=\hbox {C}-$$\end{document} by adding constraints on the neighborhoods in G and/or H for edges \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$e\in B_\circ$$\end{document} . In such a weighted version, ILP solvers can efficiently produce limited top-lists of candidates without the need for an exhaustive enumeration and subsequent ranking.
Generative properties of degree-preserving joins
Crossover operators on strings are inherently limited in that they cannot introduce letters at positions where no such letter is present in the same position. More generally, crossover operators produce offsprings that are “between” the parents and thus more similar to their parents than the parents to each other [46]. This can lead to a loss of diversity that, in GA implementations, is counteracted by mutation operators designed to increase diversity.
Cut-and-join crossover in graphs unsurprisingly suffers from similar issues. Before we explore these limitations, however, we briefly discuss some of their desirable properties. Most importantly, functional groups can be easily transferred between molecular scaffolds. To this end, cuts of size 1 or 2 are sufficient, see Fig. 7. Labels, i.e., atom types, with the same valency can be replaced by cuts of size \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\deg (x)$$\end{document} . Moreover, hydrogen atoms are readily exchanged for larger molecular fragments, provided there is a source graph from which the desired fragment can be cut. Simple reactions such as ether formation, esterification, etc., thus have their direct counterpart in cut-and-join crossover, see Fig. 7.Fig. 7. Left: Functional groups can be interconverted by a simple cut-and-join crossover. The general position (primary, secondary, tertary, quaternary) of a functional group can be changed by means of crossover operations that replace a hydrogen atom by a larger fragment (dotted arrows). Right: Cut-and-join crossover involving hydrogens enable access to simple reactions
In a similar vein, it is easy to grow chains, e.g. by crossover of an H at the end of a chain with \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\hbox {CH}_{4}$$\end{document} , if an extension by one carbon unit is desired, and to enlarge or shrink rings by cuts of size 2 and a subsequent join using two rings as parents. If the initial set of graphs \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$${\mathcal {G}}$$\end{document} contains \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\hbox {CH}_{4}$$\end{document} , \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\hbox {NH}_{3}$$\end{document} , \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\hbox {H}_{2}\hbox {O}$$\end{document} , and \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\hbox {H}_{2}$$\end{document} , we can in particular build all cycle-free molecules even if only 2-cuts of size 1 are allowed. If only 2-cuts are allowed, however, cycles have to be present at the outset to allow the formation of cycles or additional bridges, since 2-cuts in cycle-free graphs necessarily have size 1. The closure of a new ring, however, requires that the join edges connected two vertices on one fragment with a vertex with degree-defect 2 on the other fragment, or two pairs of vertices.Fig. 8. Within a minimal polycyclic set, bridged polycycles sharing multiple edges are accessible as well as cubanes
This can be achieved by also allowing 3-cuts of size 2, i.e., cutting two edges in G whose removal leads to three connected components. It is possible then to take two trees G and H, remove two disconnected pieces from both of them, and then form a (natural) join that introduces a cycle. The same operation also allows to convert \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$-\hbox {CH2}-$$\end{document} to a carbonyl group (by crossover with water), to convert \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$-\hbox {CH2}-$$\end{document} in a ring to a spiro-carbon by crossover with a 3-cut linear chain. Crossover of polycyclic compounds readily provides access to bridged compounds, see Fig. 8. These examples suggest that all molecular structures can be constructed by a sequences from degree-preserving crossovers from very simple building blocks.
This is true in a rather strong sense: for bounded degree graphs, a uniformly bounded size population and a number of crossover operations linear in the size of the target graph is indeed sufficient. In order to prove this statement formally, we need the following notation and terminology: Let D be a finite set of integers such that \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$1\in D$$\end{document} and let \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$${\mathcal {S}}$$\end{document} denote the search space comprising all loop-free connected graphs G such that all vertices \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$x\in V(G)$$\end{document} have degree \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\deg (x)\in D$$\end{document} . Moreover, we denote by \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$${\mathcal {T}}_D$$\end{document} the set of star trees \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$S_k$$\end{document} with a central vertex of degree \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$k\in D$$\end{document} . Note that for \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$k=1$$\end{document} , \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$S_1=K_2$$\end{document} , denotes the graph comprising two vertices connected with an edge.
Theorem 3
For every finite graph \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$G\in {\mathcal {S}}$$\end{document} there is a sequence of of at most 2|E(G)| degree-preserving cut-and-join crossover and selection steps on a population of size \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$|{\mathcal {X}}|\le |D|+3$$\end{document} .
Proof
We proceed by induction in the number \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$n'$$\end{document} of “heavy” vertices \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$x\in V(G)$$\end{document} with degree \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\deg (x)>1$$\end{document} . The statement is trivially true for all \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$G\in {\mathcal {T}}_0$$\end{document} and these are by construction the only graphs with a single vertex of degree larger than 1. The only connected graph without a vertex of degree larger than 1 is \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$K_2\in {\mathcal {T}}_0$$\end{document} .
Now let G be a connected arbitrary graph and suppose the statement is true for all graphs with \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$n''<n'$$\end{document} heavy vertices. First suppose that G contains a cut vertex z. Then consider the graphs \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$G_1$$\end{document} , \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$G_2$$\end{document} , ... \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$G_k$$\end{document} , \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$k\ge 2$$\end{document} , obtained by splitting G at the cut-vertex z such that (i) a copy of z is present in each \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$G_i$$\end{document} , and (ii) the vertex degree of z is restored by adding an edge and vertex of degree (i). Then G can ob obtained by recombining the \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$G_i$$\end{document} sequentially, such that \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$${\tilde{G}}_i$$\end{document} is composed of \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$G_1$$\end{document} through \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$G_i$$\end{document} for \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$1\le i\le k$$\end{document} . In particular, \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$${\tilde{G}}_1=G_1$$\end{document} . In each step, we cut z and its degree-1 neighbors from the rest of \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$G_i$$\end{document} , yielding a cut of size h. From \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$${\tilde{G}}_{i-1}$$\end{document} we cut h degree 1 neighbors and then form the requisite join. By construction, \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$${\tilde{G}}_k=G_k$$\end{document} , and by induction hypothesis each of the \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$G_j$$\end{document} can be constructed in the bounded population. To reach \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$G_k$$\end{document} , it thus suffices to keep \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$${\tilde{G}}_{i-1}$$\end{document} in the population while constructing \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$G_i$$\end{document} . The number of cut-and-join steps equals the number of components and thus at most the number of edges incident to the cut vertex.
If G has a cut edge, between heavy vertices, there are two graphs \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$G_1$$\end{document} and \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$G_2$$\end{document} with strictly fewer heavy vertices, obtained by removing the cut edge and replacing the loose ends by vertices of degree 1. This process is easily reverted by a single cut-and-join crossover, requiring only \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$G_1$$\end{document} , \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$G_2$$\end{document} in the population.
Finally, if G is biconnected, then we make use of the open ear decomposition [47] on the subgraph consisting of the heavy vertices only. If there is an ear with at least one interior (heavy) vertex, it suffices to construct the corresponding path by recombination of the growing chain with the star graph in a single step. Once the path is available it is recombined into G. If there are only ears left that consist of a single edge (i.e., that do not have interior vertices) between to heavy vertices, uv, then we consider the subgraph \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$G'$$\end{document} from which uv is cut out (and loose ends are compensated by degree one vertices, and the fragment uv). Cutting the requisite number of degree 1 vertices from \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$G'$$\end{document} and uv, respectively, there is a degree-preserving join that produced G. The insertion of an edge thus can be achieved with two cut-and-join steps: one for the formation of uv and one for the insertion of uv.
In summary, therefore, every graph \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$G\in {\mathcal {S}}$$\end{document} can be produced by a number of cut-and-join crossover steps not exceeding twice the number of edges between heavy vertices starting from building blocks in \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$${\mathcal {T}}_0$$\end{document} . In each step we can produce the graph G by keeping in the population at most one graph \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$G^*$$\end{document} with which the component under construction will eventually be recombined, the component \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$G_1$$\end{document} currently under construction, a path P that will eventually be recombined with \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$G_1$$\end{document} and the reservoir set \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$${\mathcal {T}}_0$$\end{document} of building block that are required either to be composed into P or directly inserted into \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$G_1$$\end{document} by recombination. Hence a population of size \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$|{\mathcal {T}}_0|+3$$\end{document} is sufficient. \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\square$$\end{document}
An analogous argument can be made for natural joins. This requires a larger set of building blocks, however, that contains “templates” for all multiple bonds, since these then cannot be obtained from cutting off a suitable number of degree 1 vertices but require cuts involving multi-edges for vertices with the same weight distribution of incident edges.
In practice, cuts that require cutting at both ends to a path may require up to \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$2 (D_{\max }-1)$$\end{document} edges in the cut-set. It is convenient therefore, to use mutation operators that allow the insertion of an edge by means of edge swapping and the addition of a \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$K_2$$\end{document} to be able to limit cuts to small sizes.
The construction in the proof of Thm. 3 has direct implications for the exploration of chemical space. First, it shows that functional groups can be introduced easily into a scaffold of interest. This can usually be achieved by a single cut of size 1 or 2 provided the functional group of interest is already appearing pre-formed in the population or provided as part of the reservoir set. This simple observation suggest to construct GAs for chemical space exploration in such a way that a reservoir set is always kept in the population as means of replenishing diversity that otherwise may be lost throughout a GA run: if an atom-type has been selected out of the population, crossover operators by definition are incapable of recovering it. It can be taken easily from the reservoir set however. In the field of retro-synthetic analysis, Stuart Warren defined a functional group interconversion (FGI) as “the process of converting one functional group into another by substitution, addition, elimination, oxidation, or reduction.” In our setting, all these operations can be achieved with degree-preserving cut-and-join crossover. For crossovers restricted to natural joins, however, the mutation operator is required to change bond order, or paths of length 2 need to be replaced by cuts with larger cut-sets.Fig. 9A minimal set of base graphs \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$${\mathcal {B}}$$\end{document} needed to access all graphs of degree at most 4 using a sequence of natural join operators
It is also possible to restrict the cut operations to 2-cuts at the expense of a larger set of “base graphs” that provide building blocks to construct more complex scaffolds. We restrict the following discussion to graphs with degree at most 4, which in essence covers the search space of organic molecules. Starting point is the set of graphs \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$${\mathcal {B}}$$\end{document} shown in Fig. 9.
We endow the set of all graphs \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$${\mathcal {G}}$$\end{document} with a directed graph structure by inserting a directed edge (G, H) between two graphs \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$G{:=}(U,F)\in {\mathcal {G}}$$\end{document} and \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$H=(W,D) \in {\mathcal {G}}$$\end{document} if H is the result of a natural join of 2-cuts from G and one of the base elements \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$B_i\in {\mathcal {B}}$$\end{document} . Let us denote the resulting graph by \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\vec {{\mathcal {G}}}$$\end{document} . In Additional file 1 we show in a purely graphical manner that all local neighborhoods of a vertex of degree at most four can be interconverted by natural cut-and-join crossover with members of \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$${\mathcal {B}}$$\end{document} . As an immediate consequence, we see that every graph can be stepwisely built up from the base set, and each of the construction steps is reversible by Lemma 3. This implies
Theorem 4
The search space graph \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\hat{{\mathcal {G}}}$$\end{document} is strongly connected.
We note that an analogous result can also be obtained for variants of graph construction in Thm. 3. To see this, it suffices to construct graphs containing the base elements and add them to the reservoir. As an alternative, it also suffices to consider 2-cuts and allow the opening or closing of a bond in exchange for a \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$K_2$$\end{document} , i.e., edge-swaps, as a mutation operator. The strong connectedness of the search space graph has important consequences for the performance of a GA built upon the cut-and-join cross-over operations: it ensures that any desired molecule can be reached with a non-zero probability from any starting point. This is true as long as there is a finite probability that the required base set or the repertoire set remains in the population for any finite number of steps. The reason is simply that the construction pathways described above are finite, and each step is realized with a finite probability.
Evaluation and benchmarking of cut-and-join recombinants
Diversity of recombinants
We first investigate to what extent natural or degree-preserving crossovers generates novelty. To this end, we use a small random seed set from the USPTO-10k “clean” dataset, a repository of patented molecules introduced in [48], and ask whether the crossover products are still contained in this data set. One starting molecule gives rise to 4,697 unique recombinants whereas five or ten yield 75,083 and 396,696, respectively (All used starting molecules are shown in Additional file 2). The test was performed by converting the crossover products to canonical InchIs [49], since they also account for resonance structures and tautomers, and comparing them. Only 3, 18, and 36 recombinants were recovered from the USPTO-10k database. Due to rdkit issues in converting molecular graphs, 30 InchI strings were invalid. Instead their respective canonical SMILES strings were used for further benchmarking.Fig. 10. Diversity of crossover products taken from unrestricted cut-and-join operation on a sample of 10 molecules from the USPTO-10k. The homogeneous distribution of the recombinants in the t-SNE embedding shows that offspring molecules cover a high molecular diversity
In order to visualize the distribution of parent (USPTO-10k) molecules and crossover offspring molecules, we computed 1024-bit Extended-Connectivity Fingerprints (ECFP), a bit-vector representation of structural features in molecules including the circular substructures of each atom in a radius of 2 and performed a t-stochastic neighborhood embedding (tSNE) [50]. Figure 10 shows that the offsprings cover a much wider range also in terms of fingerprint descriptors.
Post-processing of candidate molecules
While the restriction to possibly planar chemical graphs is an efficient pre-selection, many of these graphs will not represent stable, synthesizable molecules that are “drug-like”. Since cut-and-join can produce candidate molecules very efficiently it seems to be a useful strategy to prune the set of candidate graphs by removing chemically implausible entries. Commonly, molecular graphs are checked for “molecular validity” by means of two methods:
- Parsing 1D string representations.For this either SMILES or SELFIES can be used. SELFIES have valency and bond configuration checks by design, so even random strings represent valid molecules. SMILES on the other hand can be sanitized, for example using RDKit. With the extension of SMARTS, substructure matches can be found and thus seemingly molecules containing instable functional groups can be removed.
- 3D embeddability.A wide variety of methods have been devised to generate molecular geometries from structural formulas, see e.g. [51–54]. These can be used to the check the chemical plausibility of molecular graphs by generating a 3D model and checking whether interatomic distances and bond angles are within the ranges known from stable molecules. We used a subset of the topological and functional group filters described in [31] and implemented them using SMARTS. The filters are divided into three tables. Filters regarding ring strain, such as small fused or spiro rings, eliminate 3, 108 molecules (0.8%). The other subset of filters is equipped for problems with unsaturated bonds, such as the filtering of difficult to synthesize allenes, effectively removing 55, 517 (14.0%) molecules. Functional group filters constitute the largest single category, excluding volatile and unaccessible functional groups such as peroxides or anhydrides. The majority, 128, 410 molecules (32.4%), were discarded by functional group filters.
The post-processing steps defined in Table 4 of [31] also included the conversion of functional groups to enrich chemical space. This was omitted since we aim to characterize the crossover products without further modification. Since rdkit had no suitable algorithm to properly classify bridgeheads in complex recombinant molecules, we abstained from including functions dedicated of identifying them. Moreover, we skipped checks of atomic volume. Instead, we directly considered 3D embedability using the rdkit implementation of basic knowledge informed Experimental-Torsion Distance Geometry (ETGK) [55]. The method attempts to place a molecule in 3D space such that known bond length, bond angles, and torsion angles for bonds are respected. If this is not possible, the violations are recorded.
We subjected 209, 665 filtered molecules of the 10 starting molecule recombinant set to the ETGK algorithm. In this set, 150, 175 molecular graphs (71.6%) were successfully embedded, while 28.4% showed at least one violation. Among the 59, 490 embedding errors, the most common issue is that planar arrangements of atoms (e.g. in aromatic systems) cannot be achieved in the minimization step of the embedding. In about a quarter of the non-embeddable molecules the method was unable to find an initial embedding. More details on the types of embedding violations can be found in Additional file 3.
Synthesizeability
In real life applications, in particular in Drug Discovery, only molecules are of interest that are readily synthesizable. Many different measures of synthesizeablity have been proposed in the literature, see e.g. [56] for a recent overview. SAScorer, for instance, is pretrained on PubChem data and focusses on structural features. Penalties are given for large number of atoms, spiro centers, and bridges [57]. A lower score is associated with an easier access through synthesis. An alternative is to use SCScore [58], which is trained on reaction networks with the thought that synthetic complexity increases with the expected number of reaction steps. A higher score then indicates more steps required for synthesis.Fig. 11. Synthesizability scores calculated by SAScore and SCScore on the 10-sample molecule recombinant set in comparison to the USPTO-10k dataset. Both tools show an increase in the difficulty of synthesizablility. The average \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\Delta$$\end{document} SA Score is listed in Additional file 4
Distributions of synthesizability scores using both tools on the 10-sample molecule recombinant set are summarized in Fig. 11. As expected, we observed that the score distribution of crossover products is less favorable than the starting molecules. Nevertheless, the distributions show a large overlap and crossover products even with very good scores are not particularly rare. For the set of unfiltered recombinants, 60, 815 of 396, 700 molecules (15.3%) are located below the 75th percentile of the USPTO-10k SAScore (SAScore 3.63). In addition, the positive influence of the post-processing on the synthesizeablity is reflected by the fact that 35, 401 of 150, 175 (23.6%) of filtered and embedded recombinants exhibit an SAScore lower than the 75th percentile of the USPTO-10k.
For the application in genetic algorithms, it of interest to analyze the efficiency of the presented method. Therefore, 150 randomly chosen pairs of unique parent molecules from the USPTO-10k dataset were subjected to the cut-and-join operation. Subsequently, offsprings were SA-scored and compared against the nth percentile of the reference dataset. Two pairs needed to be replaced, since the cut-and-join crossover works on the kekulized molecular graph, which timed out using rdkit (Fig. 12).Fig. 12. Frequencies of molecules with a better SAscore than the nth percentile of USPTO-10k molecules are depicted. A significant fraction of the recombinant offsprings has good SAscores, and thus it is feasible in each step of the GA to reject candidate offsprings that are difficult to produce according to their SA score
It is feasible without dramatic performance loss, therefore, to reject candidates with large scores. Moreover, it may also be a viable strategy to only reject candidates derived from longer lineages of parents and grandparents with poor synthesizability scores, while allowing some search intermediates with unfavorable score to encourage the diversity (Fig. 13).Fig. 13. Molecular weight (MW) distribution of the 10 molecule sample set. The cut-and-join crossover operator shows an increase of molecular weight. The \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\Delta$$\end{document} MW average of the different sets compared to the USPTO-10k can be found in Additional file 4. For better readability 56 molecules with a MW >1100 in the USPTO-10k were removedFig. 14Distribution of ring topologies of parent and recombinant molecules. Molecules are assigned a category based on the priority: heteroaromatic > aromatic > heterocyclic > carbocyclic > acyclic. The cut-and-join crossover operator tends to generate more polycycles, especially with heteroaromatic and aromatic rings. This is also due to the sample, which is also largely composed of heteroaromatics
We also benchmarked the recombinant sets with the help of the MOSES platform. This tool is designed to support the evaluation of generative models for molecular structures. It computes an array of metrics for comparing molecular structures and properties interesting for drug discovery and compares the generated output to the characteristics of the data used to train the model [59]. Whenever possible 30, 000 molecules were sampled. As expected, all molecules passed the validity test based on valency and bond consistency. Checks for uniqueness were skipped since the cut-and-join algorithm is implemented in a way to not save duplicates. The novelty check, comparing canonicalized SMILES, confirmed that there is little to no overlap between the input database and the recombination products.
Structural similarities of parents and recombinants
Molecule structures were further investigated by the cosine similarity in frequencies of Bemis-Murcko-Scaffolds (contracted molecule structures containing only ringsystems and linkers) [60] and BRICS-fragments (breaking of retro-synthetically interesting chemical substructures) [61].
Throughout all filter steps the cut-and-join operator gives very low Scaffold similarity of 1% to 5%, which reflects the ability to generate vastly different ring systems. In this application, the cut-and-join crossover operator leans towards the generation of polycyclic and heteroaromatic compounds as shown in Fig. 14, which is also caused by the sample being mostly made up of heteroaromatic compounds. The recombination products keep a balance of its BRICS fragments with similarities to the USPTO10k spanning from 19% to 45%.
Furthermore SNN (Similarity to a nearest neighbor) was calculated, which is the average between the Tanimoto similarity of ECFP fingerprints and its nearest neighbor from the USPTO-10k set. With similarities of 43% to 51%, generated molecules are neither suspiciously distant from the USPTO-10k nor very similar. With an Internal diversity score of 51% to 82%, the recombinant sets still shows uniqueness.
Metrics for the application in drug discovery
Fréchet ChemNet Distance [62] is an evaluation metric for generative models that is particularly geared towards applications in drug design. It scores diversity and similarity in biological and chemical properties using the hidden layer of a neural network trained to predict biological activities. Products of the cut-and-join crossover tend to received rather high scores between 20.36 to 40.61 indicating the need to filter recombinants to enrich drug-like molecules. Similarly, the majority of recombinants score low in the Quantitative Estimation of Drug-likeness (QED) scale, which is based on molecular properties found in approved drugs. Nevertheless, 17% to 34% of the crossover products pass the medicinal chemistry filters (MCF) [59] and pan assay interference compounds (PAINS) filters [63].
The full table of MOSES metrics for the different recombinants can be found in Additional file 4.
In summary we observe that natural cut-and-join crossovers produce a large fraction of chemically meaningful candidates. Moreover, unrealistic candidates are easily removed using well-established filtering criteria. Even applying stringent filters for drug-likeness, a significant fraction of the crossover products are retained. It is important to note that in the benchmarking experiment reported here we only tested the suitability of the crossover procedure for chemical space exploration. It is perfectly acceptable in this setting that on average the offsprings are somewhat more difficult to synthesize and a moderate fraction does not conform to viable molecules at all. For successful applications of GAs to chemical structure search it is sufficient that in each step a significant fraction of good candidates are produced and that bad candidates can be rejected efficiently. Cut-and-join crossover for chemical graphs, i.e., structural formulae, clearly satisfies these criteria.
Application to evolutionary optimization in computer-aided drug discovery
Evolutionary algorithms are typically very successful at generating and optimizing molecules, they share a common drawback—synthesizability [64–70]. To overcome this limitation RosettaEvolutionaryLigand (REvoLd) [71] was developed to exploit large chemical combinatorial libraries to increase synthetic accessibility and to reduce at the same time the number of dockings required to discover highly potent molecules. Despite very promising results, REvoLd is currently limited to predefined chemical spaces defined by make-on-demand libraries. These specifications are typically protected by commercial interests and only accessible through non-disclosure agreements. Moreover, these spaces usually encompass quite simple molecules and include mostly chemical matter of limited interest due to their enormous size. Fig. 15. Distribution of normalized docking scores for four protein targets using either only known actives, random samples from the Enamine REAL space, REvoLd exploring the Enamine REAL space, or our custom chemical space. Dotted lines are docking scores for the co-crystallized ligands from the used protein structure. Next to each violin plot are first the highest scoring known active and second the highest scoring molecule from running REvoLd on our custom chemical space. Lid_root2 is the an interface score normalized by the squared number of non-hydrogen atoms. All scoring results except EFR+REvoLd were taken from the original REvoLd publication
To explore chemical space beyond this point and to demostrate the effectiveness of cut-and-join crossover, we used the natural crossover operator to generate and explore new combinatorial libraries from lists of known binders. We selected four of the five targets used as benchmarking targets in the original REvoLd publication [71]: the enzyme Tyrosyl and the receptors Orexin 1, Muscarinic M1, and Y1. The lists of known actives were taken from [72]. The screened libraries were generated from 292 (Tyrosyl), 234 (Orexin), 447 (Muscarinic), and 801 (Y1) known actives, yielding 481 million, 465 million, 182 million, and 316 million potential new molecules, respectively. 20 independent REvoLd runs were conducted against each target, all tested molecules reported, duplicates between runs removed, and the remaining molecules merged into a single results list per target. The results are presented in Fig. 15.
Our customized chemical space shows highly increased docking scores. REvoLd’s evolutionary optimization results in molecules exceeding the scores of previous results by far. The reported high scoring molecules show clear resemblance to the known actives used as seed molecules. In particular the resulting compound for the Orexin 1 receptor is promising, since it increased noticeably in size from known actives and thereby more closely resembles the large peptide that natively binds to the receptor. The compounds for Tyrosyl on the other hand are less convincing. They are more fragment-like and highly charged. However, the high scoring known actives used as seed molecules are small and polar as well, so this could be more a problem of the scoring function deployed for this target. We found the scoring function tends to overrate polar interaction and thus has a propensity to guide the evolutionary optimization towards even more charged compounds.
Overall, these simulation results are very promising, but prompt further investigation. The tremendously improved docking scores require rigorous testing to rule out possible docking artifacts. The case of Tyrosyl highlights target specific problems which require either a specialized scoring function or a different approach to generate the custom chemical libraries considering charges of resulting molecules. Furthermore, the synthetic accessibility needs a more thorough investigation, since this is a major bottleneck of evolutionary algorithms and generative methods in general in CADD. Finally, we noticed a ten times increased overlap within individual runs as well as between runs. This might point towards a smooth energetic landscape converging towards similar molecules, which would also partially explain the highly improved docking scores. Again, this also requires further investigation that, however, are outside the scope of this manuscript.
Discussion and conclusions
We have introduced a very general scheme for defining crossover operators for graphs that start from cuts in the parental graphs and subsequently use a generalized join operation. This construction is very flexible and makes it possible to restrict the offspring to conserving certain properties of the parents. This makes cut-and-join crossover a generally applicable class of operators to implement recombination in genetic algorithms that operate on graphs. For applications to chemical graphs, degree-preservation and the preservation of graph-theoretical planarity are of particular relevance.
Cut-and-join crossover can be performed efficiently in such a way that degree preservation or the preservation of all bond orders can be enforced even though this is in general a hard problem. Moreover, alternative cuts can be enumerated efficiently in the order of increasing size.
In contrast to crossover on string representations, cut-and-join crossover on graphs is surprisingly versatile. In particular, we have shown that every finite graph with fixed maximal degree can be constructed by a GA operating on a finite size population within a finite number of steps. Moreover, the search space of such graphs is strongly connected. Taken together, therefore, the natural or degree-preserving cut-and-join crossover operations, possibly complemented by edge-swapping as a simple mutation operator, are sufficient to give access to any molecular scaffold. The natural and degree-preserving cut-and-join crossover conceptually resembles chemical reactions that exchange (large) moieties between pairs of molecules. Such exchanges appear to be particularly prevalent in metabolic networks. Our results on reachability and connectedness of chemical space under cut-and-join crossover thus suggests that metabolic reactions might also be capable of reaching all of chemical space. It is worth noting in this context that if a natural join \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$G[V'] \circ H[W'']$$\end{document} exists, then there is also a natural join \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$G[V'']\circ H[W']$$\end{document} , which could be interpreted as a balanced reaction of the form
\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\begin{aligned} G + H \rightarrow \left( G[V']\circ H[W'']\right) + \left( G[V'']\circ H[W']\right) \end{aligned}$$\end{document}where \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$V'\cup V''=V(G)$$\end{document} and \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$W'\cup W''=V(H)$$\end{document} .
Strong mathematical results on the convergence of GA require reachability together with non-trival assumptions on the selection procedure, and so far have become available only for strings [73, 74]. It will be interesting to explore whether cut-and-join crossover on graphs behaves similarly. This endeavor goes beyond the scope of the present contribution, however.
For chemical applications, crossover products that represent chemically plausible molecules are desired. Benchmarking our procedure using a wide variety of descriptors including 3D embeddability shows that a large fraction of the offspring of real-world molecular graphs satisfies these chemical constraints. Moreover, since alternative cut-and-join crossover products can be generated very efficiently, it is feasible to filter implausible graphs and/or graphs with other undesirable properties, so that GA runs can be restricted to well-formed molecules. This view is reinforced by the successful application of cut-and-join crossover to four drug-design tasks using the REvolD software. This showcase example led to predictions of candidate ligands with predicted binding constants substantially exceeding the best-known ligands. Of course, these computational candidates need follow-up with respect to their properties as drug designs, but in any case demonstrate the efficacy of cut-and-join crossover in a CADD setting.
The favorable properties of cut-and-join crossover operators suggest that such a general, mathematically motivated scheme is well applicable to chemical structure search. The method generates graphs that represent proper molecules with useful properties inherited from their parents at a sufficiently high density that there is no need for a priori restrictions of the crossovers. Instead mathematics-based generation with subsequent filtering for chemical sanity works efficiently.
Nevertheless, there are some aspects that deserve attention in future research. In particular, we have not investigated here how mutation and recombination operators interact. We expect that suitably designed mutation procedures can replace the reservoir sets of structural building blocks used as technical device in proving the reachability results. Moreover, chirality is an important feature of molecules that is not represented at the level of chemical (multi)graphs. These can thus be seen as representing the equivalence class of all stereo-isomers. In a brute-force manner, these can then be enumerated, see e.g. [75]. A more elegant approach would be to make use of annotation of stereochemical information which may be specified by structural formulas. Cis/trans isomers at double bonds may be specified in terms of the order of the substituents; the Natta notation, which uses wedges to indicate the spatial orientation in or out of the paper plane, is commonly used to describe tetrahedral chiral centers. Considering the neighborhood of a vertex as a circularly ordered instead of a plain set is sufficient to capture localized chirality [76]. It would thus be possible to handle chirality in the context of cut-and-join operations by inserting join edges into circularly ordered lists at the same places where the severed edges have been. Similarly, one could also deliberately invert chirality during the crossover operation. Moreover, the orientation of a chiral center is also a very natural mutation operation on molecular graphs with chirality annotation.
Supplementary Information
Supplementary file 1. Proofs related to the strong Connectedness of the search space \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\hat{{\mathcal {G}}}$$\end{document} Supplementary file 2. Molecules used in the benchmarking sectionSupplementary file 3. Details on embedding violations observed for crossover productsSupplementary file 4. Summary of MOSES statistics
The reference list from the paper itself. Each links out to its DOI / PubMed record.
- 1Enamine (2024) Real space. URL https://enamine.net/compound-collections/real-compounds/real-space-navigator
- 2Morin L, Danelljan M, Agea MI et al (2023) Mol Grapher: Graph-based visual recognition of chemical structures. In: 2023 IEEE/CVF International Conference on Computer Vision (ICCV), pp 19495–19504, 10.1109/iccv 51070.2023.01791, URL http://arxiv.org/abs/2308.12234, ar Xiv:2308.12234 [cs]
- 3Kerstjens A, Winter HD (2022) Leadd: Lamarckian evolutionary algorithm for de novo drug design. Journal of Cheminformatics 14:1–20. 10.1186/S 13321-022-00582-Y/FIGURES/17, URL https://jcheminf.biomedcentral.com/articles/10.1186/s 13321-022-00582-yhttp://creativecommons.org/publicdomain/zero/1.0/10.1186/s 13321-022-00582-y PMC 876075135033209 · doi ↗ · pubmed ↗
- 4Leguy J, Cauchy T, Glavatskikh M et al (2020) Evomol: a flexible and interpretable evolutionary algorithm for unbiased de novo molecular generation. Journal of Cheminformatics 12:1–19. 10.1186/S 13321-020-00458-Z/FIGURES/10, URL https://jcheminf.biomedcentral.com/articles/10.1186/s 13321-020-00458-z 10.1186/s 13321-020-00458-z PMC 749400033431049 · doi ↗ · pubmed ↗
- 5Meyenburg C, Dolfus U, Briem H et al (2022) Galileo: Three-dimensional searching in large combinatorial fragment spaces on the example of pharmacophores. Journal of Computer-Aided Molecular Design. 10.1007/s 10822-022-00485-y 10.1007/s 10822-022-00485-y PMC 1003233536418668 · doi ↗ · pubmed ↗
- 6Pérez-Salvador BR, de-los Cobos-Silva S, Gutíerrez-Ándrade MA, et al (2002) A reduced formula for the precise number of -matrices in . Discrete Math 256:361–372. 10.1016/S 0012-365X(97)00197-0
