Automated Structure Discovery in Atomic Force Microscopy

Benjamin Alldritt; Prokop Hapala; Niko Oinonen; Fedor Urtev; Ondrej Krejci; Filippo Federici Canova; Juho Kannala; Fabian Schulz; Peter Liljeroth; Adam S. Foster

arXiv:1905.10204·physics.comp-ph·February 28, 2020



TL;DR

This paper introduces a deep learning method that interprets AFM images to determine the atomic structure of molecules, enabling analysis of complex, non-planar molecules previously difficult to resolve.

Contribution

The authors develop a novel deep learning framework that directly predicts molecular structures from AFM images, expanding high-resolution AFM applications to diverse molecular systems.

Findings

01

Successfully resolved multiple adsorption configurations of 1S-camphor on Cu(111)

02

Demonstrated the ability to interpret distorted AFM images of non-planar molecules

03

Enabled direct prediction of molecular structures from AFM data

Abstract

Atomic force microscopy (AFM) with molecule-functionalized tips has emerged as the primary experimental technique for probing the atomic structure of organic molecules on surfaces. Most experiments have been limited to nearly planar aromatic molecules, due to difficulties with interpretation of highly distorted AFM images originating from non-planar molecules. Here we develop a deep learning infrastructure that matches a set of AFM images with a unique descriptor characterizing the molecular configuration, allowing us to predict the molecular structure directly. We apply this methodology to resolve several distinct adsorption configurations of 1S-camphor on Cu(111) based on low-temperature AFM measurements. This approach will open the door to apply high-resolution AFM to a large variety of systems for which routine atomic and chemical structural resolution on the level of individual…


Full text

Automated Structure Discovery in Atomic Force Microscopy

Benjamin Alldritt,1∗ Prokop Hapala,1∗ Niko Oinonen,1∗

Fedor Urtev,1,2∗ Ondrej Krejci,1 Filippo Federici Canova,1,3

Juho Kannala,2 Fabian Schulz,1† Peter Liljeroth,1‡ Adam S. Foster1,4,5‡

1Department of Applied Physics, Aalto University, 00076 Aalto, Espoo, Finland

2Department of Computer Science, Aalto University, 00076 Aalto, Espoo, Finland

3Nanolayers Research Computing Ltd, London, UK

4Graduate School Materials Science in Mainz, Staudinger Weg 9, 55128, Germany

5WPI Nano Life Science Institute (WPI-NanoLSI), Kanazawa University,

Kakuma-machi, Kanazawa 920-1192, Japan

∗These authors contributed equally.

†Present address: IBM Research–Zurich,

Säumerstrasse 4, 8803 Rüschlikon, Switzerland

‡To whom correspondence should be addressed;

E-mail: [email protected]; [email protected]

Atomic force microscopy (AFM) with molecule-functionalized tips has emerged as the primary experimental technique for probing the atomic structure of organic molecules on surfaces. Most experiments have been limited to nearly planar aromatic molecules, due to difficulties with interpretation of highly distorted AFM images originating from non-planar molecules. Here we develop a deep learning infrastructure that matches a set of AFM images with a unique descriptor characterizing the molecular configuration, allowing us to predict the molecular structure directly. We apply this methodology to resolve several distinct adsorption configurations of 1S-camphor on Cu(111) based on low-temperature AFM measurements. This approach will open the door to apply high-resolution AFM to a large variety of systems for which routine atomic and chemical structural resolution on the level of individual objects/molecules would be a major breakthrough.

Introduction

Scanning Probe Microscopy (SPM) has been the engine of characterization in nanoscale systems (?). Atomic Force Microscopy (AFM) (?) in particular has developed into a leading technique for high-resolution studies without material restrictions (?, ?, ?). It is increasingly being used for detailed characterization in a wide variety of physical, biological and chemical processes (?, ?). Pioneering experimental studies are now providing atomic scale insight into, for example, friction, catalytic reactions, electron transport and optical response. In general for AFM, the tip itself has often been the barrier to translating atomic resolution into physical understanding, with many images and processes ultimately being identified as a convolution with the tip structure (?, ?). While many partially successful efforts in tip functionalization were attempted in the last decade, the use of a CO molecule attached to a metal tip in low-temperature ultra-high vacuum AFM (CO-AFM) measurements (?, ?) has offered a path to reliable, outstanding resolution. The use of a relatively inert tip, with respect to the molecule-substrate interaction (?), means that it can approach very close to the object of interest without excessive attractive forces resulting in unintentional lateral manipulation of the target molecule. This allows the interaction to be dominated by extremely short-ranged Pauli repulsion between atoms in the sample and at the tip apex, providing the very high resolution essential to the technique. In particular, CO-AFM now offers an unprecedented window into molecular structure on surfaces – aside from the detailed resolution of the results of molecular assembly (?, ?), it is possible to study bond order (?), charge distributions (?, ?) and the individual steps of on-surface chemical reactions (?, ?, ?, ?).

As yet, most CO-AFM studies have been focused on planar molecular systems, where the experimental image requires almost no interpretation (?, ?, ?). Even where understanding is not immediately obvious, such as due to controversies over the nature of observed bonds (?), efficient models have been developed (?, ?, ?, ?, ?) that explain the contrast mechanism in terms of the tip-surface interaction and CO lateral flexibility. However, the further the systems studied are from two-dimensional molecules containing only hydrogen and carbon, the more complex and time consuming (if not impossible) the interpretation process becomes (?, ?, ?, ?, ?). While recent measurements using rigid O-terminated copper tips make interpreting images of flat systems even easier (?, ?), the rigidity also means even fewer atoms can be characterized when moving to 3D systems, because the flexibility of CO allows it to sample molecular "edges" in more detail. In recent years, CO-AFM has moved towards measuring truly unknown structures (?, ?, ?, ?), where it has overcome many of the limitations of techniques such as nuclear magnetic resonance and mass spectrometry. It is clear that this trend is going to continue, and potentially even accelerate, in particular for innovative studies, e.g. in life sciences or biochemistry (?, ?), demonstrated manifestly in the first CO-AFM images of DNA (?). Reliable interpretation of such data becomes a vast exploration through all possible molecules, configurations and imaging parameters to find agreement. This is impractical in anything beyond very simple systems, severely limiting the ultimate power of the technique.

In this work, we couple a systematic software approach with detailed experimental CO-AFM imaging to understand and predict AFM images for molecules of any size, configuration or orientation without prior knowledge of the system being studied. We use the latest modelling approaches to efficiently synthesize 3D AFM data (?) from 134 000 isolated molecules. These were scanned from representative directions to establish physical descriptors that characterize a series of slices through the data in a given direction. For a given series of experimental images, we then apply a deep learning infrastructure (?, ?, ?, ?) to find a descriptor match, and predict the molecular structure directly. The method is validated by comparison to a systematic CO-AFM experimental study of orientations of camphor molecules on a copper surface. This Automated Structure Discovery Atomic Force Microscopy (ASD-AFM) approach will open the door to apply high-resolution AFM to a huge variety of systems for which routine atomic and chemical structural resolution on the level of individual objects/molecules would be a major breakthrough.

Results

The measured signal in CO-AFM is the shift of the resonance frequency of the cantilever ( $\Delta f$ ), which is due to the sum of all conceivable tip-sample interactions. In CO-AFM, the $\Delta f$ signal is, to a large extent, determined by the interaction of oxygen in the CO molecule and the closest atoms of the sample directly under the tip. Nevertheless, due to the lateral flexibility of the CO, the image contrast is not related to the atomic positions in a trivial fashion. We will describe a methodology that aims to invert this imaging process and yield the atomic coordinates directly from a set of measured (or simulated) $\Delta f$ data. Briefly, this involves developing an image descriptor, i.e. a 2D representation of molecular structure, that encodes the positions of the atoms in the object molecule – this can be calculated directly if the positions are known. We train a neural network to reproduce this image descriptor directly from the $\Delta f$ data using simulated AFM images and then verify this approach using simulated images from molecules not included in the training data. Finally, we will employ experimental AFM images as a final test of the proposed methodology.

Inverse imaging problem

Reconstruction of molecular structures from AFM images can be seen as the search for an inverse function ( $\Phi^{-1}$ ) to the imaging process $\Phi:(\vec{R},Z)\rightarrow{\Delta f(\vec{r})}$ , where $\vec{R},Z$ are the positions and atomic numbers of the nuclei, and $\Delta f(\vec{r})$ is the value of the measured frequency shift at each point in space $\vec{r}$ (see Fig. 1). Analysis and understanding of the imaging process $\Phi$ are therefore crucial for obtaining $\Phi^{-1}$ . In particular, it is important to estimate how well conditioned the inverse operation is, and to identify which information is preserved and where information is lost.

The imaging process can be decomposed into the following sequence of operations:

1. Atoms of the sample generate various force fields in the space around them (e.g. electrostatic, van der Waals, Pauli repulsion). Many methods ranging from empirical potentials (e.g. (?)) to ab initio calculations (e.g. (?)) were applied in the past to approximate those force fields.

2. The tip apex (e.g. CO molecule) relaxes under the influence of those force fields as it approaches toward the sample (see Fig. 1B). This means that the force fields are sampled in distorted (relaxed) coordinates (Fig. 1C). These distortions are crucial for understanding features in AFM images. The process can be simulated by a simple mechanical model (e.g. probe particle (PP) model (?, ?)).

3. Forces felt by the relaxed probe particle are integrated over its path (Fig. 1C) and this causes changes in the measured oscillation frequency (Fig. 1D). The change of frequency $\Delta f$ can be therefore calculated using a simple formula (?).
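For orientation, the force-to-frequency-shift relation commonly used in frequency-modulation AFM can be written as below. This is the standard textbook form and is assumed, not confirmed, to be the formula behind the elided citation in step 3. The cantilever stiffness $k$ and the convention that $z$ denotes the lower turning point of the oscillation are assumptions not stated in the text; $f_0$ and $A$ are the resonance frequency and oscillation amplitude.

```latex
\Delta f(z) = -\frac{f_0}{\pi k A}\int_{-1}^{1}
    F_z\bigl(z + A(1+u)\bigr)\,\frac{u}{\sqrt{1-u^{2}}}\,\mathrm{d}u ,
\qquad
\Delta f(z) \approx -\frac{f_0}{2k}\,\frac{\partial F_z}{\partial z}
\quad (\text{small } A).
```

The small-amplitude limit follows by expanding $F_z$ to first order around $z+A$ and using $\int_{-1}^{1}u^{2}/\sqrt{1-u^{2}}\,\mathrm{d}u=\pi/2$.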

Furthermore, from previous simulations of the AFM imaging process (?, ?, ?, ?, ?), it is clear that images are extremely sensitive to even minor variations of height (z-coordinate) of the topmost atoms, and conversely very insensitive to atoms $>$ 0.5 Å below this. Also, the chemical identity of the atom cannot be easily determined from observed contrast as it depends on the $z$ -coordinate, the chemical neighborhood and orbital structure (e.g. nitrogen can appear both as a depression and a protrusion in carbonaceous aromatic systems). Instead, the characteristic topology of interatomic potentials (saddle ridges between nearby atoms, vertexes between those ridges, contrast inversion) can be clearly determined from AFM data as a fingerprint of typical chemical groups or bonding configurations. The electrostatic force has a rather small contribution to vertical force in contact, but often considerably distorts the image laterally (?, ?).

Overall, the imaging process ( $\Phi$ ; Fig. 1A-D) is a complex and highly non-linear function, and its inversion ( $\Phi^{-1}$ ) cannot be easily expressed by any analytic equation or practical numerical algorithm. Hence, we employ a neural network (NN) (Fig. 1F) as an efficient universal fitting scheme to learn an approximation to $\Phi^{-1}$ from example atomic structures and corresponding 3D AFM data stacks (a stack is a set of constant height images at different vertical positions; Fig. 1E). The image-like structure of input AFM data calls for the use of a deep convolutional neural network (CNN) (?), optimized for machine learning (ML) of regular 3D grids.

Generation of training data

The main problem in training deep convolutional networks is to provide sufficient labeled training data (from thousands to millions of input-output pairs). High-resolution AFM experiments are time intensive, requiring several hours to acquire a single 3D data stack, which would render direct training on experimental data impractical. In addition, experimental data are a priori unlabeled (i.e. we do not know the correct interpretation) and interpretation of 3D features in AFM data is currently a difficult task, even for human experts. Hence, human labeling cannot provide us with reliable labels.

Therefore, the only feasible option is to train a model on simulated data, where the correct interpretations (labels) are known a priori. For our reference simulations, the geometries of sample molecules were taken from a well-known database of 134 000 isolated small organics (?), structurally optimised with Density Functional Theory (DFT).

Our methodology employs a new, highly efficient graphical processing unit (GPU) implementation of the PP model (?, ?), which allows the generation of $\sim$ 50 input-output pairs (i.e. 3D AFM data-stacks and 2D image representation of structure) per second. This implementation is performance optimized, allowing for rapid experimentation with new settings and CNN architectures, while simultaneously generating data on-the-fly. This eliminates issues related to the storage of terabytes of training data otherwise needed. For each molecule, we first calculate the force field sampled on a regular 3D grid (this step takes $\sim$ 0.1s on a desktop computer), and then this force field can be rapidly interpolated to generate simulated constant-height $\Delta f$ images from 10-20 orientations of a given molecule (dependent on molecule symmetries) each of which takes $\sim$ 0.02s. These orientations are initially uniformly distributed over a sphere, but we then weight the final selection to orientations which expose more atoms to the tip. This avoids images where just a single atom is visible and increases the information available per stack in the training process. Here, and in general, the $z$ -coordinate is defined as the distance from the carbon in the CO-tip apex to the atom closest to the tip in a particular molecular orientation. Each scan starts at $z=8.0$ Å and continues 3.0 Å toward the molecule in steps of 0.1 Å. These 30 slices of vertical force are transformed into 20 slices of frequency shift (2.0 Å of valid data) using the Giessibl formula (?), forming a stack from simulated data. Optimization of this choice of $z$ -window is possible for a given experiment, but this selection provided the best performance for the results presented here.
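The conversion from force slices to frequency-shift slices can be illustrated with a short NumPy sketch. It is not the authors' GPU implementation: it evaluates the standard force-to-frequency-shift integral for a single force curve with a midpoint rule. The function name, the stiffness `k`, and the amplitude of 0.5 Å (chosen only because it turns 30 force slices at 0.1 Å spacing into 20 valid frequency-shift slices) are assumptions made for illustration.

```python
import numpy as np

def force_to_frequency_shift(z, force, amplitude, f0, k, n_theta=64):
    """Convert a vertical force curve F(z) into a frequency-shift curve.

    Evaluates df(z) = -f0/(pi*k*A) * int_0^pi F(z + A*(1 + cos t)) cos t dt
    (the substitution u = cos t of the usual integral) with the midpoint rule.
    z is the lower turning point and must be ascending; all quantities must
    share one consistent unit system. Illustrative sketch only.
    """
    z = np.asarray(z, dtype=float)
    force = np.asarray(force, dtype=float)
    # Keep only turning points whose full oscillation range lies inside the grid.
    valid = z + 2.0 * amplitude <= z[-1] + 1e-9
    z_valid = z[valid]
    theta = (np.arange(n_theta) + 0.5) * np.pi / n_theta  # midpoint nodes
    # Tip heights visited during one oscillation, shape (n_valid, n_theta).
    z_osc = z_valid[:, None] + amplitude * (1.0 + np.cos(theta))[None, :]
    f_osc = np.interp(z_osc.ravel(), z, force).reshape(z_osc.shape)
    integral = (f_osc * np.cos(theta)).sum(axis=1) * (np.pi / n_theta)
    return z_valid, -f0 / (np.pi * k * amplitude) * integral

# Hypothetical example: 30 heights from 5.1 to 8.0 Å and a linear test force.
z = np.linspace(5.1, 8.0, 30)
force = -0.3 * z + 1.0
z_df, df = force_to_frequency_shift(z, force, amplitude=0.5, f0=29939.0, k=100.0)
```

For a linear force the result reduces to the small-amplitude limit $-f_0/(2k)\,\partial F/\partial z$ at every height, which gives a quick sanity check of the discretization.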

Image descriptors

In general when trying to predict molecular geometries from AFM images, while it may seem most obvious to directly convert an image stack to a set of $xyz$ coordinates, this is not an efficient descriptor in a CNN model (see expanded discussion in Supplementary Material, SM). Hence, we opt to represent the output geometry in an image-like form that is directly related to the atomic coordinates. The selection of this 2D image descriptor is critical to an efficient model and must be chosen such that it can be realistically and reliably determined from AFM data. The descriptor can be considered as the language with which we wish to analyze the problem and the choice of language is enforced by the reference database - during the generation of the simulated image database we also calculate 2D image descriptors for all molecules and orientations.

Then we ask the CNN to translate the data stacks into this language. It achieves this by extracting features in a given $\Delta{f}$ slice as a function of their character and position. It does this simultaneously for all given $\Delta{f}$ slices in a data stack - features which appear in multiple slices are much more likely to be identified as important. As the deep CNN moves through its multiple layers (Fig. 1F), it filters these features according to the chosen biases and weights (manually optimized in this work, see SM), ultimately identifying a critical feature map. The CNN then begins the second half of its job, building a 2D image descriptor from this feature map. Using the reference database for that descriptor, it makes a prediction of the best match for a given feature.

We designed several physically meaningful representations of molecular structure on a grid, with specifics of AFM microscopy in mind (see discussion in SM). In all cases, we represent the data as a single 2D image with the same lateral resolution as the input AFM data, which simplifies the computational analysis and allows for quick validation via human users. For the rest of the discussion, we use the vdW-Spheres representation – an intuitive representation of molecular structure by their van der Waals radii, commonly used in chemical visualization programs. For each molecule and orientation, we calculate the vdW-Spheres descriptor from the reference database as follows: we calculate the van der Waals radius of all atoms and then plot this in 2D using a $z$ -range starting from the position of the highest atom to 1.5 Å below it, i.e. contributions below this are ignored. The relative height of atoms in this window is represented by their brightness in the 2D image descriptor.
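A minimal sketch of such a descriptor is given below, assuming that brightness is the height of the upper vdW-sphere surface above the bottom of the 1.5 Å window; the exact rendering used in the paper is described in the SM and may differ. The function name, grid extent, pixel count and the use of Bondi radii are illustrative assumptions.

```python
import numpy as np

# Bondi van der Waals radii in Å for a few common elements (assumed choice).
VDW_RADII = {"H": 1.20, "C": 1.70, "N": 1.55, "O": 1.52, "S": 1.80, "Cl": 1.75}

def vdw_spheres_descriptor(symbols, coords, extent=8.0, n_pix=64, z_window=1.5):
    """Render a top-down height map of vdW spheres (illustrative sketch).

    coords: (N, 3) atomic positions in Å, with the tip looking down along -z.
    Returns an (n_pix, n_pix) image; 0 means nothing reaches into the window
    between the highest atom centre and z_window below it.
    """
    coords = np.asarray(coords, dtype=float)
    axis = np.linspace(-extent / 2, extent / 2, n_pix)
    xx, yy = np.meshgrid(axis, axis, indexing="ij")
    z_floor = coords[:, 2].max() - z_window  # bottom of the visible window
    image = np.zeros((n_pix, n_pix))
    for sym, (x, y, z) in zip(symbols, coords):
        r = VDW_RADII[sym]
        d2 = (xx - x) ** 2 + (yy - y) ** 2
        inside = d2 < r * r
        # Height of the sphere's upper surface above the window floor.
        surface = z + np.sqrt(np.where(inside, r * r - d2, 0.0)) - z_floor
        contribution = np.where(inside, np.clip(surface, 0.0, None), 0.0)
        image = np.maximum(image, contribution)
    return image
```

Atoms whose sphere lies entirely below the window contribute nothing, which mirrors the statement that contributions below the $z$ -range are ignored.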

Geometry prediction from simulated AFM data

In order to benchmark the methodology, we employed the trained CNN model to predict the geometry of several molecules that were not included in the training set. The internal quality of the model can be judged by how well the predicted 2D image descriptor (derived from the simulated AFM 3D image stack) matches the reference descriptor calculated directly from the molecular geometry. In the first example (Fig. 2A-F), we picked a molecule (an isomer of C7H10O2) that has a functional group and a non-planar geometry as representative of the types of molecule we wish to identify. The prediction qualitatively matches the reference, capturing all the key atoms except the hydrogen of the hydroxyl group, which is present in the analytically computed reference image representation. It is very difficult to identify the lower lying atoms from the AFM images. For the molecule shown in Fig. 2A-F, it would not be possible for a human expert to identify the hydrogen atom of the hydroxyl group. The goal of the introduced ideal image representation, i.e. the vdW-Spheres representation, is to train a CNN to extract as much structural information as possible from an individual AFM data stack and to store it in a compressed, readable format.

As another example, we consider a dibenzo[a,h]thianthrene molecule, which has been previously studied experimentally (?) (Fig. 2G-L). The CNN is again able to predict most of the molecular features in the vdW-Spheres representation, in particular, identifying the two dominant sulphur atoms. The remaining atoms of the aromatic system are also predicted, but they are not as well separated as in the reference. CNN-predicted properties are typically blurred and this is somewhat dependent on the choice of 2D image descriptor (see Fig. S3g).

The last example is a fullerene C60 molecule oriented with a pentagon upwards. We performed a prediction of the vdW-Spheres representation based both on simulation (Fig. 2M-R) and newly measured experimental data (Fig. 2S-V). The pentagons are oriented slightly in an asymmetric manner with 3 carbon atoms up. The main features, i.e. 8 top-most atoms, are reproduced rather well in the CNN prediction, while the remaining atoms remain invisible. This is true for both simulated and experimental images. In the experimental image, however, artifacts are visible that originate from dark attractive areas of C60 and that are not present in the simulated image. This is a clear indication that the simulation does not reproduce this particular experiment sufficiently well. Despite this fact, the CNN prediction is robust enough to consistently render the top-most atoms. More examples from our training set can be found in Fig. S4.

To illustrate how our method can aid in discrimination of unknown molecules and separate chemical information and physical topography, we compare 3 different derivatives of anthraquinone with a different number of chlorine atoms in Fig. 3. In this illustrative example, the molecules are tilted so that the bottom edge is higher than the upper edge, making this a 3D problem with a peculiar image contrast over the edge that can hardly be deciphered by an expert. Although each molecule provides clearly distinct AFM images, it is rather difficult to rationalize the differences in terms of atomic structure. In fact any similarity between molecules in the 1st and 2nd row is hardly visible from the AFM pictures. In contrast, the predicted vdW-Spheres map clearly shows a change in atomic radius in one or two atomic sites while the rest of the molecular structure is preserved. While disentangling the atomic type from its $z$ -position is difficult based on the vdW-Spheres image description, the different atomic types should result in a different decay of the $\Delta f$ contrast as a function of the tip-sample distance. Hence, it should be possible to differentiate atomic species. In particular, a modified CNN (shown in Fig. 3 as column type map) learned to discriminate small peripheral atoms (hydrogen, red) from larger peripheral atoms (chlorine, oxygen, green), leaving aside the rather indiscriminate carbon backbone (blue). The network clearly identified substitution of a hydrogen atom by chlorine. While showing the potential of the technique in terms of recognition, the prediction is not yet fully reliable, as can be seen from the oxygen misidentified as small (red) in the second row.

Geometry prediction from experimental AFM data

The true validation of our ML approach is to make predictions directly from experimental AFM images. Ultimately, this would be done from images of an unknown system, but as a benchmark for our first iteration of the method, we apply it to find molecular configurations of a known molecule. Here we selected 1S-camphor as the target molecule due to its 3D geometry and potential for adopting multiple distinct adsorption geometries on a Cu(111) surface. Combined STM and AFM imaging allowed us to distinguish 8 characteristic adsorption geometries with reproducible data in each case. Further analysis reduced this to a set of 5 distinct configurations clean enough for good comparison and we acquired a set of constant-height $\Delta f$ images in each case (see SM for details). Even highly trained experts were not able to decipher the molecular structure from these images, and they provided an excellent challenge and example for the CNN model. The 3D experimental image stack (Fig. 4A-C) is fed into the CNN model and a 2D image descriptor (vdW-Spheres) is predicted based on this data (Fig. 4D). This experimental descriptor is then compared via cross-correlation to a set of descriptors calculated directly from atomic coordinates taken from a set of uniformly distributed molecular rotations (Fig. 4E). The best fit gives us a prediction of the molecular configuration corresponding to the original descriptor from experimental data (Fig. 4F). Qualitatively, the match between experimental and simulated descriptors is very good, reproducing the performance seen with purely simulated data (Fig. 2). In order to explore the plausibility of the predicted geometries, we now reverse the inverse imaging process and consider the predicted simulated images for the best fit descriptor (Fig. 4G-I). In all cases the simulated images qualitatively capture the main features seen in the experimental images. 
In cases 1-4, agreement is generally good at all heights, but the simulated image tends to be somewhat sharper than the experiments at close approach. For case 5, the core of the simulated image is representative of experiments, but some of the extended features are clearly absent. Furthermore, note that experimental image 5A in Fig. 4 shows no atomic features (the interactions are purely attractive), whereas the simulated image 5G clearly does (showing the onset of repulsive short-range interactions). This is because the CNN was consciously trained only on data containing atomic-like features, as those are critical for identification, and not the kind of large tip-sample distance used in 5A.
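The matching step, in which the predicted descriptor is compared by cross-correlation with descriptors computed for a set of candidate rotations, can be sketched as follows. This is an illustration under assumptions, not the authors' code: it uses a zero-mean normalized correlation at zero lateral offset, whereas the actual procedure may also search over translations. Both function names are hypothetical.

```python
import numpy as np

def normalized_correlation(a, b):
    """Zero-mean normalized cross-correlation of two equally sized images."""
    a = a - a.mean()
    b = b - b.mean()
    denom = np.sqrt((a * a).sum() * (b * b).sum())
    return float((a * b).sum() / denom) if denom > 0 else 0.0

def best_matching_orientation(predicted, candidates):
    """Return (index, score) of the candidate descriptor closest to the prediction."""
    scores = [normalized_correlation(predicted, c) for c in candidates]
    best = int(np.argmax(scores))
    return best, scores[best]
```

Here `candidates` would hold one descriptor per sampled molecular rotation, and the returned index identifies the rotation whose descriptor best matches the CNN output. The score is insensitive to a global brightness offset or scale, which seems a sensible property when comparing a network prediction with an analytically computed reference.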

Discussion

The aim of this work was to establish a reliable and rapid method for solving a problem that expert humans cannot - the interpretation of high-resolution AFM images of complex 3D molecules. We have demonstrated that our ML method based on a CNN architecture can solve this problem with trivial computational effort. In its current form, the model can, e.g., identify adsorption configurations accurately. On a complex system, this allows us to drastically reduce the number of possible molecular solutions from a set of experimental images.

However, we believe this is only the first step in a developing analysis field and it is clear that several further problems need to be tackled if we wish to increase prediction accuracy even further. Simple improvements include introducing a bigger variety of atoms into the training set (with a very large initial computational cost), and the creation of an integral model that can predict multiple 2D image representations simultaneously, improving model robustness for feature recognition. In the medium term, while our current approach using the PP method (i.e. re-using a precalculated force-field grid for scans from multiple directions) is highly efficient, it prevents a simple implementation of more sophisticated non-spherical electrostatics (e.g. quadrupoles) that have been shown to be important for CO tip simulations in certain systems (?, ?). While we consider this limitation of the underlying simulation model a secondary issue in the development of a reliable ML architecture, we have already begun exploring efficient solvers for more sophisticated models based on the electron density from DFT (?). A more pressing concern for accuracy in simulated images is the role of surface- and tip-induced molecular displacements. For the latter, this has generally been ignored in previous simulations of CO-tip AFM experiments, and fixed geometries are considered throughout. In this work, we considered how molecular tilting and functional group rotations affected the predicted images (see SM Sec. 3). It is clear that these can change the predicted simulated images, particularly at close approach, and finding a systematic way to include these in the matching process could significantly improve accuracy. We also considered the possible changes of molecular configurations when adsorbed on the surface (see SM Sec. 2), but any errors seen were not in the predictions of the CNN model, and improvements would require advances beyond the standard methods used to obtain accurate adsorption structures, which is a separate research field.

Finally, the nature of the AFM measurement itself causes a particular difficulty in the uniqueness of the molecular solutions. For certain configurations, common in small non-planar molecules, AFM data may provide information only about a very limited number of atoms and this may lead to several molecular solutions being almost equivalent in the quality of best fit to experiments (see SM Sec. 3). In systems where this is a problem, considering several experimental configurations of the same molecule, as done here, makes identification significantly easier. More generally, we are looking at including multiple channels of information for a single configuration by using an image descriptor incorporating tip-dependent electrostatic information available via other tip terminations (?, ?, ?). This could also be extended to incorporate simultaneous fitting to Kelvin Probe Force Microscopy data (?, ?, ?, ?), further improving the uniqueness of predictions.

Despite these challenges, the approach is immediately applicable to a wide variety of complex molecular systems where conventional interpretation approaches have either failed or cannot even be attempted. As such, it promises the availability of atomic and chemical structural resolution in systems where it offers the prospect of major impact.

Materials and Methods

Machine-Learning model architecture

The architecture of our CNN is similar to the encoder-decoder type networks that have been used in, for example, image segmentation (?). At the input side it comprises 3 layers of 3D convolutional filters ( $3\times 3\times 3$ ) interleaved by average pooling ( $2\times 2\times 2$ ), which reduces the size of the input image by a factor of 8 in $x,y$ dimensions. This information bottleneck is motivated by the fact that input AFM images are mostly rather smooth and carry a limited amount of information (i.e. just position and size of a few atoms). Down-sampling also helps to facilitate long-range correlations in the image using only local and cheap $3\times 3\times 3$ filters. This should help to recognize larger features such as atoms and bonds spanning over tens of pixels. The data is collapsed in the z-direction from 3D to 2D by the action of the pooling layers, while gradually being expanded to several independent channels ( $2\times$ channels by each layer). Therefore, the features obtained after this operation should encode varying z-dependence of the frequency shift. The signal is further processed by 3 layers of purely convolutional filters operating independently on each of 64 channels of the 2D image. In the last part of the CNN architecture, the image is expanded back to original resolution ( $8\times$ in each dimension) by 3 bi-layers of 2D convolution interleaved by NN-upsample operations. The final convolution is followed by a rectified linear unit (ReLU (?)) activation, which cuts the negative part of the activations from the convolution layer and leaves positive values unchanged. Other convolutions are followed by LeakyReLU activations with a factor of $0.1$ on the negative side, so as not to completely block learning when values are under 0 (they are leaked through). The model is implemented in Keras (?) running a TensorFlow (?) backend. Optimization of kernel sizes in the convolutional layer has not been systematically tested, but for our image recognition network, small kernel sizes with additional layers have been quite effective.
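The resolution bookkeeping of this encoder-decoder can be illustrated with the non-learned building blocks alone. The NumPy sketch below implements $2\times 2\times 2$ average pooling, nearest-neighbour up-sampling and the LeakyReLU activation; it omits the convolutions and is not the Keras model used in the paper. The function names and the hypothetical $128\times 128\times 20$ input size are assumptions, and how the remaining $z$ extent is collapsed to 2D is not specified in the text.

```python
import numpy as np

def avg_pool3d(x, p=2):
    """Average-pool an (X, Y, Z) array over non-overlapping p*p*p blocks.

    Any remainder that does not fill a whole block is cropped.
    """
    nx, ny, nz = (s // p for s in x.shape)
    x = x[: nx * p, : ny * p, : nz * p]
    return x.reshape(nx, p, ny, p, nz, p).mean(axis=(1, 3, 5))

def nn_upsample2d(x, f=2):
    """Nearest-neighbour up-sampling of a 2D image by an integer factor f."""
    return np.repeat(np.repeat(x, f, axis=0), f, axis=1)

def leaky_relu(x, alpha=0.1):
    """LeakyReLU: positive values pass unchanged, negative ones are scaled by alpha."""
    return np.where(x > 0, x, alpha * x)

# Hypothetical input stack: 128 x 128 pixels, 20 frequency-shift slices.
stack = np.ones((128, 128, 20))
encoded = stack
for _ in range(3):          # three pooling stages: 128 -> 64 -> 32 -> 16 in x and y
    encoded = avg_pool3d(encoded)
decoded = encoded[:, :, 0]  # placeholder for the unspecified collapse to 2D
for _ in range(3):          # three up-sampling stages restore 128 x 128
    decoded = nn_upsample2d(decoded)
```

Three pooling stages give the factor of 8 reduction in $x,y$ quoted above, and three up-sampling stages undo it, so the output descriptor has the same lateral resolution as the input images.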

The structure was motivated by the idea that the central part - i.e. the $8\times$ down-sampled representation with 64 channels - will learn to represent AFM images in terms of abstract, physically meaningful features (e.g. slope of frequency shift curve, blobs representing atoms, characteristic sharp-line features between nearby atoms). Various physical properties, such as height maps or positions of atoms in the second up-sampling stage, can then be identified from this internal abstract representation.

To make the model more robust to experimental artifacts and limitations, we add $5\%$ white noise (representing electronic noise in the measurements) and random rectangular cutouts (?) (representing sudden jumps in the measurements) to the simulation data. This also helps to avoid problems related to the ill-posed nature of the force-frequency shift conversion (?, ?).
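A minimal sketch of these two augmentations on a single 2D image slice is given below. It uses only the Python standard library and is not the authors' implementation. Several details are assumptions made for illustration: the noise is uniform and its amplitude is taken as $5\%$ of the image value range, a single cutout is applied per image, and the cutout is filled with a constant value of zero.

```python
import random


def add_white_noise(image, fraction=0.05, rng=random):
    """Add uncorrelated uniform noise to every pixel.

    Assumption: the amplitude is `fraction` of the image value range.
    """
    flat = [v for row in image for v in row]
    amplitude = fraction * (max(flat) - min(flat))
    return [[v + rng.uniform(-amplitude, amplitude) for v in row]
            for row in image]


def random_cutout(image, max_size=4, fill=0.0, rng=random):
    """Overwrite one randomly placed rectangle with a constant value.

    Assumption: the size limit and the fill value are illustrative choices.
    """
    n_rows, n_cols = len(image), len(image[0])
    height = rng.randint(1, min(max_size, n_rows))
    width = rng.randint(1, min(max_size, n_cols))
    top = rng.randint(0, n_rows - height)
    left = rng.randint(0, n_cols - width)
    out = [row[:] for row in image]  # copy so the input stays untouched
    for r in range(top, top + height):
        for c in range(left, left + width):
            out[r][c] = fill
    return out


# Toy 8x8 "image" standing in for one simulated AFM slice.
rng = random.Random(0)
clean = [[float(r + c) for c in range(8)] for r in range(8)]
augmented = random_cutout(add_white_noise(clean, rng=rng), rng=rng)
```

Passing a seeded `random.Random` instance keeps the augmentation reproducible; in training, a fresh random state per sample would normally be used so that the network never sees the same corruption twice.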

Molecular database

The original structures of the molecules in the database were optimized with DFT at the B3LYP/6-31G level (?). Using the quantum chemistry software Psi4 (?, ?), we performed single-point coupled-cluster calculations (singles and doubles, cc-pVDZ basis) for all 134k molecules, thus obtaining the charge densities and Mulliken populations needed to run the Probe-Particle simulator.

Experimental Methods

Polished Cu(111) and Au(111) single crystals (Mateck/Germany) were prepared by repeated Ne+ sputtering (0.75 keV, 15 mA, 20 min) and annealing (850-900 K, 5 min) cycles. Surface cleanliness and structure were verified by scanning tunneling microscopy (STM). Sample temperatures during annealing were measured with a pyrometer (SensorTherm Metis MI16). 1S-camphor (Sigma-Aldrich, purity $>$ 98.5 $\%$ ) was introduced into the vacuum system via a leak valve and deposited onto the Cu(111) surface at low temperature ( $T=20$ K) to increase the number of distinct adsorption configurations and to obtain individual molecules rather than clusters on the surface. Fullerene C60 (Sigma-Aldrich, purity $>$ 99.9 $\%$ ) was sublimed onto a Au(111) substrate held at $\sim 200$ K.

The STM and CO-AFM images were taken with a Createc LT-STM/AFM with a commercial qPlus sensor with a Pt/Ir tip, operating at approximately $T=5$ K in UHV at a pressure of $1\times 10^{-10}$ mbar. The quartz cantilever (qPlus sensor) had a resonance frequency of $f_{0}=29939$ Hz and a quality factor of $Q=101099$ , and was operated with an oscillation amplitude of $A=50$ pm. Tip conditioning was performed by repeatedly bringing the tip into contact with the copper surface and applying bias pulses until the necessary STM resolution was achieved. The tip apex was functionalized with a CO molecule (?) before AFM measurements. The STM images were recorded in constant-current mode, while the AFM was operated in constant-height mode. Raw data was used as input for the machine learning infrastructure. To minimize experimental artifacts that would cause problems with interpretation, we implemented the following measures: checking the background $\Delta f$ before CO pickup (a smaller value indicates a sharper overall tip); scanning another CO to ensure the symmetry of the CO tip after tip passivation and prior to further AFM imaging; and confirming that the excitation (dissipation) signal remains flat and featureless during the AFM measurements.

Supplementary Material

Accompanies this paper at http://www.scienceadvances.org/.

Section S1. Image representations of output molecular structure

Section S2. Matching experiment to relaxed on-surface simulated configurations

Section S3. Effect of small perturbations on AFM imaging and matching

Section S4. Neural network architecture

Section S5. Probe Particle simulations

Figure S1. Different 2D image representations of a C7H10O2 molecule from the training set

Figure S2. Different 2D image representations of a C60 molecule

Figure S3. Different 2D image representations of a Dibenzo[a,h]thianthrene molecule

Figure S4. Molecules from the validation data set together with the vdW-Spheres representation predicted by the CNN

Figure S5. Matching between simulated relaxed configurations of 1S-Camphor and experiment

Figure S6. Effect of tilt of molecules on simulated AFM images

Figure S7. Adjustment of simulated configuration by -CH3 group rotations

Figure S8. Matching experimental configuration 2 of 1S-Camphor with closest simulated configurations

Figure S9. Illustration of the layers of the CNN model

Figure S10. The mean squared loss for height maps, vdW-Spheres and atomic disks

References (68-81)

Bibliography

1. J. Loos, The Art of SPM: Scanning Probe Microscopy in Materials Science. Adv. Mat. 17, 1821–1833 (2005).
2. G. Binnig, C. Quate, C. Gerber, Atomic Force Microscope. Phys. Rev. Lett. 56, 930–933 (1986).
3. F. Giessibl, Advances in atomic force microscopy. Rev. Mod. Phys. 75, 949–983 (2003).
4. S. Morita, F. J. Giessibl, E. Meyer, R. Wiesendanger, eds., Noncontact Atomic Force Microscopy, Nano Science and Technology (Springer International Publishing, Cham, 2015).
5. N. Pavliček, L. Gross, Generation, manipulation and characterization of molecules by atomic force microscopy. Nat. Rev. Chem. 1, 0005 (2017).
6. D. J. Mueller, Y. F. Dufrene, Atomic force microscopy as a multifunctional molecular toolbox in nanobiotechnology. Nat. Nanotech. 3, 261–269 (2008).
7. Y. F. Dufrene, T. Ando, R. Garcia, D. Alsteens, D. Martínez-Martín, A. Engel, C. Gerber, D. J. Müller, Imaging modes of atomic force microscopy for application in molecular and cell biology. Nat. Nanotech. 12, 295–307 (2017).
8. W. Hofer, A. Foster, A. Shluger, Theories of scanning probe microscopes at the atomic scale. Rev. Mod. Phys. 75, 1287–1331 (2003).