Machine Learning Microscopic Form of Nematic Order in twisted double-bilayer graphene
Jo\~ao Augusto Sobral, Stefan Obernauer, Simon Turkel, Abhay N., Pasupathy, Mathias S. Scheurer

TL;DR
This paper demonstrates how convolutional neural networks can analyze STM data to learn microscopic nematic order parameters in twisted double-bilayer graphene, distinguishing them from heterostrain effects.
Contribution
It introduces a CNN-based approach to extract microscopic nematic order from STM data in moiré superlattices, accounting for correlations and heterostrain.
Findings
CNN can learn microscopic nematic order parameters.
Including energy correlations improves model accuracy.
Neural networks distinguish nematic order from heterostrain.
Abstract
Modern scanning probe techniques, like scanning tunneling microscopy (STM), provide access to a large amount of data encoding the underlying physics of quantum matter. In this work, we analyze how convolutional neural networks (CNN) can be employed to learn effective theoretical models from STM data on correlated moir\'e superlattices. These engineered systems are particularly well suited for this task as their enhanced lattice constant provides unprecedented access to intra-unit-cell physics and their tunability allows for high-dimensional data sets within a single sample. Using electronic nematic order in twisted double-bilayer graphene (TDBG) as an example, we show that including correlations between the local density of states (LDOS) at different energies allows CNNs not only to learn the microscopic nematic order parameter, but also to distinguish it from heterostrain. These…
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
Taxonomy
TopicsElectronic and Structural Properties of Oxides · Semiconductor Quantum Structures and Devices · Quantum and electron transport phenomena
Machine Learning Microscopic Form of Nematic Order
in twisted double-bilayer graphene
João Augusto Sobral
Institute for Theoretical Physics, University of Innsbruck, Innsbruck A-6020, Austria
Stefan Obernauer
Institute for Theoretical Physics, University of Innsbruck, Innsbruck A-6020, Austria
Simon Turkel
Department of Physics, Columbia University, New York, New York 10027, USA
Abhay N. Pasupathy
Department of Physics, Columbia University, New York, New York 10027, USA
Condensed Matter Physics and Materials Science Division, Brookhaven National Laboratory, Upton, New York 11973, USA
Mathias S. Scheurer
Institute for Theoretical Physics, University of Innsbruck, Innsbruck A-6020, Austria
Abstract
Modern scanning probe techniques, like scanning tunneling microscopy (STM), provide access to a large amount of data encoding the underlying physics of quantum matter. In this work, we analyze how convolutional neural networks (CNN) can be employed to learn effective theoretical models from STM data on correlated moiré superlattices. These engineered systems are particularly well suited for this task as their enhanced lattice constant provides unprecedented access to intra-unit-cell physics and their tunability allows for high-dimensional data sets within a single sample. Using electronic nematic order in twisted double-bilayer graphene (TDBG) as an example, we show that including correlations between the local density of states (LDOS) at different energies allows CNNs not only to learn the microscopic nematic order parameter, but also to distinguish it from heterostrain. These results demonstrate that neural networks constitute a powerful methodology for investigating the microscopic details of correlated phenomena in moiré systems and beyond.
I Introduction
Driven by the impressive improvements in machine learning (ML) in the last couple of years, exploring its potential for quantum many-body physics has recently become subject of intense research Carleo et al. (2019); Dawid et al. (2022). For instance, ML provides powerful tools to solve inverse problems that occur frequently in physics Lee et al. (2023); Dubois et al. (2022); Berthusen et al. (2021); Chertkov and Clark (2018): given a model, it is often straightforward with conventional many-body techniques to compute observables that can be measured experimentally, whereas the often needed inverse problem of extracting the model and underlying microscopic physics from observations is much more challenging and typically even formally ill-defined. A second example of a large class of applications of ML in physics is ML-assisted analysis of experiments, in particular of those yielding image-like data like scanning tunneling microscopy (STM) Choudhary et al. (2021); Joucken et al. (2022); Wang et al. (2020); Zhang et al. (2019), photoemission Liu et al. (2022a), and others Khan et al. (2023); Chen et al. (2022); Ede (2021); Iwasawa et al. (2022); Burzawa et al. (2019); Basak et al. (2022).
In the context of applying ML algorithms to data from imaging techniques like STM, van der Waals moiré superlattices Balents et al. (2020); Andrei and MacDonald (2020) are particularly promising for three reasons: (i) they display a huge variety of correlated quantum-many-body phenomena, such as interaction-induced insulating phases Cao et al. (2018a), magnetism Sharpe et al. (2019), superconductivity Cao et al. (2018b), electronic nematic order Kerelsky et al. (2019); Jiang et al. (2019); Choi et al. (2019); Cao et al. (2021), which can also coexist microscopically Cao et al. (2021); Lin et al. (2022). Despite intense research on these phenomena over several decades, e.g., in the pnictides or cuprates, their origin and relations are still subject of ongoing debates. However, compared to these microscopic crystalline quantum materials, moiré superlattices are (ii) highly tunable; for instance, the density of carriers can be varied within a single sample just by applying a gate voltage (as opposed to chemical doping) and even the interactions can be tuned Liu et al. (2021). This allows producing large data sets of measurements on a single sample, containing a lot of information on the microscopic physics. This aspect, which is crucial for data-driven approaches, is further enhanced by (iii) the large moiré unit cells of these systems compared to that of microscopic crystals, increasing the relative spatial resolution of scanning-probe techniques significantly. This enables experiments to probe the structure of the wave functions within the unit cell, and thus provides unprecedented access to the microscopic physics compared to conventional quantum materials. For instance, in the extreme limit of only one degree of freedom (Wannier state or pixel) per unit cell, the broken rotational symmetry of the electron liquid—the defining property of electronic nematic order Fradkin et al. (2010); Fernandes et al. (2014)—is not visible as a consequence of translational symmetry and thus requires a careful analysis of the behavior around impurities Goetz et al. (2020).
In this work, we explore these advantages of moiré superlattices for extracting or ‘learning’ effective field-theoretical descriptions of their correlated many-body physics from STM data. This can be viewed as an inverse problem and is also conceptually related to the goal of ‘Hamiltonian learning’ in quantum simulation Granade et al. (2012); Wiebe et al. (2014); Wang et al. (2017); Valenti et al. (2019); Kokail et al. (2021); Yu et al. (2022), albeit in rather different regimes and based on different measurement schemes. As a concrete example, we use electronic nematic order in twisted double-bilayer graphene (TDBG) Cao et al. (2020); Liu et al. (2020); Shen et al. (2020); Rubio-Verdú et al. (2022); He et al. (2021); Kuiri et al. (2022); Su et al. (2022). This moiré system consists of two AB-stacked bilayers of graphene that are twisted against each other; as one can see in Fig. 1(a), it exhibits the point group , generated by three-fold rotation along the out-of-plane axis and two-fold rotation along the in-plane axis. Evidence of electronic nematic order has been observed in previous STM experiments Rubio-Verdú et al. (2022); Samajdar et al. (2021) which clearly exhibit stripe-like features breaking the symmetry spontaneously for certain electron concentrations. While simple limiting cases have been compared with the data in Ref. Samajdar et al., 2021, there is no systematic analysis of the microscopic form of nematicity in the system. To fill this gap, we consider the more general case in which all leading terms on the graphene and moiré scale describing nematic order in a continuum-model description of TDBG are included. In addition, as it is common in graphene moiré systems Huder et al. (2018); Kerelsky et al. (2019); Jiang et al. (2019); Choi et al. (2019); Rubio-Verdú et al. (2022), we also allow for finite strain. The Hamiltonian defining the changes in TDBG resulting from nematic order and strain depends on a set of parameters , which we reconstruct from STM data using convolutional neural networks (CNN) in a supervised learning procedure. As such, our study differs significantly from recent works, which focused on detecting the presence or absence of nematic order Goetz et al. (2020) or performed a phenomenological data analysis of STM measurements Taranto et al. (2022) with ML, rather than extracting the underlying microscopic physics as we do here.
II Results
II.1 Nematic order in TDBG
The non-interacting band structure of TDBG features two moiré minibands per spin and valley close to charge neutrality, where a variety of correlation-driven phenomena can emerge Cao et al. (2020); Liu et al. (2020); Shen et al. (2020); Rubio-Verdú et al. (2022); He et al. (2021); Kuiri et al. (2022); Su et al. (2022). In Fig. 1(b), these minibands are denoted as valence (VFB) and conduction flat bands (CFB). The band structure shown is obtained from continuum model calculations close to half filling of the CFB (band filling ), where electronic nematic order was observed to be the strongest Rubio-Verdú et al. (2022), see Appendix A for more details. STM experiments probe the band structure and wave functions of a system by providing direct access to the spatial and energy dependence of the LDOS. Most commonly, the LDOS is studied either for a fixed position over a range of different energies, , or for a fixed energy covering a spatial region of the system, . The behavior of and following from the continuum model for TDBG for three different energies and high-symmetry positions in the moiré unit cell is shown in Fig. 1(c). The rotational and translational symmetry of the moiré lattice can be clearly seen in . Meanwhile, is broken, albeit weakly, as a consequence of the electric field required to control the electron filling to be close to the middle of the CFB in an open-faced STM sample geometry Rubio-Verdú et al. (2022).
In graphene moiré systems, there are two fundamentally distinct sources of symmetry breaking—strain and electronic nematic order. Postponing the discussion of the former to below, electronic nematic order Fradkin et al. (2010); Fernandes et al. (2014) refers to the spontaneous rotational symmetry breaking as a result of electronic correlations. While recent works also indicate the possibility of nematic charge-density wave states in TDBG Wilhelm et al. (2022); He et al. (2021), where moiré translational symmetry is simultaneously broken, we here focus on translationally symmetric nematic order since the STM data of Ref. Rubio-Verdú et al., 2022 preserves moiré translations. The underlying nematic order parameter we study is a time-reversal- and moiré-translation-invariant vector , , transforming under the irreducible representation of (or of , taking into account the weak breaking); and stand for the intensity and orientation of the nematic director, respectively. The microscopic form of nematicity can be modeled by a coupling of to a fermionic bilinear and reads in its most general form in a continuum-model description as Samajdar et al. (2021)
[TABLE]
where and are the electronic creation and annihilation operators. This general form encompasses couplings between the two sublattices of the microscopic graphene sheets, the four graphene layers , the valley and spin degrees of freedom in the tensorial form factor ; its two components are required to transform in the same way as under all symmetries of the system. In the following, we will take to be trivial in the spin and diagonal in the valley indices, . This is motivated by the weak spin-orbit coupling in graphene Kane and Mele (2005); Min et al. (2006) and the lack of indications of interaction-induced spin-orbit coupling, which is also strongly constrained Kiselev et al. (2017). Furthermore, the intervalley-coherent nematicity is known to lead to stronger effects on the remote bands Samajdar et al. (2021) that were not observed experimentally Rubio-Verdú et al. (2022).
Since we are working with a continuum theory, the space of possible couplings in Eq. (1) is technically infinite dimensional. As such, a complete reconstruction of from experimental data is impossible given the finite resolution and energy range of the available data. On top of this, it is not required either as we are primarily interested in understanding the low-energy behavior of the system. In the spirit of gradient expansions commonly used in continuum low-energy field theories, we will therefore only keep the leading terms in . There is, however, a subtlety associated with the presence of an additional moiré length scale. We will therefore have to consider two basic classes of nematic orders, referred to as graphene (GN) and moiré (MN) nematicity Rubio-Verdú et al. (2022); Samajdar et al. (2021).
In the case of MN, nematic order is associated with the moiré scale, i.e., we choose in Eq. (1), , with moiré lattice vectors , to represent the non-trivial transformation behavior of under . We can thus take it to be diagonal in the remaining internal indices, yielding
[TABLE]
with multi-index . We further focus on the lowest moiré-lattice harmonic by setting and only keeping the terms with the shortest possible . Intuitively, MN order can be thought of as a distortion of the effective inter-moiré-unit-cell hopping matrix elements, as illustrated schematically in the lower right panel of Fig. 1(d).
Conversely, GN acts as a local order parameter, in Eq. (1), without any explicit reference to the moiré scale,
[TABLE]
Here, the correct transformation properties of result from its structure in the internal indices. Focusing on the local intra-layer contributions and the leading (constant) basis function, the most general form reads as
[TABLE]
where Pauli matrices in sublattice space are represented by ; and are real-valued parameters. As shown schematically in the upper left panel of Fig. 1(d), one can think of GN as the nematic distortion of the bonds of the individual graphene layers in a way that preserves the graphene translational symmetry.
We emphasize that GN and MN should not be viewed as distinct phases; they break the same symmetries and as such in general mix. We thus take to describe nematicity in TDBG in the following, which depends on the set of parameters . The computation of the LDOS for a specific set of parameters can be done straightforwardly from the continuum model. The resulting spatial dependence of the LDOS, , is also shown in Fig. 1(d) for two different values of . As opposed to the plots without nematic order, is now broken, leading to stripes in the VFB, while translational symmetry is still preserved. The inverse problem—inferring the value of the parameters from a given LDOS pattern—is a much more challenging task. Our goal in the following sections will be to use ML, in particular, CNNs to learn the set directly from LDOS images.
II.2 ML architecture
Using CNNs to solve this inverse problem can be interpreted as a supervised learning task Dawid et al. (2022), i.e., a regression-like procedure using synthetic LDOS data labeled by their respective value of nematicity parameters . More specifically, our CNNs take as inputs pixels of LDOS images and apply consecutive transformations (represented by a set of weights between each layer) in order to extract meaningful correlations that represent the set . One example of the CNN image inputs is shown in Fig. 2(a). The complete dataset consists of 12000 images which are divided into training (), validation and test subgroups. Each image is generated for a randomly sampled set of nematic parameters and the intensities in the LDOS are modified with the addition of Gaussian noise (see Appendix A). The motivation for noise is twofold: to avoid overfitting Goodfellow et al. (2015) and to test the stability against and performance of the procedure with noise, which is inevitably present in experimental data.
The ML architecture chosen for this task is represented in Fig. 2(a) and its implementation was done with the TensorFlow library Abadi et al. (2015). Each convolutional layer is followed by batch normalization and max pooling layers (Conv-Batch-MaxPool). The batch normalization layers normalize the input weights in each stage, and also reduce the number of epochs necessary for convergence Ioffe and Szegedy (2015). This process is repeated four times, with the convolutional layers having a kernel size of and strides set to . The filters follow a sequence of with rectified linear unit (ReLU) activation functions Fukushima (1975). Padding is set to zero such that the reduction of dimensionality is performed only by the MaxPool layers. In turn, these have both strides and pool sizes set to . After a Flatten stage, dense layers lead to a dropout before the final layer with filters equal to the number of parameters in . The Flatten layer transforms the data to a one-dimensional shape, and the Dropout reduces overfitting by setting a percentage of 20% adjusted weights to zero Srivastava et al. (2014). Tests on variations of this architecture are briefly described in Appendix B.
The learning procedure is then defined by the minimization of the loss function with respect to the CNN’s weights in a backward propagation procedure Rumelhart et al. (1986). The loss function can be represented as the mean squared error (MSE), which is defined as the difference between the true and expected set of parameters in , with representing the number of samples in the test dataset. Finally, we consider the adaptive moment estimation (ADAM) for the minimization of the loss function, with a learning rate of 0.001 and batch size equal to 64 Kingma and Ba (2017). After the completion of the training stage, the algorithm is ready to be deployed to previously unseen data, returning as outputs the parameters .
II.3 Orientation of nematic director
As a first investigation, we consider the task of predicting the orientation of the nematic director from images at a single energy in the VFB [ meV, see Fig. 1(b)]. For this, we consider a dataset with randomly generated MN and GN intensities eV, and . Furthermore, and for all layers. The relation between the shape of the LDOS at a single energy and is highly non-trivial for two reasons: even for a given form of nematicity, changing generically not just merely rotates the LDOS pattern, due to the lattice, but leads to complex distortions of its structure. Additionally, by sampling , even if the same bond direction is favored over the -related ones in the LDOS pattern of two samples, the underlying can be rather different. As can be seen in the three sample LDOS plots in Fig. 2(b) with different values of , the correspondence between and is complex and not apparent to the human eye.
Using the angles as labels to the data is the most straightforward choice, but leads to inaccurate predictions around [math] and due to the periodicity in the definition of the nematic order parameter, . To circumvent this feature, we use the two-component label instead of in the training process and then fold the network’s prediction back to with the function Fischer et al. (2015). The results, shown in Fig. 2(b), are consistent with the true labels, including at the boundaries of ’s domain. This shows that even when the precise nature of nematicity (predominantly MN or GN or an admixture of the two) is not known, the director orientation can be accurately predicted with our CNN setup from at a single energy. We have checked that the few outliers in Fig. 2(b) are directly related to small nematic intensities, where has virtually no impact on the LDOS and is, thus, impossible to predict.
II.4 Form of nematicity
After successfully learning the director orientation in the presence of different nematicities, we proceed into investigating finer details of these couplings by learning the parameters defined in Eqs. (2-4). To this end, we consider and for all layers. For concreteness, we set , which is one of the possible discrete orientations ( and symmetry related) of the nematic director in presence of . The dataset now consists of randomly generated MN and GN intensities eV, and . The intensity values are chosen such that the stripes in the VFB resemble the experimental results Rubio-Verdú et al. (2022). As with , instead of learning the angular variable directly, the mapping from Section II.3 is applied.
Using only the LDOS at a single energy (i.e. one channel) in the ML architecture for this task does not produce accurate predictions. Additionally, both hyperparameter optimization and architecture modifications did not lead to any significant improvement, implying that nematic order impacts the electronic structure in complex ways that cascade across energy scales. In fact, this is also intuitively clear since, for example, the samples marked by a star and pentagon in Fig. 3(a) have fundamentally different nematic couplings and yet exhibit visually similar images at the VFB energy.
In experiments, one can typically obtain single point spectra [] and real space LDOS images at fixed energies []. We can therefore include additional input channels corresponding to and for different energies and points , respectively. In the second case, the individual point spectra are transformed to scaleogram images for consistency with the input data for CNNs Berthusen et al. (2021); Mallat (1999), see upper left inset in Fig. 3(a) and Appendix A. The new architecture is then formed by four channels with inputs at fixed energies meV within the flat and remote bands, such that they resemble visually the corresponding ones in the experimental data of Ref. Rubio-Verdú et al., 2022, and three channels for scaleogram inputs at stacking positions , cf. Fig. 1(c). Each channel is passed through parallel Conv-Batch-MaxPool layers as in Fig. 2(a), but instead of flattening each channel separately, they are concatenated to a Dense-Dropout stage before the last layer [Fig. 3(a)].
In Fig. 3(b-d), predictions on the test data set are represented for (b) , and (c) the moiré and (d) graphene nematic intensities; as can be seen, very good agreement is found between the reconstructed and true parameters. The outliers in are related to small (brighter colors). From Eqs. (3) and (4), it is clear that for small , minimal changes will be induced in the LDOS, irrespective of the true value of the phase governed by . This is a similar behavior to what was observed for outliers in the nematic director prediction. If is maintained fixed, we observed (not shown) that predictions for and get more accurate. The results of Fig. 3 demonstrate that the microscopic form of nematicity can be extracted from the LDOS if significant energy dependence is included in the input data set.
II.5 Including strain
As already alluded to above, another possible source of breaking is strain Nguyen and Dollfus (2015); Yan et al. (2013); Huder et al. (2018); Bi et al. (2019), which is believed to be a ubiquitous property of graphene moiré superlattices at small twist angles. Breaking the same symmetries as nematic order, strain can obscure the experimental identification of nematic order and their precise interplay is still under debate Kerelsky et al. (2019); Jiang et al. (2019); Choi et al. (2019); Scheurer (2019). Experiments indicate Huder et al. (2018); Kerelsky et al. (2019); Jiang et al. (2019); Choi et al. (2019); Rubio-Verdú et al. (2022) that the most relevant form of strain in graphene superlattices such as twisted bilayer graphene or TDBG is uniaxial heterostrain. In this case, the matrices describing the in-plane metric deformation of the coordinates in the th rotated bernal bilayer of TDBG are of the form
[TABLE]
Here is the Poisson ratio for graphene and is the matrix describing rotations of 2D vectors by angle . We see that uniaxial heterostrain is characterized by two variables, the strain intensity and the direction of strain, parameterized by the angle .
In the following, we allow for the simultaneous presence of uniaxial heterostrain and nematic order, leading to two additional parameters, and , in . We will study whether our ML approach is still able to extract the microscopic form of nematicity and also learn the relative strength and direction of strain. Note that the form of nematicity is still given by Eqs. (2-4), with the only difference that we replace in the definition of by the strained moiré lattice vectors. The data set for this task is built with nematic intensities eV, with the addition of strain parameters and . Here, , and . The domain for the strain intensities is chosen based on typical values observed in TBG Kerelsky et al. (2019), and for on the periodicity of the unstrained system as Bi et al. (2019). The ML architecture employed in this section is the same as in the previous investigation [Fig. 3(a)].
In Fig. 4(a-d), predictions on the test data set are shown for (a), (b), and the nematic intensities (c-d). At first sight, the result for the strain angle in Fig. 4(b) looks as if the procedure ceased to work, since there are many data points where the true and predicted value of differ significantly. However, when indicating the true strain intensity label for each prediction, it becomes clear that the outliers are related to small values of (brighter colors). As such, this behavior is not a shortcoming of the learning procedure but actually a feature of strain: for small enough in Eq. (5), the angle has no meaning. We have checked that removing the samples with small strain from the training and test data set will lead to accurate predictions of . The stability that we find for our learning procedure in the presence of virtually vanishing is, however, important when applying it to experimental data, where the strength of strain is unknown.
Most importantly, we see in Fig. 4(c-d) that the nematic couplings can still be accurately predicted when varying strain is present. The MAE is equally distributed in these cases, in contrast to the strain intensity prediction. This shows that not only nematic order can be identified when strain is present, but also its internal structure and the strength of strain that is present at the same time can be resolved when using different channels consisting of both and as inputs. This allows the networks to take into account correlations between different energies in the STM data, which in turn conveys the crucial microscopic physics, enabling the model to disambiguate between lattice and electronic effects.
II.6 Experimental data
After demonstrating the effectiveness of CNNs on learning microscopic parameters from a synthetic (theoretical) data set with samples, we now proceed into applying the trained ML architecture for predictions of the a priori unknown sets of parameters in an experimental data set . For concreteness, we use the same synthetic training data set as in Appendix B, where only the nematic and strain intensities are predicted, i.e., . The data set is constituted of both scaleograms and maps for different fillings of the CFB (). More details about the preprocessing of the experimental data can be found in Appendix C.
In Fig. 5, predictions of the trained CNN for the set show non-zero values of nematicity (a) and strain (b) for all fillings of the CFB. For (gray region), the experimental data shows the most pronounced signatures of broken rotational symmetry to the human eye, which was previously interpreted as electronic nematic order Samajdar et al. (2021); Rubio-Verdú et al. (2022). Here the CNN predicts MN to dominate over GN, although both are finite (as expected by symmetry). As can be seen in Fig. 5(c), the parameters predicted by the CNN nicely reproduce the key features in the experimental data, including the strong stripes in the VFB and the much weaker, albeit finite, signatures of nematicity in the other bands.
For smaller fillings, , the experimental data still exhibit distortions that break , see Appendix C, but no clear stripe-like features appear. The CNN tries to assign different anisotropy sources to these distorted regions, but the agreement between theoretical prediction and experiment is less accurate than for larger . It is clearly possible that, indeed, a crossover from primarily MN to GN occurs when lowering , as predicted by the neural network, see Fig. 5(a), in particular, since nematic order is also a plausible instability in non-twisted bilayer graphene Liu et al. (2021); Cvetkovic et al. (2012). However, we believe that additional experimental data and refined theoretical models are required to conclude whether this is really the case.
In contrast to this interplay between the nematic couplings, strain remains relatively constant for all , and slightly decreases in Fig. 5(b) for as it approaches the same order of magnitude of that is expected for the experimental samples in Rubio-Verdú et al. (2022). We note that at low fillings the value of strain that is predicted by the neural network is nevertheless significantly greater than the value extracted from experimental topography. This is likely a consequence of subtle differences between the continuum model calculations and the experimental spectroscopy, which the network attempts to accommodate by including finite strain.
III Discussion
We constructed and demonstrated a ML procedure that can extract the form of the nematic order parameter in TDBG from LDOS data. The key ingredient was the use of several channels that capture the correlations among different energies. Our work has several important implications. First, it shows that the presence and even the strength and internal structure of nematic order can be extracted when the sample exhibits significant heterostrain; this is a crucial aspect for moiré systems where the issue of distinguishing between nematicity and strain has been subject of debate. Second, our analysis also shows which type of STM data is needed and most useful to extract information about nematicity: as we have seen, the LDOS maps at a single energy, , are not enough to deduce the form of the nematic order parameter and—contrary to what one might have expected—point spectra, i.e., , contain a lot of helpful complementary information for that task (see also the second model discussed in Appendix D). We emphasize that this form of ‘solid-state Hamiltonian learning’, i.e., of parameterizing the leading terms of a set of microscopic order parameters (like nematic order) or perturbations (such as strain) and extracting their form using multi-channel CNNs can be more broadly applied—to other systems, see Appendix D where we discuss a toy model for twisted-bilayer graphene, and other forms of instabilities. As such, this could open up completely new ways of revealing the form and role of nematic order and other phases for the physics of quantum materials.
Acknowledgements.
J.A.S. and M.S.S. acknowledge funding by the European Union (ERC-2021-STG, Project 101040651—SuperCorr). Views and opinions expressed are however those of the authors only and do not necessarily reflect those of the European Union or the European Research Council Executive Agency. Neither the European Union nor the granting authority can be held responsible for them. Salary support is also provided by the National Science Foundation via grant DMR-2004691 (S.T.) and by the Office of Basic Energy Sciences, Materials Sciences and Engineering Division, U.S. Department of Energy under Contract No. DE-SC0012704 (A.N.P.). J.A.S. is grateful for discussions with J. P. Valeriano, Sayan Banerjee, Patrick Wilhelm, Igor Reis and Pedro H. P. Cintra. M.S.S. also thanks R. Samajdar, R. Fernandes, and J. Venderbos on a previous collaboration on nematic order in TDBG Samajdar et al. (2021).
Appendix A Continuum model and LDOS maps
Twisted double-bilayer graphene (TDBG) consists of two bilayer graphene stacks with primitive lattice vectors and , where nm is the lattice constant of graphene, and corresponding reciprocal lattice vectors , following from . After applying a twist angle between the Bernal stacks, these vectors are modified by the two-dimensional rotation matrix as with for each stack . The corresponding Dirac cones in each valley of the individual graphene layers are located at . The emerging moiré pattern [with triangular Bravais lattice shown as green domains in Fig. 1(a)] is represented by the difference of the new lattice vectors from each bilayer stack in reciprocal space as , with corresponding primitive lattice vectors obtained from the relation .
We consider a description of the low-energy physics for TDBG via the continuum Hamiltonian of Ref. Koshino, 2019. In the Bloch basis given by carbon’s pz orbitals , with sublattices and layers , the continuum Hamiltonian in valley for small twist angles () in AB-AB double bilayer graphene can be written as
[TABLE]
with Bloch wavevectors generated by and single-layer graphene Hamiltonians with as
[TABLE]
The explicit matrix structure in Eq. (7) refers to sublattice space (associated with Pauli matrices in the main text). In turn, and are coupled via
[TABLE]
with parameters and . For more details about the physical significance of each term, see Ref. Koshino (2019). Considering a self-consistently calculated screened electric field, the interlayer potential matrix reads as
[TABLE]
where is the unit matrix in sublattice space. For a filling fraction of , on-site potentials representing the electrostatic energy between adjacent layers are given by meV Samajdar et al. (2021), which we use in our numerical calculations.
Finally, the moiré interlayer coupling defined between the twisted layers is given by
[TABLE]
with , eV and eV. These parameters were chosen in accordance with Ref. Samajdar et al. (2021); Koshino (2019).
From the continuum Hamiltonian in Eq. (6), a numerical diagonalization in momentum space is performed by selecting a finite number of wavevectors in a cutoff circle , with radius around the midpoint between Dirac cones . Here, the Bloch vector in the moiré Brillouin zone is hybridized with the graphene eigenstates at due to the coupling between Bernal bilayers via Eq. (10), with . Since we do not consider the intervalley graphene nematicity Samajdar et al. (2021), the calculations are performed with a fixed valley index, e.g., . The corresponding band structure for can be obtained by a time-reversal symmetry transformation. For a certain band , the wave functions, truncated up to a wavevector in the reciprocal lattice, are represented as
[TABLE]
with each term
[TABLE]
containing elements in layer and sublattice spaces. From these wave functions, the LDOS mappings and are computed from
[TABLE]
where is the corresponding eigenvalue to the wave function . This is already projected onto the topmost graphene layer , where tunneling of electrons from the STM tip are expected to occur in the experimental setup.
To transform the one-dimensional map of LDOS as a funtion of energies into image inputs for the CNN, we use continuous wavelet transforms (CWT) Berthusen et al. (2021); Mallat (1999). These are defined as
[TABLE]
representing the mother wavelet function in a real Morlet form. Here, the transformation is linearly spaced, with being equivalent to the spacing in energy taken in the maps . This scale factor is analogous to frequency in Fourier transforms. Besides , there is a time scale which is also taken as , such that produces pixel images. An example of such a “scaleogram” is shown in Fig. 6.
Finally, the LDOS pixel intensities in both maps and are modified by the addition of Gaussian random noise via with Liu et al. (2022b). For images, the noise must be added before the CWT for physical consistency.
Appendix B Including strain with fixed , and variations of the ML architecture
In this appendix, we discuss the changes in the performance of the ML procedure when the training data set or the ML architecture are modified. First, we tested in all cases (Sections II.3-II.5) the performance for the predictions of intensities for fixed angles (, and , respectively). For concreteness, we here focus on predicting the microscopic nematic form in the presence of strain (Sec. II.5). The data set for this task is, as before, built by randomly sampling nematic and strain intensities eV, and . Here, , , and . In this case, all intensities are easily distinguishable and with high accuracy, see Fig. 7. An identical behavior was observed for investigations with fixed and for the predictions of the microscopic form of nematicity (Sec. II.4), showing that this is a general feature of the considered CNN architecture. Naturally, in the absence of outliers, the precision of the CNN can be further increased (i.e., by reaching a lower MAE for the predictions) with increasing size and variability of the data set Chollet (2021).
We next address whether the complex architecture in Fig. 2(a) for each channel is really necessary to solve this inverse problem, or in other words, whether simpler architectures could have the same performance and whether modifications of it could produce significant changes in the predictions. For this, we compared the results from the main text with two other architectures in the case of learning the microscopic form of nematicity (Sec. II.4): (i) a very simple sequential neural network that takes the images as inputs, followed by a flatten and dense layer which predicts the parameters ; (ii) the architecture in Ref. Berthusen et al., 2021, which has a similar structure, i.e., Conv-Batch-MaxPool channel followed by dense layers, but with different number of filters in each layer; we refer to Ref. Berthusen et al., 2021 for details of the architecture. We have found that, even if there is some clear correlations between the true and the predicted values of in simple architectures, such as (i), it fails completely on predicting the nematic intensities. Additionally, (ii) does not lead to any significant improvement in the predictions.
We also investigated the performance of the CNN with respect to hyperparameter optimization for both Sections II.4 and II.5. These included using different activation functions (SELU, ELU, LeakyReLU, PReLU, ReLU and Sigmoid) Abadi et al. (2015), batch sizes (9, 16, 32, 48 and 96), different number of filters and convolution layers in the Conv-Batch-MaxPool channels, different learning rates (, , and ) and optimizers (RMSprop, SGD and ADAM). The architecture described in Sec. II.2 and its variation in Fig. 3(a) already correspond to the optimal configuration. Even though these investigations are not an exhaustive treatment with respect to all possible parameters and correspondent combinations, it shows that certain elements play a major role in the CNN’s performance, such as choosing ReLU as activation functions, setting padding to zero in the convolution layers and using a learning rate of . Finally, even though the four Conv-Batch-MaxPool channels in the main architecture may not be necessary for simpler cases (e.g. predicting the nematic director with fixed nematic intensities), it is essential for increasing parameters and complexity, such as learning strain and the internal structure of nematicity simultaneously.
Appendix C Preprocessing of the experimental data and further implications
The experimental data set consists of samples for fillings of the CFB equal to . Each sample has images in an energy interval of meV with resolution of meV. From these, the channels can be calculated by taking an average of intensities at the corresponding BAAC, ABCA and ABAB sites, see Fig. 8(a).
In order to obtain consistent results, the experimental data set needs to be fed into the trained CNN as similar as possible to the training data in . For this, the preprocessing of consists of:
- (1)
Transforming the experimental plots into scaleograms as described in Fig. 6. Here, these plots are considered for . This is necessary in order to have scaleograms of pixels. In this energy range, these channels contain information about CFB, VFB, RV1 and RC1.
- (2)
Normalizing each image to have the distribution of pixel intensities with same mean and standard deviation in both and . Here, we have chosen and Chollet (2021). This step is essential to produce meaningful predictions on , since the trained CNN have weights associated to the scale of .
- (3)
Cropping the images from such that they show roughly the same number of moiré unit cells as in the corresponding ones in . Additionally, the orientations of each of these images in both data setss also need to be consistent pair-wisely, see Fig. 8(b-d) and Fig. 5(c).
We have also investigated the influence of additional preprocessing of the channels of the data set on the predictions. We introduce more contrast to the images [Fig. 8(c)] and reduce noise by smoothing the pixel distribution with a multidimensional Gaussian filter [Fig. 8(d)]. In Fig. 8(e-f), the resulting predictions for the nematicities and strain using these augmented are shown. Here, every dot represents an average over the predictions on 10 variations of with Gaussian filter with standard deviations of the Gaussian kernel in with and without higher contrast. The overall behavior described in Sec. II.6 is unaffected by these modifications, but the predictions with the lowest strain intensity in the gray region of Fig. 5(a-b) were found for the raw data inputs [Fig. 8(b)] - see Fig. 9.
We emphasize the importance of the multi-channel CNN architecture in Fig. 3(a) for the predictions in Fig. 5(a-b). While using only at the flat bands already seems to capture the interplay between MN and GN as a function of fillings of the CFB, the addition of channels for the remote bands and scaleograms is crucial to discern the influence of strain and nematicity in the experimental samples. This can be intuitively understood by the relative stability of the remote bands with respect to heterostrain over a wide range of fillings in Rubio-Verdú et al. (2022). These results indicate that the inclusion of more channels from even more remote bands could potentially produce more accurate predictions for the strain intensity; we leave this for future work.
Appendix D Applicability of the CNN to different models and moiré systems
To demonstrate that our ML approach of extracting microscopic parameters based on CNNs with multiple channels works more generally, we here apply it to a different moiré system. To further increase the variability of models studied in this work, we do not use a continuum model but, instead, consider a tight-binding model on the moiré scale that captures the symmetries and topological features of the twisted bilayer graphene (TBG).
As is well known Kang and Vafek (2018); Koshino et al. (2018); Po et al. (2018), the representations of the flat bands of TBG at high-symmetry momenta requires taking a model on the honeycomb lattice. To be able to study the valleys separately, we take the valley quantum number to be conserved such that the associated (fragile) topological obstructions necessitates taking at least four bands Po et al. (2019). We, therefore, place two Wannier orbitals at every site of the honeycomb lattice. We choose them to be invariant under (three-fold rotation perpendicular to the graphene layers) and transform into one-another under (two-fold rotation along ) and (the product of time-reversal and two-fold rotation perpendicular to the layers); this specifies the behavior of the Wannier orbitals under all symmetries of TBG that act within a given valley. Here, our goal is not to provide a quantitatively accurate description of the LDOS of TBG but rather to demonstrate our ML procedure. It is therefore sufficient to take the simple, phenomenological forms of the Wannier states given by
[TABLE]
which obey the required symmetry constraints, as shown in Fig. 10(a). In Eq. (15), and are real-valued constants that we set to for concreteness.
Including symmetry-allowed intra-orbital (inter-orbital) hopping processes up to third-nearest (nearest) neighbor leads to a tight-binding model with momentum-space form (for valley and a given spin flavor)
[TABLE]
with orbital Hamiltonians
[TABLE]
coupled via
[TABLE]
with
[TABLE]
and
[TABLE]
where are four-component electronic creation operators with the first two (second two) indices referring to the two sublattices of Wannier orbitals (). Here, the two primitive lattice vectors from monolayer graphene are given by and . In Fig. 10(b,c), we show the band structure for , which are also used in the ML calculations presented below. We will take the lower two, isolated bands [indicated in black in Fig. 10(b)] as a phenomenological description of the quasi-flat bands of TBG; they exhibit Dirac cones at K and K’, which can be shown to have the same chirality—exactly as in TBG.
We next add different forms of nematicity to , i.e., , which have very different structure compared to those discussed in the main text since the model is very different. The nematic order parameter will couple as . Here, the matrix-valued functions play a similar role as the tensorial form factor in the continuum model nematic coupling in Eq. (1). Denoting by and Brillouin-zone-periodic, real-valued functions that transform as and under TBG’s point group , we can write
[TABLE]
where are parameters and () are Pauli matrices in Wannier (sublattice) space. Technically, the explicit form of and in each of the four terms in Eq. (21) can be different. However, the functional space of possible and is technically infinite dimensional and we will focus only on the leading contribution which then also becomes identical for all four terms in Eq. (21) and reads as
[TABLE]
Consequently, there are four parameters, , describing the microscopic form of nematicity in our model. Our goal will be to reconstruct their values from LDOS images, which we compute via
[TABLE]
with
[TABLE]
The indices and of in Eq. (23) correspond to four different realizations of the Wannier functions in each unit cell , as each of the two orbitals can be placed on each of the two sublattices.
To reconstruct , we consider a variation of the ML architecture in Fig. 3(a) with four channels for with , and one for the scaleograms from . The complete data set consists of 12000 images which are divided into training (78.5%), validation (15%) and test (6.5%) subgroups. These are generated with randomly sampled for and for a fixed nematic director . All the images in the data set were modified by the addition of Gaussian noise with a standard deviation of . In Fig. 11(a-d), one can see that all four parameters can be accurately predicted. Additionally, we have also observed that even training the CNN with only a single channel in this case is sufficient to also yield very good predictions, evidencing the fundamental role of point spectra as an additional source of information. These results indicate that the framework proposed in this work could be successfully applied to a plethora of correlated phenomena and different moiré systems.
The reference list from the paper itself. Each links out to its DOI / PubMed record.
- 1Carleo et al. (2019) G. Carleo, I. Cirac, K. Cranmer, L. Daudet, M. Schuld, N. Tishby, L. Vogt-Maranto, and L. Zdeborová, “Machine learning and the physical sciences,” Reviews of Modern Physics 91 , 045002 (2019) . · doi ↗
- 2Dawid et al. (2022) A. Dawid, J. Arnold, B. Requena, A. Gresch, M. Płodzień, K. Donatella, K. A. Nicoli, P. Stornati, R. Koch, M. Büttner, R. Okuła, G. Muñoz-Gil, R. A. Vargas-Hernández, A. Cervera-Lierta, J. Carrasquilla, V. Dunjko, M. Gabrié, P. Huembeli, E. van Nieuwenburg, F. Vicentini, L. Wang, S. J. Wetzel, G. Carleo, E. Greplová, R. Krems, F. Marquardt, M. Tomza, M. Lewenstein, and A. Dauphin, “Modern applications of machine learning in quantum sciences,” (2022), ar Xiv:2204.04198 · doi ↗
- 3Lee et al. (2023) J. Lee, M. R. Carbone, and W. Yin, “Machine-learning the spectral function of a hole in a quantum antiferromagnet,” (2023), ar Xiv:2301.07906 [cond-mat] . · doi ↗
- 4Dubois et al. (2022) A. Dubois, D. Broadway, A. Stark, M. Tschudin, A. Healey, S. Huber, J.-P. Tetienne, E. Greplova, and P. Maletinsky, “Untrained Physically Informed Neural Network for Image Reconstruction of Magnetic Field Sources,” Physical Review Applied 18 , 064076 (2022) . · doi ↗
- 5Berthusen et al. (2021) N. F. Berthusen, Y. Sizyuk, M. Scheurer, and P. Orth, “Learning crystal field parameters using convolutional neural networks,” Sci Post Physics 11 , 011 (2021) . · doi ↗
- 6Chertkov and Clark (2018) E. Chertkov and B. K. Clark, “Computational Inverse Method for Constructing Spaces of Quantum Models from Wave Functions,” Physical Review X 8 , 031029 (2018) . · doi ↗
- 7Choudhary et al. (2021) K. Choudhary, K. F. Garrity, C. Camp, S. V. Kalinin, R. Vasudevan, M. Ziatdinov, and F. Tavazza, “Computational scanning tunneling microscope image database,” Scientific Data 8 , 57 (2021) . · doi ↗
- 8Joucken et al. (2022) F. Joucken, J. L. Davenport, Z. Ge, E. A. Quezada-Lopez, T. Taniguchi, K. Watanabe, J. Velasco, J. Lagoute, and R. A. Kaindl, “Denoising scanning tunneling microscopy images of graphene with supervised machine learning,” Physical Review Materials 6 , 123802 (2022) . · doi ↗
