Detecting composite orders in layered models via machine learning

W. Rzadkowski; N. Defenu; S. Chiacchiera; A. Trombettoni; G. Bighin

arXiv:1907.05417·cond-mat.dis-nn·September 29, 2020


TL;DR

This paper demonstrates how machine learning, specifically convolutional neural networks, can effectively identify phases and composite order parameters in layered spin models, including hidden orders, without prior phase knowledge.

Contribution

The authors introduce a machine learning approach that accurately characterizes phases and detects composite orders in layered models directly from Monte Carlo data, without preprocessing.

Findings

01

Successfully identified all phases in the Ashkin-Teller model, including hidden composite orders.

02

Correctly distinguished ferromagnetic and paramagnetic phases in bilayer and trilayer Ising models.

03

Method is versatile and applicable to various layered systems without prior phase information.

Abstract

Determining the phase diagram of systems consisting of smaller subsystems 'connected' via a tunable coupling is a challenging task relevant for a variety of physical settings. A general question is whether new phases, not present in the uncoupled limit, may arise. We use machine learning to study layered spin models, in which the spin variables constituting each of the uncoupled systems (to which we refer as layers) are coupled to each other via an interlayer coupling. In such systems, in general, composite order parameters involving spins of different layers may emerge as a consequence of the interlayer coupling. We focus on the layered Ising and Ashkin-Teller models as a paradigmatic case study, determining their phase diagram via the application of a machine learning algorithm to the Monte Carlo data. Remarkably our technique is able to correctly characterize all the system phases…

Equations

$$d\big((J_{1},K_{1}),(J_{2},K_{2})\big)=2\,(\varphi-0.5)\,\Theta(\varphi-0.5), \tag{1}$$

$$\nabla u(J,K)=\begin{pmatrix}\dfrac{u(J+\Delta J,K)-u(J,K)}{\Delta J}\\[6pt]\dfrac{u(J,K+\Delta K)-u(J,K)}{\Delta K}\end{pmatrix}\equiv\begin{pmatrix}\dfrac{d\big((J+\Delta J,K),(J,K)\big)}{\Delta J}\\[6pt]\dfrac{d\big((J,K+\Delta K),(J,K)\big)}{\Delta K}\end{pmatrix}. \tag{2}$$

$$\nabla^{2}u(J,K)\approx\frac{1}{(\Delta J)^{2}}\sum_{i=0}^{n}(-1)^{i}\binom{n}{i}\,u\big(J+(n/2-i)\Delta J,K\big)+\frac{1}{(\Delta K)^{2}}\sum_{i=0}^{n}(-1)^{i}\binom{n}{i}\,u\big(J,K+(n/2-i)\Delta K\big), \tag{3}$$

$$M=\frac{J_{\max}-J_{\min}}{\Delta J}\cdot\frac{K_{\max}-K_{\min}}{\Delta K} \tag{4}$$

$$H_{\mathrm{bilayer}}=-J\sum_{\langle ij\rangle}\sigma_{i}\sigma_{j}-J\sum_{\langle ij\rangle}\tau_{i}\tau_{j}-K\sum_{i}\sigma_{i}\tau_{i}, \tag{5}$$

$$H_{\mathrm{trilayer}}=-J\sum_{\langle ij\rangle}\sigma_{i}\sigma_{j}-J\sum_{\langle ij\rangle}\tau_{i}\tau_{j}-J\sum_{\langle ij\rangle}\upsilon_{i}\upsilon_{j}-K\sum_{i}\sigma_{i}\tau_{i}-K\sum_{i}\tau_{i}\upsilon_{i}, \tag{6}$$

$$H_{\mathrm{AT}}=-J\sum_{\langle ij\rangle}\sigma_{i}\sigma_{j}-J\sum_{\langle ij\rangle}\tau_{i}\tau_{j}-K\sum_{\langle ij\rangle}\sigma_{i}\sigma_{j}\tau_{i}\tau_{j} \tag{7}$$

$$\mathrm{SNR}\equiv\log_{10}\left(\frac{\frac{1}{N}\sum_{i}(x_{i}-\nu)^{2}}{\nu^{2}}\right), \tag{8}$$


Full text

Detecting composite orders in layered models via machine learning

W. Rządkowski1, N. Defenu2, S. Chiacchiera3, A. Trombettoni4,5 and G. Bighin1

1Institute of Science and Technology Austria (IST Austria), Am Campus 1, 3400 Klosterneuburg, Austria

2Institut für Theoretische Physik, Universität Heidelberg, D-69120 Heidelberg, Germany

3Science and Technology Facilities Council (STFC/UKRI), Daresbury Laboratory, Keckwick Lane, Daresbury, Warrington WA44AD, United Kingdom

4CNR-IOM DEMOCRITOS Simulation Center, Via Bonomea 265, I-34136 Trieste, Italy

5SISSA and INFN, Sezione di Trieste, Via Bonomea 265, I-34136 Trieste, Italy

(March 17, 2024)

Abstract

Determining the phase diagram of systems consisting of smaller subsystems 'connected' via a tunable coupling is a challenging task relevant for a variety of physical settings. A general question is whether new phases, not present in the uncoupled limit, may arise. We use machine learning to study layered spin models, in which the spin variables constituting each of the uncoupled systems (to which we refer as layers) are coupled to each other via an interlayer coupling. In such systems, in general, composite order parameters involving spins of different layers may emerge as a consequence of the interlayer coupling. We focus on the layered Ising and Ashkin-Teller models as a paradigmatic case study, determining their phase diagram via the application of a machine learning algorithm to the Monte Carlo data. Remarkably, our technique is able to correctly characterize all the system phases, also in the case of hidden order parameters, i.e., order parameters whose expression in terms of the microscopic configurations would require additional preprocessing of the data fed to the algorithm. We correctly retrieve the three known phases of the Ashkin-Teller model with ferromagnetic couplings, including the phase described by a composite order parameter. For the bilayer and trilayer Ising models the only phases we find are the ferromagnetic and the paramagnetic ones. Within the approach we introduce, owing to the construction of convolutional neural networks, naturally suitable for layered image-like data with an arbitrary number of layers, no preprocessing of the Monte Carlo data is needed, also with regard to its spatial structure. The physical meaning of our results is discussed and compared with analytical data, where available. Yet, the method can be used without any a priori knowledge of the phases one seeks to find and can be applied to other models and structures.

1 Introduction

Classification of observations into separate categories is certainly one of the most important applications of machine learning [1]. Successful examples range broadly from the detection of exotic particles in experimental high energy physics [2] through learning human actions in movies [3] to dermatologist-grade skin cancer classification [4]. The classification task is very often performed with artificial neural networks, capable of learning even highly complex and elusive patterns in the data, both detectable and invisible to humans. If multilayer image data is in question, convolutional neural networks (CNNs) perform exceptionally well, mimicking human vision by inferring from small portions of the image at a time. Successful applications of CNNs outside the field of physics are extremely numerous, ranging from the detection of human faces at multiple angles or in partially visible images [5] to the ImageNet large-scale classification challenge [6], just to mention a few.

Translational invariance and adjustable size of the filters, which detect local correlations, make CNNs the ideal candidates for phase diagram reconstruction. The phase diagram is typically reconstructed from a large number of Monte Carlo (MC) snapshots. At first, research efforts revolved around supervised learning on the MC snapshots [7, 8, 9, 10, 11, 12, 13, 14, 15], later shifting to fully unsupervised learning on a chosen observable, such as non-local correlators whose behaviour is modified by the presence of phase transitions [16].

Our goal is to use machine learning techniques to characterize the phase diagram of coupled spin models. We refer to them as layered spin models since it is natural to think of the coupling between two-dimensional spin models considering them separately, say in different layers and involving different spin variables, subsequently turning on the interlayer coupling. The simplest structure of this kind is a bilayer. It is clear that, depending on the form and the strength of the coupling, the phases of the uncoupled models can be altered and new phases, impossible without the interlayer coupling, may emerge. Therefore, it becomes important to devise a general approach that can detect the presence of phases induced by the interlayer coupling.

In this paper we introduce a CNN-based approach capable of fully unsupervised learning of phase diagrams, with the network fed exclusively with raw MC snapshots, without any a priori knowledge about relevant observables or order parameters. We note that complementary approaches to the unsupervised learning problem have been pursued using principal component analysis and support vector machines [17, 18, 19, 20, 21, 22], deep autoencoders [23] or discriminative cooperative networks [24]. However, here we show that the task of fully unsupervised phase diagram reconstruction can also be performed using CNNs, allowing one to apply to physical problems a number of techniques developed in the field of computer vision, a field in which CNNs represent the gold standard.

Our approach is applied to the reconstruction of the phase diagram of layered spin models. Our motivation for such an investigation is three-fold. First, when two or more models are coupled, new phases may emerge as a result of the presence and of the form of the coupling. Consider, for instance, two magnetic systems with a tunable coupling between each other and suppose that when the coupling is zero, each system separately undergoes a conventional ferromagnetic phase transition [25]. For finite coupling, on the other hand, the order parameter may involve, in the general case, some non-trivial combination of spins of both systems. Let us consider a specific example, i.e. the Ashkin-Teller model, consisting of two square-lattice Ising models with spin variables $\sigma$ and $\tau$ coupled via a term of the form $\sigma\sigma\tau\tau$ [26, 27]. When the interlayer coupling between the variables $\sigma$ and $\tau$ is zero, the phase diagram of the model is characterized only by the order parameters $\langle\sigma\rangle$ and $\langle\tau\rangle$. On the other hand, when the interlayer coupling is large enough with respect to the intralayer term, a new non-trivial phase with a composite order parameter $\langle\sigma\tau\rangle$ emerges, even when all couplings are ferromagnetic. Further examples of the occurrence of novel order due to the coupling between different layers include the so-called ‘metallic superfluid’ phase [28, 29], as well as the recently reported BKT-paired phase in two coupled two-dimensional XY models [30]. Finally, let us consider again two square-lattice Ising models with spin variables $\sigma$ and $\tau$, now coupled via a term of the form $\sigma\tau$: is the phase with composite order parameter $\langle\sigma\tau\rangle$ present or not? As discussed in the literature for bilayer configurations and reviewed below, we expect that such a phase should not exist.

Since the phase diagram of the 2D Ashkin-Teller model and of some of its variations can be determined analytically [27, 31], and similarly the Ising model is a classical workhorse of statistical mechanics [25, 32], they provide an ideal benchmark to look for composite order parameters in an unsupervised way. One can ask whether and which new composite order parameters emerge in multilayer configurations, such as the trilayer one. Although in the two-variable (or, in our language, bilayer) Ashkin-Teller model the composite order parameter can be easily recognized, a more complex spin model with several layers, with both short- and long-range interlayer couplings, could be much more challenging to address with simple physical considerations. Many, possibly competing, composite order parameters may be present, and determining the one which actually breaks the symmetry and generates a novel phase is a highly non-trivial task. From this point of view, an unsupervised approach able to correctly reproduce the phase diagram of layered models, regardless of the nature of the underlying order parameters, is highly desirable.

Our second motivation is that layered models emerge in a wide range of physical situations. Among them, the bilayer structure in which two two-dimensional systems are coupled has been studied in a number of cases, ranging from graphene [33] to ultracold dipolar gases [34]. Another major example is provided by layered superconductors, which can occur naturally or be artificially created. Among the former class, of primary importance are compounds of transition-metal dichalcogenide layers intercalated with organic molecules [35] and cuprates [36]. Examples of artificial structures are alternating layers of graphite and alkali metals [37] or samples with layers of different metals [38]. Neutral layered superfluids can be engineered with quantum gases by using a deep optical lattice in one spatial direction with ultracold fermions [39] or bosons [40]. It is therefore important to develop general approaches capable of dealing with coupled interacting systems. In particular, given the importance of layered physical systems and their ubiquitous presence in a variety of contexts (layered superconductors being one example), a general approach to determining their phase diagram, once one is able to study the uncoupled counterpart, would provide an important tool of investigation.

Finally, our last motivation is purely methodological and inherent to machine learning. Indeed, in layered models one has a certain degree of arbitrariness in the way the MC data to be analyzed are fed to the neural network, e.g. one can provide the data for each layer separately, or retain their spatial structure, such as columns, and order them correspondingly. As an example, in the Ashkin-Teller model one can provide numerical algorithms either with all the $\sigma_{i}$'s and then all the $\tau_{i}$'s, or with the pairs $(\sigma_{i},\tau_{i})$ according to the index $i$ labeling the position of the spins in the layers. This arbitrariness is also reflected in the fact that an order parameter which can be clearly identified with one choice can be non-trivial, or “hidden”, with another choice. A special class of hidden order parameters are composite order parameters, i.e. parameters defined across multiple layers of a layered system. To use again the Ashkin-Teller model as an example, the order parameter $\langle\sigma\tau\rangle$ is immediately identified when the pairs $(\sigma_{i},\tau_{i})$ are provided, but not when the data is provided layer by layer. Therefore a natural question is how to identify phase transitions in coupled or layered models driven by order parameters which may be hidden by the encoding of the data provided to the machine learning algorithm.

2 Machine learning phase transitions in classical spin models

Let us consider the general case of a spin system whose Hamiltonian is defined by two parameters, $J$ and $K$. We aim to devise a procedure to map out the phase diagram in the $K-J$ plane. To this end we discretize a portion of the $K-J$ plane on a grid with steps $\Delta J$ and $\Delta K$. For each point on the grid we generate a number of uncorrelated MC snapshots using standard algorithms [41, 42, 43]. Unless otherwise specified we work on a $32\times 32\times N_{l}$ square lattice, $N_{l}$ being the number of layers, to be specified later, and we generate 600 snapshots for each point in the phase diagram. Periodic boundary conditions are used on each layer throughout all the simulations.
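The snapshot-generation step can be illustrated with a short sketch. The text cites standard algorithms without naming the one used, so the single-spin-flip Metropolis update below is an assumption chosen for brevity, written for the bilayer Ising model studied later in the text (in-plane nearest-neighbour coupling $J$, on-site interlayer coupling $K$). The names `metropolis_sweep`, `sample_snapshot`, `beta_J` and `beta_K` are hypothetical.

```python
import math
import random

def metropolis_sweep(sigma, tau, beta_J, beta_K, rng):
    """One Metropolis sweep over both layers of an L x L bilayer (in place).

    sigma, tau: L x L nested lists of +1/-1 spins (the two layers).
    beta_J, beta_K: dimensionless in-plane and interlayer couplings.
    Periodic boundary conditions are applied within each layer.
    """
    L = len(sigma)
    for layer, other in ((sigma, tau), (tau, sigma)):
        for _ in range(L * L):
            x, y = rng.randrange(L), rng.randrange(L)
            s = layer[x][y]
            # Sum of the four in-plane neighbours (periodic boundaries).
            nn = (layer[(x + 1) % L][y] + layer[(x - 1) % L][y]
                  + layer[x][(y + 1) % L] + layer[x][(y - 1) % L])
            # beta times the energy change caused by flipping spin s.
            beta_dE = 2.0 * s * (beta_J * nn + beta_K * other[x][y])
            if beta_dE <= 0.0 or rng.random() < math.exp(-beta_dE):
                layer[x][y] = -s

def sample_snapshot(L, beta_J, beta_K, n_sweeps, seed=0):
    """Return one (sigma, tau) snapshot after n_sweeps sweeps from a random start."""
    rng = random.Random(seed)
    sigma = [[rng.choice((-1, 1)) for _ in range(L)] for _ in range(L)]
    tau = [[rng.choice((-1, 1)) for _ in range(L)] for _ in range(L)]
    for _ in range(n_sweeps):
        metropolis_sweep(sigma, tau, beta_J, beta_K, rng)
    return sigma, tau
```

For uncorrelated snapshots one would save configurations separated by many sweeps; close to a critical point, cluster updates are commonly preferred over single-spin flips because they decorrelate faster.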

The convolutional neural network is trained to distinguish snapshots belonging to two different points, $(J_{1},K_{1})$ and $(J_{2},K_{2})$, in the phase diagram. Intuitively, when this training fails, the two points present nearly identical features, and thus belong to the same phase. On the other hand, if it succeeds, the two points should belong to two different phases. In order to carry out this plan, we first divide the data in a standard way, taking 80% of the snapshots from each of the two points as training data, while keeping the other 20% as validation data. Then, we train the network on the training data and quantify the classification accuracy as the fraction $\varphi$ of correctly labeled examples from the validation set. Based on that, we introduce the following quasidistance between the two phase diagram points $(J_{1},K_{1})$ and $(J_{2},K_{2})$ (we use the term 'quasidistance' since it does not respect the triangle inequality; this fact plays no role in any of the applications in the present paper):

$$d\big((J_{1},K_{1}),(J_{2},K_{2})\big)=2\,(\varphi-0.5)\,\Theta(\varphi-0.5), \tag{1}$$

where $\Theta(x)$ is the Heaviside step function, preventing $d$ from assuming negative values. Then perfect discrimination $\varphi=1$ (signaling different phases) corresponds to $d=1$, while perfect confusion $\varphi=0.5$ (signaling the same phase) corresponds to $d=0$.
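The mapping from validation accuracy to quasidistance, $d=2(\varphi-0.5)\Theta(\varphi-0.5)$, can be written in a few lines. This is a minimal sketch; the function names are hypothetical, and the labels are assumed to be given as plain Python sequences.

```python
def validation_accuracy(predicted, true):
    """Fraction of correctly labeled validation examples (the quantity phi)."""
    if len(predicted) != len(true) or not true:
        raise ValueError("label lists must be non-empty and of equal length")
    return sum(p == t for p, t in zip(predicted, true)) / len(true)

def quasidistance(phi):
    """d = 2 (phi - 0.5) Theta(phi - 0.5): 0 at chance level, 1 for perfect accuracy."""
    # max(0, .) reproduces the step function: accuracies below 0.5 give d = 0.
    return max(0.0, 2.0 * (phi - 0.5))
```

With 600 snapshots per point and an 80/20 split, each binary classification would be validated on 120 snapshots from each of the two points.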

We feed the raw Monte Carlo snapshots directly to the convolutional neural network, with spin down encoded as 0 and spin up encoded as 1, no preprocessing applied. The network architecture is optimized for the task of classifying two phases: after convolutional and fully connected layers the final layer consists of two softmax output neurons outputting the labels. The convolutional filters span both layers, which is the feature enabling the network to learn composite order parameters. Hence, both layers are simultaneously fed into the network. Further technical details on the network architecture and training can be found in the Appendix.
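The encoding described above amounts to a relabeling of the spins followed by stacking the layers. The sketch below assumes a channels-last layout (one channel per layer), which is a common convention for image-like CNN inputs but is not specified in the text; `encode_snapshot` is a hypothetical name.

```python
def encode_snapshot(layers):
    """Map spins -1/+1 to 0/1 and stack the layers along a last 'channel' axis.

    layers: list of N_l layers, each an L x L nested list of +1/-1 spins.
    Returns an L x L x N_l nested list of 0/1 values.
    """
    L = len(layers[0])
    return [[[(layer[x][y] + 1) // 2 for layer in layers]
             for y in range(L)]
            for x in range(L)]
```

In this layout a filter acting on a small spatial patch sees the spins of all layers at the same sites together, which is what allows interlayer products such as $\sigma_{i}\tau_{i}$ to be learned.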

At last, we make use of the distances defined in Eq. (1) to construct a field $u(J,K)$ defined on the phase diagram through its finite-difference lattice gradient

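$$\nabla u(J,K)=\begin{pmatrix}\dfrac{u(J+\Delta J,K)-u(J,K)}{\Delta J}\\[6pt]\dfrac{u(J,K+\Delta K)-u(J,K)}{\Delta K}\end{pmatrix}\equiv\begin{pmatrix}\dfrac{d\big((J+\Delta J,K),(J,K)\big)}{\Delta J}\\[6pt]\dfrac{d\big((J,K+\Delta K),(J,K)\big)}{\Delta K}\end{pmatrix}. \tag{2}$$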

Clearly $\nabla u$ will be constant in regions of the phase diagram belonging to the same phase, since we expect that the difficulty of telling first neighbors apart should be uniformly quite high. On the other hand, we expect the value of $\nabla u$ to abruptly change in the vicinity of a phase transition, suggesting that the phase diagram should be naturally characterized by looking at the finite-difference lattice Laplacian

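$$\nabla^{2}u(J,K)\approx\frac{1}{(\Delta J)^{2}}\sum_{i=0}^{n}(-1)^{i}\binom{n}{i}\,u\big(J+(n/2-i)\Delta J,K\big)+\frac{1}{(\Delta K)^{2}}\sum_{i=0}^{n}(-1)^{i}\binom{n}{i}\,u\big(J,K+(n/2-i)\Delta K\big), \tag{3}$$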

with the $n=2$, $n=3$ and $n=4$ cases corresponding to a 5-point, 9-point or 13-point stencil, respectively. The stencil includes $(n-1)$ nearest neighbors in the $J$ and $K$ directions. We stress that the summations can be rearranged so that they involve only differences of the $u$ field evaluated between first, second and third neighbors, which can in turn be expressed in terms of the quasidistance $d$. From the discussion above, it is clear that a sudden rise in the value of $\nabla^{2}u$ means that the CNN can distinguish with increased precision arbitrarily close points in the phase diagram, thus signaling a phase transition. We anticipate that including higher-order finite differences, beyond the obvious 5-point stencil that takes into account only first neighbors, considerably increases the quality of the reconstructed phase diagram. This point will be analyzed in detail later. Moreover, using the stencil, as opposed to always comparing just two neighboring points of the phase diagram, makes the algorithm robust in the case of a very dense grid. In such a case, it would become progressively more difficult to find neighboring points belonging to different phases. With our approach, we are assured that using a large enough stencil will circumvent this problem for any grid density.
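The weights entering each one-dimensional sum of the lattice Laplacian, $(-1)^{i}\binom{n}{i}$ at an offset of $(n/2-i)$ grid steps, can be tabulated with a short sketch. The function names are hypothetical, and for simplicity the example applies the $n=2$ weights to known values of $u$; in the method itself only differences of $u$, given by quasidistances, are available.

```python
from fractions import Fraction
from math import comb

def stencil_terms(n):
    """Return (offset, weight) pairs of the 1D sum in the lattice Laplacian.

    Term i has weight (-1)^i * C(n, i) and samples u at an offset of
    (n/2 - i) grid steps; offsets are Fractions because n/2 - i is a
    half-integer when n is odd.
    """
    return [(Fraction(n, 2) - i, (-1) ** i * comb(n, i)) for i in range(n + 1)]

def second_difference(u, j, dJ):
    """n = 2 case along one direction: (u[j+1] - 2 u[j] + u[j-1]) / dJ^2."""
    return sum(w * u[j + int(off)] for off, w in stencil_terms(2)) / dJ ** 2
```

The weights sum to zero for every $n\geq 1$, which is why each sum can be rearranged into differences of $u$ between neighboring grid points.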

Calculation of $\nabla^{2}u(J,K)$ for the entire phase diagram is by far the most time-consuming step of the algorithm. Using $N$ nearest neighbours, i.e. a $\left[4(N-1)+1\right]$-point stencil, it requires $M\cdot 4(N-1)$ calculations of the quasidistance. Here,

$$M=\frac{J_{\max}-J_{\min}}{\Delta J}\cdot\frac{K_{\max}-K_{\min}}{\Delta K} \tag{4}$$

is the total number of discretized $(J,K)$ pairs in the phase diagram.
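To get a feeling for the numbers, the cost estimate $M\cdot 4(N-1)$ can be evaluated for a concrete grid. The sketch below uses the grid quoted later in the text for the bilayer Ising model ($\beta J\in[0,0.5]$, $\beta K\in[0,1.4]$, steps of $0.01$); the function names are hypothetical, and the rounding only guards against floating-point error in the divisions.

```python
def grid_points(j_min, j_max, dj, k_min, k_max, dk):
    """M = (J_max - J_min)/dJ * (K_max - K_min)/dK, each factor rounded to an integer."""
    return round((j_max - j_min) / dj) * round((k_max - k_min) / dk)

def quasidistance_evaluations(m, n_neighbours):
    """Number of quasidistance calculations, M * 4 (N - 1), for a [4(N-1)+1]-point stencil."""
    return m * 4 * (n_neighbours - 1)

# Bilayer Ising grid quoted in the text.
M = grid_points(0.0, 0.5, 0.01, 0.0, 1.4, 0.01)
```

For this grid $M=50\cdot 140=7000$, so the 5-point stencil ($N=2$) corresponds to 28000 quasidistance calculations and the 13-point stencil ($N=4$) to 84000, each of which involves training one classifier.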

In conclusion of the present Section, we compare our scheme with other related approaches. As opposed to other machine learning schemes, in the present work we do not need the evaluation of any observable quantity to establish a distance [16], rather directly relying on the MC snapshots. Moreover, as opposed to other approaches [44] the scheme we introduce in this paper fully takes advantage of the two-dimensional nature of a two-parameter phase diagram, as the local information is reconstructed by taking into account neighbours in all directions. Extensions to three- or higher-dimensional phase diagrams are straightforward [45]. Finally, our approach requires only the evaluation of a fixed number of neighbors for each point in the phase diagram, ensuring that the computational effort required for training scales linearly with the number of points in the discretized phase diagram.

3 Multilayer Ising models

We now use the framework described in the previous Section to characterize the phase diagram of different coupled spin models.

Let us start from a bilayer Ising system, described by the following Hamiltonian with a quadratic coupling term (sometimes referred to as the Yukawa coupling):

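$$H_{\mathrm{bilayer}}=-J\sum_{\langle ij\rangle}\sigma_{i}\sigma_{j}-J\sum_{\langle ij\rangle}\tau_{i}\tau_{j}-K\sum_{i}\sigma_{i}\tau_{i}, \tag{5}$$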

where $\sigma_{i},\tau_{i}=\pm 1$ are Ising variables on a two-dimensional square lattice, whose sites are denoted by the indices $i,j$. The first two sums in Eq. (5) run over nearest-neighbor pairs within each layer, while the last sum runs over all sites. When $K=0$, the system reduces to two uncoupled Ising models, having a phase transition at the Onsager critical point $(\beta J)_{c}=\ln(1+\sqrt{2})/2$ [46, 32], $\beta$ being the inverse temperature. This critical point is shifted by the presence of a finite interlayer coupling $K$. The resulting Ising critical line separating the paramagnetic and ferromagnetic phases as a function of $K$ has been studied in the literature [47, 48, 49]. It is clear that the bilayer system (5) is the classical counterpart of two coupled quantum Ising chains in a transverse field, a system that has been studied in relation to its spectrum, its phase transitions and the possibility of determining an integrable line in the space of parameters [50, 51, 52, 53]. The classical bilayer system and the quantum coupled chains can also be related to each other by an exact mapping.
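As a concrete reading of the bilayer Hamiltonian, $H=-J\sum_{\langle ij\rangle}\sigma_{i}\sigma_{j}-J\sum_{\langle ij\rangle}\tau_{i}\tau_{j}-K\sum_{i}\sigma_{i}\tau_{i}$, the following sketch evaluates the energy of a single configuration with periodic boundary conditions in each layer. The function name and the nested-list representation are assumptions made for illustration.

```python
def bilayer_energy(sigma, tau, J, K):
    """Energy of H = -J sum<ij> s_i s_j - J sum<ij> t_i t_j - K sum_i s_i t_i.

    sigma, tau: L x L nested lists of +1/-1 spins, periodic in both directions.
    """
    L = len(sigma)
    energy = 0.0
    for x in range(L):
        for y in range(L):
            for layer in (sigma, tau):
                # Count each in-plane bond once: only the +x and +y neighbours.
                energy -= J * layer[x][y] * (layer[(x + 1) % L][y]
                                             + layer[x][(y + 1) % L])
            # On-site interlayer term.
            energy -= K * sigma[x][y] * tau[x][y]
    return energy
```

For a fully aligned $4\times 4\times 2$ configuration with $J=1$ and $K=0.5$ there are 64 in-plane bonds and 16 interlayer pairs, giving an energy of $-64-8=-72$.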

From our point of view the model described by Eq. (5) is an excellent starting point for our investigations, especially in order to check the existence of a composite order parameter and its relation to the phase diagram. It is now natural to parametrize the phase diagram in terms of the dimensionless combinations $\beta J$ and $\beta K$, discretizing it for values of $\beta J\in[0,0.5]$ and $\beta K\in[0,1.4]$, with discretization steps $\Delta\beta J=\Delta\beta K=0.01$. We then apply the phase diagram reconstruction procedure described in the previous Section to precisely determine the phase boundaries in the $\beta K$-$\beta J$ phase diagram, shown in Fig. 1(c). The phase transition occurs at $(\beta J)_{c}\approx 0.44$ in the uncoupled $\beta K=0$ case, in agreement with analytical results [46, 32]. The errors of our method on the determination of transition points are discussed in Appendix B. Then the critical temperature gradually decreases to the strong-coupling critical temperature $(\beta J)^{\prime}_{c}=(\beta J)_{c}/2$. The width of the peak is essentially due to the finite size ($32\times 32\times 2$) of the lattice used for Monte Carlo simulations, whose snapshots we feed to the neural network. As a result, only two phases appear to be found, with order parameter $\langle\sigma\rangle=\langle\tau\rangle$. From our treatment of the data we cannot determine the behavior of the order parameter inside the two phases, whose study would be an interesting future continuation of the present results.

Next, we consider a trilayer system, whose Hamiltonian is a natural extension of the one of Eq. (5):

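$$H_{\mathrm{trilayer}}=-J\sum_{\langle ij\rangle}\sigma_{i}\sigma_{j}-J\sum_{\langle ij\rangle}\tau_{i}\tau_{j}-J\sum_{\langle ij\rangle}\upsilon_{i}\upsilon_{j}-K\sum_{i}\sigma_{i}\tau_{i}-K\sum_{i}\tau_{i}\upsilon_{i}, \tag{6}$$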

where the new variable $\upsilon_{i}$ is also an Ising spin. This is the first non-trivial example, and is of course representative of the properties of the multilayer Ising model with Yukawa coupling. The central question is naturally whether a composite order parameter emerges. Moreover, the model of Eq. (6) is interesting since it paves the way to the investigation of the $N$-layer case, which should be straightforward with the method presented here. Indeed, the $N$-layer case may serve to investigate how the (three-dimensional) limit of infinite layers is retrieved, an issue of interest in the context of layered models, see e.g. Ref. [54].

The investigation of the model described by Eq. (6) follows the same lines as that of the bilayer case. We are able to reconstruct the phase diagram as shown in Fig. 2, recovering the strong-interlayer-coupling critical temperature, which in this case is $(\beta J)^{\prime\prime}_{c}=(\beta J)_{c}/3$, marked by a red dashed line. The main result exhibited in Fig. 2 is that no composite order parameter appears even in the trilayer case. Therefore, our technique has been able to correctly recover the phase diagram of the bilayer Ising model, where we do not expect any additional order to appear [55, 56], while it also predicts the same picture for the trilayer case, for which no previous expectation exists to our knowledge. The generalization to the $N$-layer case should be straightforward, but more numerically demanding, while based on the present results no additional phases are expected to appear. Therefore, in the following we investigate a different case where a composite order parameter appears by construction.
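The strong-coupling values quoted for the bilayer and trilayer, $(\beta J)_{c}/2$ and $(\beta J)_{c}/3$, can be understood with a simple argument: for very large ferromagnetic $\beta K$ the spins stacked on the same site lock together, so each column behaves as a single Ising spin with effective in-plane coupling $N_{l}J$. A short numerical check, with hypothetical function names:

```python
import math

def onsager_critical_coupling():
    """Exact square-lattice Ising critical point, (beta J)_c = ln(1 + sqrt(2)) / 2."""
    return math.log(1.0 + math.sqrt(2.0)) / 2.0

def strong_coupling_critical_point(n_layers):
    """(beta J)_c / N_l: locked columns behave as spins with coupling N_l * J."""
    return onsager_critical_coupling() / n_layers
```

This gives $(\beta J)_{c}\approx 0.4407$ for a single layer, about $0.2203$ for the bilayer and about $0.1469$ for the trilayer in the strong-coupling limit.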

4 Reconstructing composite order parameters: the Ashkin-Teller model

We now turn to the square-lattice Ashkin-Teller model, described by the following Hamiltonian

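$$H_{\mathrm{AT}}=-J\sum_{\langle ij\rangle}\sigma_{i}\sigma_{j}-J\sum_{\langle ij\rangle}\tau_{i}\tau_{j}-K\sum_{\langle ij\rangle}\sigma_{i}\sigma_{j}\tau_{i}\tau_{j} \tag{7}$$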

with $\sigma_{i},\tau_{i}=\pm 1$ . Compared to Hamiltonians (5)-(6) one sees that the coupling is now quartic in spins. Since in the Ising model there are only two scaling fields relevant in renormalization group sense [25, 32], the magnetization and the energy, one sees that in the models (5) and (7) one has basically the two natural ways of having respectively magnetization-magnetization and energy-energy couplings, higher order coupling terms being irrelevant. The Ashkin-Teller model is also related to the four state planar Potts model, and several variations of it, also in three dimensions, have been investigated [57].

The Ashkin-Teller model features a rich phase diagram, and remarkably in two dimensions can be studied analytically [27, 31]. Here we consider the case of ferromagnetic couplings, $J,K\geq 0$ . It is known that three different phases exist [27]. Besides an ordered phase, denoted by I, characterized by $\langle\sigma\rangle\neq 0\neq\langle\tau\rangle$ and a disordered phase, II, characterized by $\langle\sigma\rangle=\langle\tau\rangle=0$ one also finds the peculiar phase III in which the single spins $\sigma$ and $\tau$ are disordered, whereas a composite order parameter given by their product is ferromagnetically ordered, i.e. $\langle\sigma\tau\rangle\neq 0$ .

Whereas the previous investigation of Ising-like models makes us confident that the ML procedure we have introduced is able to correctly characterize the transition between phase I and phase II, it is not a priori clear that phase III can be correctly identified. As shown in the small inset of Fig. 3, MC snapshots show disordered spins both in phase II and in phase III, the transition being determined by the $\sigma\tau$ composite variable, that we do not directly feed to the CNN. In order to learn the existence of the II-III phase transition the CNN must learn to reconstruct the composite order parameter. We find that our framework successfully performs this task, owing to the convolutional filters which are convolved in 2D spanning across the layers and are able to learn even elusive interlayer correlations.
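The sense in which the order is hidden can be made concrete with a toy configuration. The sketch below computes the per-site averages of $\sigma$, $\tau$ and $\sigma\tau$ for one snapshot; the function name and the hand-made $4\times 4$ configuration are illustrative assumptions, not Monte Carlo data.

```python
def order_parameters(sigma, tau):
    """Return (m_sigma, m_tau, m_sigma_tau) for one L x L two-layer snapshot."""
    n = len(sigma) * len(sigma[0])
    m_s = sum(map(sum, sigma)) / n
    m_t = sum(map(sum, tau)) / n
    # Composite quantity: site-by-site product of the two layers.
    m_st = sum(s * t for rs, rt in zip(sigma, tau) for s, t in zip(rs, rt)) / n
    return m_s, m_t, m_st

# Toy configuration: each layer has zero net magnetization, but the layers
# are identical, so the product sigma*tau equals +1 on every site.
sigma = [[1, -1, 1, -1], [-1, 1, -1, 1], [1, 1, -1, -1], [-1, -1, 1, 1]]
tau = [row[:] for row in sigma]
```

Here `order_parameters(sigma, tau)` returns zero for both single-layer magnetizations and one for the composite quantity, so a classifier that inspects each layer separately sees no order, while one that combines the layers site by site does.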

The reconstructed phase diagram of Fig. 3 shows that indeed our approach is able to correctly learn the phase transitions in the ferromagnetic Ashkin-Teller model. Whereas the transition line corresponding to the magnetization of $\sigma$ and $\tau$ , as separated variables, corresponds to a prominent peak, whose width is essentially determined by finite-size effects, the line corresponding to the magnetization of the composite $\sigma\tau$ order parameter corresponds to a smaller peak, displaying that the characterization of this transition line is more demanding to the CNN, but still possible.

We can compare the phase diagram we obtain with some exact results. In the $K\to 0$ limit the model reduces to a square-lattice Ising model with coupling constant $J$, with critical temperature $(\beta J)_{c}=\ln(1+\sqrt{2})/2\approx 0.44$ [46, 32], whereas in the $K\to\infty$ limit the model reduces to a square-lattice Ising model with coupling constant $2J$ and critical temperature $(\beta J)^{\prime}_{c}=\ln(1+\sqrt{2})/4\approx 0.22$. Finally, for $J=0$ the system again undergoes an Ising-like phase transition for the composite order parameter, at $(\beta K)_{c}=\ln(1+\sqrt{2})/2\approx 0.44$. These three points are marked by a red, green and blue diamond, respectively, in the phase diagram of Fig. 3, showing an excellent agreement between the analytical results and the reconstructed phase diagram, even in the latter case when the composite order parameter $\sigma\tau$ drives the transition. Finally, the yellow diamond marks the bifurcation point as determined analytically in Ref. [27]; we attribute the difference with respect to the bifurcation point in our reconstructed phase diagram to finite-size effects. We also mention that the critical lines separating the different phases are retrieved with a precision up to $\sim 20\%-30\%$, except for vanishing $\beta J$. Again we attribute this to finite-size effects; proceeding as extensively discussed in the literature [1] one could obtain a quantitative agreement on the location of the critical lines. Here, our emphasis is on the possibility of retrieving the phases with composite order parameters and of ascertaining their existence, as we also did for the trilayer Ising model.

5 Scaling properties and robustness of the approach

Our results show that with the network and learning parameters that we used we were able to obtain a phase diagram of quality high enough to visually identify different phases. In addition, in this Section we characterize our method by quantifying signal to noise ratio (SNR) and studying its behavior when essential parameters are changed. We define the SNR as

$$\mathrm{SNR}\equiv\log_{10}\left(\frac{\frac{1}{N}\sum_{i}(x_{i}-\nu)^{2}}{\nu^{2}}\right), \tag{8}$$

$x_{i}$ being the values of the $\nabla^{2}u$ field of Eq. (3), the summation extending over a region containing $N$ values, $\nu$ being the ‘noise’, i.e. the average value of $\nabla^{2}u$ in a subset of the region far away from a phase transition. We evaluate the SNR for the Ising bilayer on a strip centered on $\beta K=1.1$ , exhibiting a sharp phase transition at $\beta J\approx 0.26$ as clear from Fig. 1. At first, we vary the number of training epochs, observing that the SNR is rapidly increasing before reaching a maximum value at around 5 epochs of training. This indicates that further training brings no benefit while providing a risk of overfitting, justifying our early-stopping approach. Secondly, we vary the number of samples in the training set, showing a rapid increase in the SNR before reaching a plateau at about 400 samples, justifying our choice of using a slightly larger number (600) of samples in the training set. Lastly, we vary the number of convolutional filters in the CNN. Again, the general upwards trend shows that a larger number of convolutional filters helps in enhancing the quality of the reconstructed phase diagrams. However, we stress that in this latter case the SNR is quite high in the whole parameter region we consider. The lowest number of convolutional filters we consider (3) is already enough for achieving a good reconstruction of the phase diagram and a large SNR value. These analyses are shown in Fig. 4.
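The SNR defined above, $\log_{10}$ of the mean squared deviation of the $\nabla^{2}u$ values from the noise level $\nu$ divided by $\nu^{2}$, is simple to evaluate once the field values in the chosen strip are available. A minimal sketch, with a hypothetical function name and plain lists as input:

```python
import math

def snr(values, noise):
    """log10 of the mean squared deviation from the noise level, over noise^2.

    values: Laplacian field values x_i in the region of interest.
    noise: the level nu, i.e. the average field value far from any transition.
    """
    if not values or noise == 0:
        raise ValueError("need at least one value and a nonzero noise level")
    mean_sq = sum((x - noise) ** 2 for x in values) / len(values)
    return math.log10(mean_sq / noise ** 2)
```

For instance, values that sit uniformly ten noise units above the noise level give an SNR of 2 with this definition.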

We have also analysed how the reconstructed phase transition is affected by the size of the stencil in Eq. (3). Using a 5-point, 9-point or 13-point stencil we have obtained SNR values of $-1.36\ \mathrm{dB}$, $0.38\ \mathrm{dB}$ and $3.88\ \mathrm{dB}$, respectively. This confirms that the approach we are introducing indeed takes great advantage of the two-dimensional structure of the phase diagram, and that information from second- and third-nearest neighbors is being used to sharply characterize the phase transition.
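To illustrate how a larger stencil brings more neighbors into the estimate of $\nabla^{2}u$, the following sketch compares the textbook 5-point stencil with the common isotropic 9-point stencil on a square grid. The weights used here are the standard ones and are an assumption: they are not necessarily those of Eq. (3), and the 13-point stencil is not reproduced.

```python
def laplacian_5pt(u, i, j, h=1.0):
    """Standard 5-point Laplacian: uses the four nearest neighbors only."""
    nearest = u[i + 1][j] + u[i - 1][j] + u[i][j + 1] + u[i][j - 1]
    return (nearest - 4.0 * u[i][j]) / h**2


def laplacian_9pt(u, i, j, h=1.0):
    """Common isotropic 9-point Laplacian: nearest plus diagonal neighbors.

    Weights (4 for nearest, 1 for diagonal, -20 for the center, overall 1/6)
    are the textbook choice, assumed here for illustration.
    """
    nearest = u[i + 1][j] + u[i - 1][j] + u[i][j + 1] + u[i][j - 1]
    diagonal = (u[i + 1][j + 1] + u[i + 1][j - 1]
                + u[i - 1][j + 1] + u[i - 1][j - 1])
    return (4.0 * nearest + diagonal - 20.0 * u[i][j]) / (6.0 * h**2)


# Test field u(x, y) = x^2 + y^2 on an integer grid; its exact Laplacian is 4.
n = 7
grid = [[float(x * x + y * y) for y in range(n)] for x in range(n)]

print(laplacian_5pt(grid, 3, 3), laplacian_9pt(grid, 3, 3))
```

Both stencils are exact for a quadratic field. On a noisy field the 9-point version averages over twice as many neighbors, which is in line with the trend of the SNR values reported above.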

6 Conclusions

Thus, as shown for layered spin models such as the multilayer Ising and Ashkin-Teller models, our work demonstrates that ML approaches are able to learn the order parameter driving a phase transition in layered models, even when this parameter is not immediately apparent from the snapshots without preprocessing. This is possible thanks to the convolutional filters, which are capable of learning, without any a priori knowledge, even involved algebraic operations that uncover the order parameters from the data. This paves the way to the use of ML approaches to investigate the properties of systems of increasing complexity and to characterize phases of matter described by multiple, possibly non-local order parameters, the universal approximation theorem [58] ensuring that a neural network can, at least in principle, learn to recognize arbitrarily complex order parameters. In particular, it would be very interesting to study the multilayer Ising model with an increasing number of layers, the three-dimensional Ashkin-Teller model and the trilayer Ashkin-Teller model in two dimensions, all of which can be studied with the techniques introduced in this paper. Non-local couplings among the layers could be added, which would lead to non-local, more composite operators. These results should be compared with the identification of hidden order done using non-ML techniques [59]. Also, the present approach may be used for other cases in which the identification of the order parameters is not straightforward [60, 61, 62]. Even though our approach has been devised to deal with coupled spin models and can be applied to different geometrical configurations, it is not clear a priori that it would succeed in other, more complicated cases of coupled interacting systems, such as multilayer configurations of interacting bosons and fermions or bilayer quantum Hall systems. Of course, in order to study generically coupled models one needs an efficient algorithm to simulate the uncoupled systems. Nevertheless, we think the present work provides a methodological basis, highlighting the effect of interlayer coupling on the macroscopic properties and phases of coupled systems.

Naturally, the approach we introduced could also be extended in the future to characterize quantum models, classical spin models with competition between short- and long-range interactions, or more involved spin models such as the XY model, discretising the continuous degrees of freedom [63]. We expect that, by an appropriate choice of the sizes and strides of the filters in the convolutional layer, one could characterize antiferromagnetic order parameters, non-local order parameters and exotic order parameters, such as those of nematic and smectic phases. In this context, current experiments on fermionic dipolar atoms [64, 65] promise to open a new window on the physics of competing long-range and short-range interactions [66], clearing the path for the comprehension of modulated phases in strongly interacting quantum systems.

The presence of spatially ordered structures is a leitmotiv for long-range and layered systems such as ultra-thin magnetic films [67, 68, 69], iron-based superconductors and cuprates [70, 71]. The pattern structure normally depends on several experimental conditions and produces a particularly rich phase diagram. Most of the common features occurring in stripe-forming systems and modulated phases remain obscure due to the challenges posed by the complicated order parameters that occur in these cases [72, 73, 74, 75]. The ML technique introduced in the present paper may serve as an essential probe to finally uncover the complexity of such phases.

Our results pave the way for fully automated study of phase diagrams of more general and complicated spin systems. An exciting open problem lying in the realm of so-called explainable artificial intelligence (XAI) [76] is whether machine learning techniques could not only learn to separate phases differing by a ‘hidden’ order parameter, but also identify that parameter. Another natural development of the present work is to use our fully-unsupervised technique to learn directly from experimental data [77, 78, 79]. Finally, it would be interesting to extend the results presented in this paper according to the variational procedure discussed in Ref. [80].

We thank Gesualdo Delfino, Michele Fabrizio, Piero Ferrarese, Robert Konik, Christoph Lampert and Mikhail Lemeshko for stimulating discussions at various stages of this work. W.R. has received funding from the EU Horizon 2020 programme under the Marie Skłodowska-Curie Grant Agreement No. 665385. G.B. acknowledges support from the Austrian Science Fund (FWF), under project No. M2461-N27. N.D. acknowledges support by the Deutsche Forschungsgemeinschaft (DFG) via Collaborative Research Centre SFB 1225 (ISOQUANT) and under Germany’s Excellence Strategy EXC-2181/1-390900948 (Heidelberg STRUCTURES Excellence Cluster).

References

[1]

Carleo G, Cirac I, Cranmer K, Daudet L, Schuld M, Tishby N, Vogt-Maranto L and Zdeborová L 2019 arXiv.org

[2]

Baldi P, Sadowski P and Whiteson D 2014 Nature communications 5 4308

[3]

Laptev I, Marszałek M, Schmid C and Rozenfeld B 2008 Learning realistic human actions from movies CVPR 2008-IEEE Conference on Computer Vision & Pattern Recognition (IEEE Computer Society) pp 1–8

[4]

Esteva A, Kuprel B, Novoa R A, Ko J, Swetter S M, Blau H M and Thrun S 2017 Nature 542 115

[5]

Farfade S S, Saberian M J and Li L J 2015 Multi-view face detection using deep convolutional neural networks Proceedings of the 5th ACM on International Conference on Multimedia Retrieval (ACM) pp 643–650

[6]

Russakovsky O, Deng J, Su H, Krause J, Satheesh S, Ma S, Huang Z, Karpathy A, Khosla A, Bernstein M et al. 2015 International journal of computer vision 115 211–252

[7]

Carrasquilla J and Melko R G 2017 Nature Physics 13 431–434

[8]

van Nieuwenburg E P L, Liu Y H and Huber S D 2017 Nature Physics 13 435–439

[9]

Kim D and Kim D H 2018 Phys. Rev. E 98 022138–7

[10]

Casert C, Vieijra T, Nys J and Ryckebusch J 2019 Phys. Rev. E 99 023304–7

[11]

Zhao X L and Fu L B 2018 arXiv.org

[12]

Richter-Laskowska, Khan, Trivedi and Maśka 2018 Condensed Matter Physics 21 33602–12

[13]

Dong X Y, Pollmann F and Zhang X F 2019 Phys. Rev. B 99 26–6

[14]

Zhang R, Wei B, Zhang D, Zhu J J and Chang K 2019 Phys. Rev. B 99 821–6

[15]

Beach M J S, Golubeva A and Melko R G 2018 Phys. Rev. B 97 045207–8

[16]

Broecker P, Assaad F F and Trebst S 2017 arXiv.org

[17]

Jadrich R, Lindquist B and Truskett T 2018 J. Chem. Phys. 149 194109

[18]

Jadrich R, Lindquist B, Piñeros W, Banerjee D and Truskett T 2018 J. Chem. Phys. 149 194110

[19]

Badawi W, Osman Z, Sharkas M and Tamazin M 2017 A classification technique for condensed matter phases using a combination of pca and svm 2017 Progress In Electromagnetics Research Symposium-Spring (PIERS) (IEEE) pp 326–331

[20]

Ponte P and Melko R G 2017 Phys. Rev. B 96 205146

[21]

Giannetti C, Lucini B and Vadacchino D 2019 Nuclear Physics B 944 114639

[22]

Woloshyn R 2019 arXiv preprint arXiv:1905.08220

[23]

Alexandrou C, Athenodorou A, Chrysostomou C and Paul S 2019 arXiv preprint arXiv:1903.03506

[24]

Liu Y H and van Nieuwenburg E P L 2018 Phys. Rev. Lett. 120 176401–6

[25]

Cardy J 1996 Scaling and renormalization in statistical physics

[26]

Ashkin J and Teller E 1943 Phys. Rev. 64 178–184

[27]

Baxter R J 2007 Exactly Solved Models in Statistical Mechanics Dover books on physics (Dover Publications)

[28]

Babaev E, Sudbø A and Ashcroft N W 2004 Nature 431 666–668

[29]

Svistunov B V, Babaev E S and Prokof’ev N V 2015 Superfluid states of matter (CRC Press)

[30]

Bighin G, Defenu N, Nándori I, Salasnich L and Trombettoni A 2019 Physical Review Letters 123 164–6

[31]

Delfino G and Grinza P 2004 Nuclear Physics B 682 521 – 550 ISSN 0550-3213

[32]

Mussardo G 2010 Statistical field theory: an introduction to exactly solved models in statistical physics

[33]

Novoselov K S, Geim A K, Morozov S V, Jiang D, Zhang Y, Dubonos S V, Grigorieva I V and Firsov A A 2004 Science 306 666–669 ISSN 0036-8075

[34]

Baranov M A, Dalmonte M, Pupillo G and Zoller P 2012 Chemical Reviews 112 5012–5061

[35]

Gamble F R, DiSalvo F J, Klemm R A and Geballe T H 1970 Science 168 568–570 ISSN 0036-8075

[36]

Tinkham M 1996 Introduction to Superconductivity

[37]

Hannay N B, Geballe T H, Matthias B T, Andres K, Schmidt P and MacNair D 1965 Phys. Rev. Lett. 14 225–226

[38]

Ruggiero S T, Barbee T W and Beasley M R 1980 Phys. Rev. Lett. 45 1299–1302

[39]

Iazzi M, Fantoni S and Trombettoni A 2012 EPL (Europhysics Letters) 100 36007

[40]

Cazalilla M A, Iucci A and Giamarchi T 2007 Phys. Rev. A 75 051603

[41]

Swendsen R H and Wang J S 1987 Phys. Rev. Lett. 58 86–88

[42]

Salas J and Sokal A D 1996 Journal of Statistical Physics 85 297–361

[43]

Wolff U 1989 Phys. Rev. Lett. 62 361–364

[44]

Broecker P, Assaad F F and Trebst S 2017 arXiv.org

[45]

Thampi S P, Ansumali S, Adhikari R and Succi S 2013 Journal of Computational Physics 234 1–7

[46]

Onsager L 1944 Phys. Rev. 65 117–149

[47]

Oitmaa J and Enting I G 1975 Journal of Physics A: Mathematical and General 8 1097–1114

[48]

Hansen P L, Lemmich J, Ipsen J H and Mouritsen O G 1993 Journal of Statistical Physics 73 723–749 ISSN 1572-9613

[49]

Brower R, Orginos K, Shen Y and Tan C I 1995 Physica A: Statistical Mechanics and its Applications 221 554 – 564 ISSN 0378-4371

[50]

Delfino G and Mussardo G 1998 Nuclear Physics B 516 675 – 703 ISSN 0550-3213

[51]

Fabrizio M, Gogolin A and Nersesyan A 2000 Nuclear Physics B 580 647 – 687 ISSN 0550-3213

[52]

Tsvelik A M 2007 Quantum field theory in condensed matter physics (Cambridge University Press)

[53]

Konik R M and Adamov Y 2009 Phys. Rev. Lett. 102 097203

[54]

Schneider T and Singer J 1998 Phase transition approach to high temperature superconductivity

[55]

Smiseth J, Smørgrav E, Babaev E and Sudbø A 2005 Phys. Rev. B 71 12–41

[56]

Sellin K A H and Babaev E 2016 Phys. Rev. B 93 503–5

[57]

Wu F Y 1982 Rev. Mod. Phys. 54 235–268

[58]

Hornik K 1991 Neural Networks 4 251–257

[59]

Martiniani S, Chaikin P M and Levine D 2019 Phys. Rev. X 9 011031–13

[60]

Riggs S C, Shapiro M C, Maharaj A V, Raghu S, Bauer E D, Baumbach R E, Giraldo-Gallo P, Wartenbe M and Fisher I R 2015 Nature Communications 6 2727–6

[61]

Lee M H, Chang C P, Huang F T, Guo G Y, Gao B, Chen C H, Cheong S W and Chu M W 2017 Phys. Rev. Lett. 119 157601–6

[62]

Varma C M and Zhu L 2006 Phys. Rev. Lett. 96 1265–4

[63]

Lupo C and Ricci-Tersenghi F 2017 Phys. Rev. B 95(5) 054433 URL https://link.aps.org/doi/10.1103/PhysRevB.95.054433

[64]

Lu M, Burdick N Q and Lev B L 2012 Phys. Rev. Lett. 108 215301

[65]

Park J W, Will S A and Zwierlein M W 2015 Phys. Rev. Lett. 114 205302–5

[66]

Baier S, Petter D, Becher J H, Patscheider A, Natale G, Chomaz L, Mark M J and Ferlaino F 2018 Phys. Rev. Lett. 121 093602

[67]

Allenspach R and Bischof A 1992 Phys. Rev. Lett. 69 3385–3388

[68]

Kashuba A and Pokrovsky V L 1993 Phys. Rev. Lett. 70 3155–3158

[69]

Kashuba A B and Pokrovsky V L 1993 Phys. Rev. B 48 10335–10344

[70]

Parker C V, Aynajian P, da Silva Neto E H, Pushp A, Ono S, Wen J, Xu Z, Gu G and Yazdani A 2010 Nature 468 677–680

[71]

Tanatar M A, Böhmer A E, Timmons E I, Schütt M, Drachuck G, Taufour V, Kothapalli K, Kreyssig A, Bud’ko S L, Canfield P C, Fernandes R M and Prozorov R 2016 Phys. Rev. Lett. 117 127001

[72]

Mendoza-Coto A, Barci D G and Stariolo D A 2017 Phys. Rev. B 95 175

[73]

Mendoza-Coto A and Stariolo D A 2012 Phys. Rev. E 86 85

[74]

Barci D G and Stariolo D A 2009 Phys. Rev. B 79 85

[75]

Barci D G and Stariolo D A 2011 Phys. Rev. B 84 175

[76]

Došilović F K, Brčić M and Hlupić N 2018 Explainable artificial intelligence: A survey 2018 41st International convention on information and communication technology, electronics and microelectronics (MIPRO) (IEEE) pp 0210–0215

[77]

Rem B S, Käming N, Tarnowski M, Asteria L, Fläschner N, Becker C, Sengstock K and Weitenberg C 2019 Nature Physics 1

[78]

Bohrdt A, Chiu C S, Ji G, Xu M, Greif D, Greiner M, Demler E, Grusdt F and Knap M 2018 arXiv.org

[79]

Zhang Y, Mesaros A, Fujita K, Edkins S D, Hamidian M H, Ch’ng K, Eisaki H, Uchida S, Davis J C S, Khatami E and Kim E A 2018 arXiv.org

[80]

Koch-Janusz M and Ringel Z 2018 Nature Physics 14 578

[81]

Scherer D, Müller A and Behnke S 2010 Evaluation of pooling operations in convolutional architectures for object recognition International conference on artificial neural networks (Springer) pp 92–101

[82]

Nair V and Hinton G E 2010 Rectified linear units improve restricted Boltzmann machines Proceedings of the 27th international conference on machine learning (ICML-10) pp 807–814

[83]

Kingma D P and Ba J 2015 Adam: A method for stochastic optimization 3rd International Conference on Learning Representations, ICLR 2015, San Diego, CA, USA, May 7-9, 2015, Conference Track Proceedings

[84]

Hinton G E, Srivastava N, Krizhevsky A, Sutskever I and Salakhutdinov R R 2012 arXiv preprint arXiv:1207.0580

[85]

Abadi M, Barham P, Chen J, Chen Z, Davis A, Dean J, Devin M, Ghemawat S, Irving G, Isard M, Kudlur M, Levenberg J, Monga R, Moore S, Murray D G, Steiner B, Tucker P, Vasudevan V, Warden P, Wicke M, Yu Y and Zheng X 2016 Tensorflow: A system for large-scale machine learning 12th USENIX Symposium on Operating Systems Design and Implementation (OSDI 16) pp 265–283

Appendix A Details on the architecture and on the training of the convolutional neural network

The first layer is a convolutional layer with 32 filters of size 2 by 2 and unit stride in both directions. Then the ‘max pooling’ [81] operation is applied with pool size 3 by 3, stride 2 in both directions and same padding. The result is then fully connected to a hidden layer with 100 neurons. The binary classification is finally done in the output softmax layer with two neurons. Both the convolutional and the hidden fully connected layers are activated by rectified linear units (ReLU) [82]. The network is visualized in Fig. 5.
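As a rough guide to the size of this network, the sketch below works out the feature-map shapes and trainable-parameter counts layer by layer. Several inputs are assumptions not stated above: the lattice size, the number of input channels (taken here as one channel per layer of the spin model), and the padding of the convolutional layer (taken as ‘valid’). The shape formulas are the usual ones for ‘valid’ and ‘same’ padding.

```python
import math


def output_side(size, kernel, stride, padding):
    """Side length of a feature map after a convolution or pooling step."""
    if padding == "same":
        return math.ceil(size / stride)
    # "valid" padding: only positions where the kernel fits entirely
    return (size - kernel) // stride + 1


def cnn_summary(lattice_size, n_channels, n_filters=32, conv_padding="valid"):
    """Shapes and parameter counts for the architecture described in the text.

    lattice_size, n_channels and conv_padding are illustrative assumptions.
    """
    conv_side = output_side(lattice_size, kernel=2, stride=1, padding=conv_padding)
    pool_side = output_side(conv_side, kernel=3, stride=2, padding="same")
    flat = pool_side * pool_side * n_filters
    params = {
        "conv": 2 * 2 * n_channels * n_filters + n_filters,  # weights + biases
        "hidden": flat * 100 + 100,                           # 100 ReLU neurons
        "output": 100 * 2 + 2,                                # 2 softmax neurons
    }
    return {
        "conv_side": conv_side,
        "pool_side": pool_side,
        "flat": flat,
        "params": params,
        "total": sum(params.values()),
    }


# Hypothetical example: a 32 x 32 bilayer lattice fed as two channels.
print(cnn_summary(lattice_size=32, n_channels=2))
```

Under these assumptions almost all parameters sit in the fully connected hidden layer, so the cost of the network is set mainly by the lattice size rather than by the number of layers of the spin model.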

We train the network by minimizing the cross-entropy using the Adam [83] adaptive optimization algorithm with 7 epochs and minibatch size 25. Such a choice leads to fast training (the amount of training is much lower than in computer vision applications, which routinely require hundreds or thousands of epochs) and prevents overfitting by early stopping, hence eliminating the need for other measures such as dropout [84]. We use the following Adam algorithm parameters: learning rate $0.001$ and the standard choices $\beta_{1}=0.9$ and $\beta_{2}=0.999$. We use TensorFlow [85] for the implementation.
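For readers unfamiliar with Adam, the update rule with the parameters quoted above can be sketched for a single scalar parameter as follows. This is a plain-Python illustration and not the TensorFlow implementation used here; the stabilizing constant `eps` and the toy objective are assumptions. The helper `updates_per_run` counts minibatch updates, assuming the 600-sample training set mentioned in Section 5.

```python
import math


def adam_minimize(grad, x0, steps, lr=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
    """Run `steps` Adam updates on a scalar parameter and return its final value.

    eps is an assumed value for the small stabilizing constant.
    """
    x, m, v = x0, 0.0, 0.0
    for t in range(1, steps + 1):
        g = grad(x)
        m = beta1 * m + (1.0 - beta1) * g        # running mean of the gradient
        v = beta2 * v + (1.0 - beta2) * g * g    # running mean of its square
        m_hat = m / (1.0 - beta1**t)             # bias corrections
        v_hat = v / (1.0 - beta2**t)
        x -= lr * m_hat / (math.sqrt(v_hat) + eps)
    return x


def updates_per_run(n_samples, batch_size, epochs):
    """Number of minibatch updates performed in one training run."""
    return math.ceil(n_samples / batch_size) * epochs


# Toy objective f(x) = (x - 3)^2 with gradient 2 (x - 3), for illustration only.
toy_grad = lambda x: 2.0 * (x - 3.0)

print(adam_minimize(toy_grad, x0=1.0, steps=1))
print(updates_per_run(n_samples=600, batch_size=25, epochs=7))
```

The first Adam step has magnitude close to the learning rate regardless of the gradient scale, so with a learning rate of $0.001$ and under these assumptions only 168 updates per run, the weights move only a short distance from their initial values, which is consistent with the early-stopping argument above.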

Appendix B Finite-size scaling and the error of the proposed method

The phase transitions in the systems we consider have a certain ‘natural’ width, due to the finite size of the lattice in the underlying Monte Carlo simulations; moreover, we expect our approach to introduce an additional width when determining the transition point. In order to verify this assumption and to investigate the accuracy of our method, we analyzed the natural width associated with the phase transition in the Ashkin-Teller model for $\beta K=0.7$, locating the transition through the peak of the magnetic susceptibility obtained directly from Monte Carlo simulations and measuring its width through the full width at half maximum (FWHM). We compare it with the FWHM of the Laplacian peak we reconstruct from our machine learning approach. The results are shown in Fig. 6; the FWHM of both the magnetic susceptibility (red squares, the red dotted line guides the eye) and the machine learning Laplacian (blue squares, the blue line guides the eye) obey the same $\propto 1/L$ scaling with respect to the lattice size $L$. The constant offset between the two datasets can be understood, as anticipated, as the additional error introduced by our method, due to the discretization of the parameter space and to some intrinsic uncertainty associated with the machine learning procedure.
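The width analysis described above can be sketched in two steps: measuring the FWHM of a sampled peak, and fitting the widths to the form $a/L+b$, where $b$ plays the role of the constant offset. The code below is a minimal illustration under stated assumptions: the peak sits on a zero baseline, the half-maximum crossings are found by linear interpolation, and the fit is an ordinary least-squares fit in $1/L$. The Gaussian test peak is synthetic and is not data from this work.

```python
import math


def fwhm(xs, ys):
    """Full width at half maximum of a single sampled peak (zero baseline assumed)."""
    peak = max(range(len(ys)), key=ys.__getitem__)
    half = ys[peak] / 2.0

    # Walk outwards from the peak while the samples stay above half maximum.
    left = peak
    while left > 0 and ys[left - 1] >= half:
        left -= 1
    right = peak
    while right < len(ys) - 1 and ys[right + 1] >= half:
        right += 1
    if left == 0 or right == len(ys) - 1:
        raise ValueError("peak is not fully contained in the sampled window")

    def crossing(i_in, i_out):
        # Linear interpolation between a sample above and a sample below half maximum.
        x0, y0, x1, y1 = xs[i_in], ys[i_in], xs[i_out], ys[i_out]
        return x0 + (half - y0) * (x1 - x0) / (y1 - y0)

    return crossing(right, right + 1) - crossing(left, left - 1)


def fit_inverse_size(sizes, widths):
    """Least-squares fit of widths to a / L + b; returns (a, b)."""
    inv = [1.0 / L for L in sizes]
    n = len(inv)
    mean_x = sum(inv) / n
    mean_y = sum(widths) / n
    sxx = sum((x - mean_x) ** 2 for x in inv)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(inv, widths))
    a = sxy / sxx
    return a, mean_y - a * mean_x


# Synthetic Gaussian peak, for illustration only.
sigma = 0.05
xs = [i / 1000.0 for i in range(1001)]
ys = [math.exp(-((x - 0.5) ** 2) / (2.0 * sigma**2)) for x in xs]
print(fwhm(xs, ys))
```

For a Gaussian of standard deviation $\sigma$ the exact result is $2\sqrt{2\ln 2}\,\sigma$, which provides a simple check of the interpolation.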
