Learning a Local Symmetry with Neural-Networks

Aurélien Decelle; Victor Martin-Mayor; Beatriz Seoane

arXiv:1904.07637·cond-mat.dis-nn·November 13, 2019


TL;DR

This paper demonstrates how neural networks can be designed to detect complex local symmetries, specifically Z2 gauge symmetry, and learn compressed representations relevant for physical and computational problems.

Contribution

It introduces a neural network architecture and dataset tailored to learn Z2 gauge symmetry and captures key features like Polyakov loops affecting computational complexity.

Findings

01

Neural networks can learn local gauge symmetries from data.

02

The method captures key physical features such as Polyakov loops.

03

The approach enables compressed latent representations of gauge orbits.

Abstract

We explore the capacity of neural networks to detect a symmetry with complex local and non-local patterns: the gauge symmetry $Z_{2}$. This symmetry is present in physical problems from topological transitions to QCD, and controls the computational hardness of instances of spin-glasses. Here, we show how to design a neural network, and a dataset, able to learn this symmetry and to find compressed latent representations of the gauge orbits. Our method pays special attention to system-wrapping loops, the so-called Polyakov loops, known to be particularly relevant for computational complexity.

Tables

Table 1: The autoencoder as a classifier. Fraction of not-trivially-one couplings that are different in the “comb-gauge” output of the AE as applied to two instances $\{\bm{J},\bm{J}^{\prime}\}$ from: $\bm{J}^{\prime}=\bm{J}$ [$p^{J,J}$], $\bm{J}^{\prime}=R_{q=0.5}(\bm{J})$ [$p^{J,J^{\prime}}$], $\bm{J}^{\prime}=R_{q=0.1}(\bm{J})$ [$p^{J,R_{q}(J)}$] or $\bm{J}^{\prime}=L(\bm{J})$ [$p^{J,L(J)}$]. $\bm{J}^{\prime}$ is gauge-transformed (with random $\bm{\epsilon}$) prior to the AE analysis. The AE was trained with $N_{s}$ instances, randomly extracted from $N_{\mathcal{O}}$ orbits. The results were computed from 1000 pairs $\{\bm{J},\bm{J}^{\prime}\}$, with $\bm{J}$ extracted from orbits not in the training set.

$L$	$N_{s}$	$N_{\mathcal{O}}$	$p^{J,J}$	$p^{J,J^{\prime}}$	$p^{J,R_{q=0.1}(J)}$	$p^{J,L(J)}$
$5$	100k	1k	$\sim 3\%$	$\sim 52\%$	$\sim 30\%$	$\sim 21\%$
$6$	400k	1k	$\sim 3.5\%$	$\sim 54\%$	$\sim 27\%$	$\sim 17.5\%$
$8$	800k	4k	$\sim 3.1\%$	$\sim 52\%$	$\sim 30\%$	$\sim 10.5\%$

Table 2: Architecture used for the simple classifier of gauge/not-gauge pairs; conv stands for convolutional and FF for feed-forward.

LayerName	Input	LayerType	Activation	Nb of units	Kernel
Simple DCNN
CSq1	$𝑱$	conv	ReLu	$64$	$3 \times 3$
CLh	$𝑱$	conv	ReLu	$64$	$L_{x} \times 1$
CLv	$𝑱$	conv	ReLu	$64$	$1 \times L_{y}$
Dense1	[CSq1,CLh,CLv]	FF	ReLu	$64$
Dense2	Dense1	FF	sigmoid	$1$

Table 3: A typical architecture used for the autoencoder. FF stands for feed-forward, UpS for upsampling, conv for convolutional and ReLu for Rectified Linear unit.

LayerName	Input	LayerType	Activation	Nb of units	Kernel
AutoEncoder
CSq1	J	conv	ReLu	$32$	$3 \times 3$
CL1	J	conv	ReLu	$16$	$L_{x} \times 1$
CC1	J	conv	ReLu	$16$	$1 \times L_{y}$
LatentRepr	[CSq1,CL1,CC1]	FF	ReLu	$50$ ($=5\cdot 5\cdot 2$)
ConvDec1	LatentRepr	conv	ReLu	$64$	$3 \times 3$
UpS	ConvDec1	UpS			$2 \times 2$
CSq2	UpS	conv	ReLu	$32$	$3 \times 3$
CH2	UpS	conv	ReLu	$32$	$L_{x} \times 1$
CV2	UpS	conv	ReLu	$32$	$1 \times L_{y}$
ConvDec	[CSq2,CH2,CV2]	conv	Linear	$1$	$5 \times 5$

Table 4: Results for the autoencoder for the size $L=5$. We observe a clear gap between samples from the same gauge orbit and samples that differ by even a small alteration (such as flipping a small fraction of the couplings or a line).

Same Orbit	Diff. Orbit (Line)	Diff. Orbit ($q=0.1$)	Random
$\sim 3\%$	$\sim 21\%$	$\sim 30\%$	$\sim 50\%$

Table 5: Architecture used for the classifier of latent representations (created by the autoencoder). FF stands for feed-forward and UpS for upsampling.

LayerName	Input	LayerType	Activation	Nb of units	Kernel
Enc-Classif
Concat	[LatentRepr( $J_{1}$ ),LatentRepr( $J_{2}$ )]
Conv1	Concat	conv	ReLu	$16$	$32$
MaxP1	Conv1	pooling	MaxPooling		$2$
Conv2	MaxP1	conv	ReLu	$32$	$16$
MaxP1	Conv2	pooling	MaxPooling		$2$
D1	MaxP1	FF	ReLu	32
Out	D1	FF	Softmax	2

Equations

$$H=-\sum_{\langle\bm{x},\bm{y}\rangle}J_{\bm{x}\bm{y}}\,\sigma_{\bm{x}}\sigma_{\bm{y}},\qquad(\sigma_{\bm{x}}=\pm 1\ \text{for all sites}\ \bm{x}),\tag{1}$$

$$J_{\bm{x}\bm{y}}\to\tilde{J}_{\bm{x}\bm{y}}=J_{\bm{x}\bm{y}}\,\epsilon_{\bm{x}}\epsilon_{\bm{y}},\tag{2}$$


Code & Models

Repositories

AurelienDecelle/SpinLearning (official)


Full text


A. Decelle

Laboratoire de Recherche en Informatique, TAU - INRIA, CNRS, Université Paris-Sud et Université Paris-Saclay, Bât. 660, 91190 Gif-sur-Yvette, France

V. Martin-Mayor

Departamento de Física Teórica, Universidad Complutense, 28040 Madrid, Spain

Instituto de Biocomputación y Física de Sistemas Complejos (BIFI), 50018 Zaragoza, Spain

B. Seoane

Sorbonne Université, CNRS, IBPS, UMR 7238, Laboratoire de Biologie Computationnelle et Quantitative (LCQB), 75005 Paris, France

Sorbonne Université, Institut des Sciences, du Calcul et des Données (ISCD), 75005 Paris, France

Abstract

We explore the capacity of neural networks to detect a symmetry with complex local and non-local patterns: the gauge symmetry $Z_{2}$. This symmetry is present in physical problems from topological transitions to QCD, and controls the computational hardness of instances of spin-glasses. Here, we show how to design a neural network, and a dataset, able to learn this symmetry and to find compressed latent representations of the gauge orbits. Our method pays special attention to system-wrapping loops, the so-called Polyakov loops, known to be particularly relevant for computational complexity.

The physics community is now greatly excited by the possibilities offered by machine learning tools, which have reached superhuman performance in tasks of significant complexity (think, for instance, of Go playing Silver et al. (2016)). Indeed, deep (convolutional) neural networks (DCNN) LeCun et al. (2015); Schmidhuber (2015), initially developed for classification and pattern recognition tasks, have been applied to the identification of phases of matter Torlai and Melko (2016); Carrasquilla and Melko (2017); Wetzel and Scherzer (2017); van Nieuwenburg et al. (2017); Wang (2016); Ohtsuki and Ohtsuki (2017); Beach et al. (2018), including glasses Schoenholz et al. (2016, 2017); Cubuk et al. (2015) and topological states Deng et al. (2017), or even to seemingly for-humans-only tasks, such as finding real-space renormalization group transformations Koch-Janusz and Ringel (2018) (this is just a somewhat arbitrary selection of, literally, hundreds of applications to physics).

In this context, local (or gauge) symmetries pose a major challenge due to the absence of any local or global order parameter Elitzur (1975), which explains why only preliminary studies have been conducted Carrasquilla and Melko (2017); Wetzel and Scherzer (2017) (we add to this list). In fact, thanks to their convolutional layers, DCNNs successfully handle symmetries such as global translations and rotations: even if it is moved, a DCNN still identifies a previously learned image. Therefore, the obvious next step for physicists is to consider more general symmetries for practical purposes.

The specific question we had in mind was whether or not DCNNs could be used to predict the computational complexity of a particular instance of an optimization problem. Spin glasses represent the perfect playground to test this idea, because finding the ground state of a simple Hamiltonian such as:

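$$H=-\sum_{\langle\bm{x},\bm{y}\rangle}J_{\bm{x}\bm{y}}\,\sigma_{\bm{x}}\sigma_{\bm{y}},\qquad(\sigma_{\bm{x}}=\pm 1\ \text{for all sites}\ \bm{x}),\tag{1}$$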

is an NP-complete problem as soon as the underlying interaction graph is non-planar Barahona (1982); Istrail (2003) (we shall consider statistically independent couplings $J_{\bm{x}\bm{y}}=\pm 1$ with $50\%$ probability). The classification problem is well motivated: the computational difficulty of solving different problem instances of Eq. (1) spreads over several orders of magnitude Alvarez Baños et al. (2010); Fernández et al. (2013); Billoire (2014); Martín-Mayor and Hen (2015); Fernandez et al. (2016); Billoire et al. (2018), even for such a modest number of spins as $N\sim 500$ [Footnote 1: Actually, Refs. Alvarez Baños et al. (2010); Fernández et al. (2013); Billoire (2014); Fernandez et al. (2016); Billoire et al. (2018) attempted to find equilibrium configurations using a Parallel Tempering algorithm down to some minimal temperature $T_{\mathrm{min}}$. In order to compute the Ground State, one needs to push $T_{\mathrm{min}}$ to zero, as done for instance in Ref. Martín-Mayor and Hen (2015). Unfortunately, the lower $T_{\mathrm{min}}$, the larger the spread over the samples of the computational hardness, see e.g. Refs. Alvarez Baños et al. (2010); Fernandez et al. (2016); Billoire et al. (2018).]. In spite of the question’s practical relevance, it is still unknown which features of the coupling matrix $J_{\bm{x}\bm{y}}$ cause this tremendous disparity of computational cost Billoire (2014). DCNNs would be an obvious choice to address the computational-cost classification problem, were it not for the gauge symmetry of Hamiltonian (1) (the $\epsilon_{\bm{x}}=\pm 1$ are arbitrary) Toulouse (1977)

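$$J_{\bm{x}\bm{y}}\to\tilde{J}_{\bm{x}\bm{y}}=J_{\bm{x}\bm{y}}\,\epsilon_{\bm{x}}\epsilon_{\bm{y}}.\tag{2}$$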

All problem instances related by this transformation belong to the same gauge orbit. Now, the difficulty of solving problems from the same gauge orbit is identical. Hence, our dream DCNN should first be able to tell us with certainty whether or not two problem instances belong to the same gauge orbit.
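The statement that instances in the same gauge orbit are equally hard follows from the fact that the transformation of Eq. (2), combined with the spin relabelling $\sigma_{\bm{x}}\to\epsilon_{\bm{x}}\sigma_{\bm{x}}$, leaves the energy of Eq. (1) unchanged. The following minimal sketch checks this numerically. The data layout (two $L\times L$ lists `Jh` and `Jv` for horizontal and vertical bonds) and all function names are illustrative assumptions, not the code of the authors' repository.

```python
import random

def random_signs(L, rng):
    """An L x L array of independent random signs (+1 or -1)."""
    return [[rng.choice((-1, 1)) for _ in range(L)] for _ in range(L)]

def gauge_transform(Jh, Jv, eps):
    """Eq. (2): J_xy -> J_xy * eps_x * eps_y, with periodic boundaries.

    Assumed layout: Jh[x][y] couples (x, y)-(x+1, y), Jv[x][y] couples (x, y)-(x, y+1).
    """
    L = len(Jh)
    Th = [[Jh[x][y] * eps[x][y] * eps[(x + 1) % L][y] for y in range(L)] for x in range(L)]
    Tv = [[Jv[x][y] * eps[x][y] * eps[x][(y + 1) % L] for y in range(L)] for x in range(L)]
    return Th, Tv

def energy(Jh, Jv, s):
    """Eq. (1): H = -sum over nearest-neighbour bonds of J_xy * s_x * s_y."""
    L = len(Jh)
    return -sum(
        Jh[x][y] * s[x][y] * s[(x + 1) % L][y] + Jv[x][y] * s[x][y] * s[x][(y + 1) % L]
        for x in range(L) for y in range(L)
    )

rng = random.Random(0)
L = 4
Jh, Jv = random_signs(L, rng), random_signs(L, rng)
eps = random_signs(L, rng)
spins = random_signs(L, rng)

Th, Tv = gauge_transform(Jh, Jv, eps)
# Relabelled spins s_x -> eps_x * s_x have the same energy under the transformed couplings.
relabelled = [[eps[x][y] * spins[x][y] for y in range(L)] for x in range(L)]
same_energy = energy(Jh, Jv, spins) == energy(Th, Tv, relabelled)
```

Because the relabelling is a bijection on spin configurations, the two instances have identical energy spectra, and in particular ground states that map onto each other.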

Here we present a machine-learning algorithm that solves the problem of gauge-orbit identification as formulated for spin glasses on the square lattice. The same algorithm works in the cubic lattice, although we are limited to systems of smaller linear size due to the memory and computational costs. Interestingly, all the standard DCNNs for image classification tried, including the ResNet He et al. (2016), completely failed at this task. A careless posing of the problem could make it wrongly seem trivial. Indeed, instances from the same orbit share the value of every Wilson loop Montvay and Münster (1997) [the product of couplings along a closed loop in the lattice, which is gauge-invariant (2)]. Attention immediately falls on the plaquette, the shortest Wilson loop, see e.g. Carrasquilla and Melko (2017) or Fig. 1–left. However, two instances sharing the value of every plaquette, but differing on the so-called Polyakov loops (the shortest Wilson loops wrapping the system thanks to the periodic boundary conditions), may have vastly different computational complexity Fernandez et al. (2016). We improve over Ref. Carrasquilla and Melko (2017) by teaching our machine to consider both local and non-local Wilson loops when studying a $Z_{2}$ gauge symmetry.

Let us highlight two other aspects of this problem that machine-learning practitioners may find attractive: (i) a training set of (essentially) arbitrary size can be easily generated and (ii) an algorithm of polynomial complexity provides an exact answer to the question of whether two problem instances belong to the same gauge orbit.

Below, we present two different approaches to solve this classification problem using DCNN (we employed the Keras-tensorflow and scikit-learn libraries Chollet et al. (2015); Pedregosa et al. (2011)). Our first algorithm tells us if two problem instances are in the same gauge orbit. Our second algorithm is an autoencoder, a DCNN capable of finding a latent representation of a gauge orbit by means of an approximate gauge-fixing. Although the latent representation can be used for classification purposes as well, its strength is in that it clusters problem instances by orbits.

For square lattices, it is natural to feed the coupling matrix $\bm{J}$ to the neural network as an image. After considering several alternatives, our choice was to map our physical square lattice of size $L$ to a square image of size $2L$ through the chess transformation illustrated in Fig. 1–left (the chess-transformation generalizes to 3D). Although one pixel out of two is wasted in the resulting image, we found that the learning process and the interpretation of results were easier with the chess transformation than with less memory-demanding representations.
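A possible implementation of such a mapping is sketched below. The exact pixel layout of Fig. 1 is not described in the text, so the placement used here (horizontal bonds on pixels with odd first index, vertical bonds on pixels with odd second index, all other pixels set to a dummy value 0) is an assumption made only for illustration, as are the function names.

```python
def chess_transform(Jh, Jv):
    """Map the 2*L*L couplings of an L x L lattice onto a 2L x 2L image.

    Assumed layout: Jh[x][y] couples (x, y)-(x+1, y), Jv[x][y] couples (x, y)-(x, y+1).
    Half of the pixels carry a coupling; the other half are dummy zeros.
    """
    L = len(Jh)
    img = [[0] * (2 * L) for _ in range(2 * L)]
    for x in range(L):
        for y in range(L):
            img[2 * x + 1][2 * y] = Jh[x][y]
            img[2 * x][2 * y + 1] = Jv[x][y]
    return img

def inverse_chess_transform(img):
    """Recover (Jh, Jv) from an image produced by chess_transform."""
    L = len(img) // 2
    Jh = [[img[2 * x + 1][2 * y] for y in range(L)] for x in range(L)]
    Jv = [[img[2 * x][2 * y + 1] for y in range(L)] for x in range(L)]
    return Jh, Jv

# Tiny deterministic example on a 3 x 3 lattice.
L = 3
Jh = [[1 if (x + y) % 2 == 0 else -1 for y in range(L)] for x in range(L)]
Jv = [[-1 if x == y else 1 for y in range(L)] for x in range(L)]
image = chess_transform(Jh, Jv)
n_used = sum(1 for row in image for v in row if v != 0)
```

With this layout exactly $2L^{2}$ of the $4L^{2}$ pixels are used, which matches the remark that one pixel out of two is wasted.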

Gauge transformations are also illustrated in Fig. 1–right: the naked eye can hardly tell whether or not the images corresponding to two coupling matrices belong to the same gauge orbit. This question can be answered by fixing the gauge [Footnote 2: In this work we deal with an Abelian gauge group, which makes fixing the gauge simple (difficulties arise for non-Abelian gauge groups, see e.g. Ref. Marinari et al. (1991)).], that is, by using a map $f_{\mathcal{G}}:\bm{J}^{\mathcal{O}_{k}}\to{\bm{\hat{J}}}^{\mathcal{O}_{k}}$ from any instance $\bm{J}$ of gauge orbit ${\mathcal{O}_{k}}$ to a single representative of it, $\bm{\hat{J}}$. Thus, two instances are in the same orbit if, and only if, $f_{\mathcal{G}}(\bm{J})=f_{\mathcal{G}}(\bm{J^{\prime}})$. We construct our mapping by changing the gauge: the $\bm{\epsilon}\equiv\left\{{\epsilon_{\bm{x}}}\right\}$ in Eq. (2) are chosen in such a way that $\tilde{J}_{\bm{x},\bm{y}}=1$ for any horizontal coupling $\bm{x}-\bm{y}=(\pm 1,0)$ (except for $\tilde{J}_{\bm{x}=(L-1,y),\bm{y}=(0,y)}$, which is equal to a gauge-invariant Polyakov loop), as well as $\tilde{J}_{\bm{x}=(0,y),\bm{y}=(0,y+1)}=1$ for $0\leq y<L-2$. We include code performing this gauge-fixing in the Appendices.
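The gauge-fixing map can be sketched in a few lines. The version below is an illustrative reconstruction, not the `gauge_fixing_Comb` function of the authors' repository: the data layout (`Jh`, `Jv`) and the function names are assumptions, and the sketch sets to $+1$ every non-wrapping vertical coupling of the first column and every non-wrapping horizontal coupling.

```python
import random

def gauge_transform(Jh, Jv, eps):
    """Eq. (2) with periodic boundaries; Jh[x][y]: (x, y)-(x+1, y), Jv[x][y]: (x, y)-(x, y+1)."""
    L = len(Jh)
    Th = [[Jh[x][y] * eps[x][y] * eps[(x + 1) % L][y] for y in range(L)] for x in range(L)]
    Tv = [[Jv[x][y] * eps[x][y] * eps[x][(y + 1) % L] for y in range(L)] for x in range(L)]
    return Th, Tv

def gauge_fix_comb(Jh, Jv):
    """Return a comb-gauge representative of the orbit of (Jh, Jv)."""
    L = len(Jh)
    eps = [[1] * L for _ in range(L)]
    # Spine of the comb: walk up the first column so that Jv[0][y] -> +1 for y < L-1.
    for y in range(L - 1):
        eps[0][y + 1] = eps[0][y] * Jv[0][y]
    # Teeth of the comb: walk along each row so that Jh[x][y] -> +1 for x < L-1.
    for y in range(L):
        for x in range(L - 1):
            eps[x + 1][y] = eps[x][y] * Jh[x][y]
    return gauge_transform(Jh, Jv, eps)

rng = random.Random(0)
L = 5
Jh = [[rng.choice((-1, 1)) for _ in range(L)] for _ in range(L)]
Jv = [[rng.choice((-1, 1)) for _ in range(L)] for _ in range(L)]
eps = [[rng.choice((-1, 1)) for _ in range(L)] for _ in range(L)]

fixed = gauge_fix_comb(Jh, Jv)
fixed_after_gauge = gauge_fix_comb(*gauge_transform(Jh, Jv, eps))

# Flipping a single coupling breaks two plaquettes, so the instance leaves the orbit.
Kh = [row[:] for row in Jh]
Kh[2][3] *= -1
fixed_flipped = gauge_fix_comb(Kh, Jv)
```

The fixed bonds form a spanning tree of the lattice, so the remaining couplings are products of the original couplings around closed loops and hence gauge invariant. In particular, the wrapping horizontal coupling of each row equals that row's Polyakov loop.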

Construction of the data set– We found the approach used in Ref. Carrasquilla and Melko (2017) to detect the gauge symmetry inconvenient for our purposes. That approach constructs a (balanced) dataset of pairs of systems: one group with pairs of instances from the same orbit and the other group with pairs of randomly-chosen $\bm{J}$s. This classification problem is too easy. Most of the time, and this is what the DCNN will learn, the pair of randomly-chosen $\bm{J}$s will be so different that one could tell that they do not belong to the same orbit just by looking at a very reduced number of plaquettes [Footnote 3: For two randomly-chosen $\bm{J}$s, the probability of coincidence in $k$ fixed, non-overlapping plaquettes falls as $1/2^{k}$.]. A DCNN trained in this way would completely miss situations in which just a few couplings changed, and it would be blind to extensive transformations that leave every plaquette unaltered. Therefore, we need to ensure that in our dataset it will not be enough for the DCNN to check one (or a few) plaquette(s) [neither fixed plaquettes nor randomly chosen ones].

Specifically, our data-set is composed of $N_{\mathrm{s}}$ pairs $\{\bm{J},\bm{J}^{\prime}\}$ . The $\bm{J}$ is random (with uniform distribution). For half of the $N_{\mathrm{s}}$ pairs, $\bm{J}^{\prime}=\bm{J}$ . In the other half, $\bm{J}^{\prime}$ is obtained from $\bm{J}$ by some transformation (see below and Appendices) that changes only a small fraction of the couplings $J_{\bm{x},\bm{y}}$ . For all pairs, $\bm{J}^{\prime}$ is gauge-transformed (with random $\left\{{\epsilon_{\bm{x}}}\right\}$ ) before being fed to the DCNN.

In the so-called $\bm{J}^{\prime}=R_{q}(\bm{J})$ transformation, a fraction $q$ of randomly-chosen $J_{\bm{x}\bm{y}}$ is flipped.

In the (horizontal) line-transformation $\bm{J}^{\prime}=L(\bm{J})$, $\bm{J}^{\prime}$ is obtained from $\bm{J}$ by flipping the couplings joining $\bm{x}=(0,y)$ and $\bm{y}=(1,y)$ for any $y$ (vertical transformation: $\bm{x}=(x,0)$ and $\bm{y}=(x,1)$, for all $x$). Every plaquette in the lattice takes the same value in $\bm{J}$ and $\bm{J}^{\prime}$, but the sign of all their horizontal (vertical) Polyakov loops is opposite. These line transformations [Footnote 4: Any other transformation can be expressed as a combination of broken plaquette(s) and/or line(s).] are important when assessing the computational hardness Fernandez et al. (2016).

In our data set, we choose $\bm{J}^{\prime}=L(\bm{J})$ with probability $1/3$ or $\bm{J}^{\prime}=R_{q}(\bm{J})$ with probability $2/3$. Line transformations are equally likely to be horizontal or vertical. If the chosen transformation is $R_{q}$, in order to force the scan of every plaquette, we pick $q\sim 1/L^{2}$ with $50\%$ probability (we randomly invert $1\!-\!5$ couplings), or $q=q_{R}$, where $q_{R}$ is a uniform random number with $1/(2L^{2})\leq q_{R}<1/4$.
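The sampling rules above can be summarised in a short generator of labelled pairs. This is a hedged sketch rather than the authors' dataset code: the data layout (a pair of $L\times L$ lists for horizontal and vertical bonds), the function names and the rounding of $q_{R}$ to an integer number of flipped couplings are all assumptions.

```python
import random

def copy_J(J):
    Jh, Jv = J
    return [row[:] for row in Jh], [row[:] for row in Jv]

def random_J(L, rng):
    """Uniformly random couplings: J[0] holds horizontal bonds, J[1] vertical bonds."""
    return tuple([[rng.choice((-1, 1)) for _ in range(L)] for _ in range(L)] for _ in range(2))

def gauge_transform(J, rng):
    """Eq. (2) with independent random signs eps_x on every site (periodic boundaries)."""
    Jh, Jv = J
    L = len(Jh)
    eps = [[rng.choice((-1, 1)) for _ in range(L)] for _ in range(L)]
    Th = [[Jh[x][y] * eps[x][y] * eps[(x + 1) % L][y] for y in range(L)] for x in range(L)]
    Tv = [[Jv[x][y] * eps[x][y] * eps[x][(y + 1) % L] for y in range(L)] for x in range(L)]
    return Th, Tv

def flip_random(J, n_flips, rng):
    """R_q: flip n_flips distinct couplings chosen uniformly among the 2*L*L bonds."""
    Jh, Jv = copy_J(J)
    L = len(Jh)
    for b in rng.sample(range(2 * L * L), n_flips):
        target = Jh if b < L * L else Jv
        x, y = divmod(b % (L * L), L)
        target[x][y] *= -1
    return Jh, Jv

def flip_line(J, rng):
    """L: flip the bonds (0, y)-(1, y) for all y, or the bonds (x, 0)-(x, 1) for all x."""
    Jh, Jv = copy_J(J)
    L = len(Jh)
    if rng.random() < 0.5:
        for y in range(L):
            Jh[0][y] *= -1
    else:
        for x in range(L):
            Jv[x][0] *= -1
    return Jh, Jv

def make_pair(L, rng):
    """Return (J, J', label); label 1 means J' was built from J itself."""
    J = random_J(L, rng)
    if rng.random() < 0.5:
        Jp, label = copy_J(J), 1
    else:
        label = 0
        if rng.random() < 1 / 3:
            Jp = flip_line(J, rng)
        elif rng.random() < 0.5:
            Jp = flip_random(J, rng.randint(1, 5), rng)           # q ~ 1/L^2
        else:
            q = rng.uniform(1 / (2 * L * L), 0.25)                # q = q_R
            Jp = flip_random(J, max(1, int(q * 2 * L * L)), rng)  # rounding is an assumption
    return J, gauge_transform(Jp, rng), label

rng = random.Random(1)
pairs = [make_pair(5, rng) for _ in range(200)]
```

One caveat of this simple sketch: a random flip of several couplings can, with small probability, coincide with a gauge transformation (for example the four bonds around one site), in which case the label 0 would be wrong. An exact check such as gauge fixing can be used to filter out those rare pairs.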

Construction of the DCNN– We aim to build a DCNN that inputs the chess-transformed (see Fig. 1–left) images representing a pair of coupling matrices $\{\bm{J},\bm{J}^{\prime}\}$ and outputs the probability that the two instances belong to the same gauge-orbit.

The Euclidean geometry of our problem suggests using convolutional neural networks (CNN) Fukushima (1980); LeCun et al. (1989); Krizhevsky et al. (2012), which are well adapted to translational symmetry. Specifically, we combine in parallel three CNNs that simultaneously scan the plaquettes (square in Fig. 2–top) and the Polyakov loops, scanned through horizontal and vertical $1\times L$ slabs (rectangles in Fig. 2–top). The first CNN allows us to quickly find small defects in the gauge symmetry, while the other two search for non-local defects. These three CNNs serve as feature detectors before a fully-connected layer that performs the classification. We illustrate in Fig. 2–top the general architecture of our DCNN (the number of layers and the size of the dense layer vary with $L$). Additional details, as well as sample programs, can be found in the Appendices.

Results for the classifying DCNN– For our data set, we obtain almost $100\%$ accuracy for linear sizes $L=5,10$. In other words, even for our very demanding data set, the DCNN learns to tell whether or not two problem instances really are the same problem in disguise.

However, let $N_{s}(p)$ be the size of the training set needed to reach a target accuracy $p$. We see in Fig. 3 that $N_{s}(p)$ is much smaller for the training set than for the test set (problem instances in the test set are new to the DCNN). Furthermore, $N_{s}(p)$ grows significantly with $L$.

We have found that the difficulty of the problem is largely caused by the Polyakov-loop flipping line-transformations. More details on this analysis can be found in the Appendices.

Learning to fix the gauge– Gauge-fixing may be regarded as an algorithm to reduce the dimensionality of the coupling matrix $\bm{J}$ with no information loss. Hence, it is natural to ask ourselves if a particular type of DCNN, an auto-encoder (AE) Rumelhart et al. (1985); Ballard (1987), may learn to fix the gauge. Indeed, an AE takes an input vector $\bm{x}$ and maps it to a latent representation $f_{\mathcal{E}}(\bm{x})$ (typically, $f_{\mathcal{E}}(\bm{x})$ is of smaller dimensionality than $\bm{x}$ ). A decoder generates a reconstructed vector from the latent representation afterwards, $\bm{x}^{\prime}=f_{\mathcal{D}}(f_{\mathcal{E}}(\bm{x}))$ . The weights of the encoder $f_{\mathcal{E}}$ and the decoder $f_{\mathcal{D}}$ functions are chosen to minimize a loss function (e.g. the $L_{2}$ distance between $\bm{x}$ and $\bm{x}^{\prime}$ ).
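The dimensionality reduction achieved by gauge fixing can be made explicit with a short count, given here as a worked illustration for an $L\times L$ periodic lattice. There are $2L^{2}$ couplings and $L^{2}$ gauge signs $\epsilon_{\bm{x}}$, but the global flip $\epsilon_{\bm{x}}\to-\epsilon_{\bm{x}}$ leaves Eq. (2) unchanged, so only $L^{2}-1$ of them act independently:

```latex
\underbrace{2L^{2}}_{\text{couplings}}
\;-\;
\underbrace{\left(L^{2}-1\right)}_{\text{independent gauge signs}}
\;=\; L^{2}+1
\qquad\Longrightarrow\qquad
\#\{\text{gauge orbits}\} \;=\; \frac{2^{2L^{2}}}{2^{L^{2}-1}} \;=\; 2^{L^{2}+1}.
```

For $L=5$ this gives $26$ gauge-invariant binary degrees of freedom out of $50$ couplings, stored in a chess-transformed image of $(2L)^{2}=100$ pixels.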

At variance with the traditional approach, we will not ask our AE to reconstruct the input but to fix the gauge, that is to reconstruct a unique $\bm{\hat{J}}$ (the comb gauge described above) for all the instances in a given gauge orbit.

Our encoder will essentially share the architecture of our classifying DCNN (namely, the three CNNs of Fig. 2 without the classification layer). The decoder takes the encoder’s output, and pipes it to an upsampling layer, followed by our three feature detector CNNs and by a last CNN from which we take the output (more details can be found in the Appendices). The output from a given coupling matrix $\bm{J}$ is an attempted reconstruction of its comb-gauge representation (Fig. 1–right).

The AE can be used as a classifier simply by comparing the “comb-gauge” obtained from two problem instances. As shown in Table 1, only pairs of instances from the same orbit have a similar “comb-gauge” (the performance does not deteriorate when the system size increases).

We can gain some understanding by visualizing the latent representation, see Fig. 4. Indeed, the AE’s latent representation clusters problem instances belonging to the same orbit. Moreover, the representations of two problems from the same orbit are nearly identical, whereas changing a few links or performing a line transformation results in a significantly different latent representation.

Conclusions– We have demonstrated a successful machine learning approach to detect whether or not two spin-glass instances are mutually related by a gauge transformation. This problem is particularly challenging for neural networks due to the absence of an order parameter. In fact, we have checked the failure of the standard DCNNs for image classification, such as pre-trained DCNNs, no matter the size of the training set. Our results underline the necessity of carefully choosing the learning dataset if we want the DCNN to learn the full symmetry (which includes global Wilson loops). We show that our DCNNs are able to learn the gauge symmetry and even to find a latent representation that can be used to fix the gauge. This success comes at the cost of very large training datasets, whose size needs to grow with the system size. Now that we have in our hands DCNNs able to identify gauge symmetries, we will approach our original question, namely what makes certain problem instances far more computationally costly than others?

Acknowledgements.

We thank L. A. Fernández for encouraging discussions and Marco Baity-Jesi for his careful reading of the manuscript. This work was partially supported by Ministerio de Economía, Industria y Competitividad (MINECO) (Spain) and by EU’s FEDER program through Grant No. FIS2015-65078-C2 and by the LabEx CALSIMLAB (public grant ANR-11-LABX-0037-01 constituting a part of the “Investissements d’Avenir” program - reference : ANR-11-IDEX-0004-02).

Appendix A Sample generation and basic transformations

The first step to build our dataset is to create independent realizations of the disorder $\bm{J}$ (what we call a sample). The code for all the functions mentioned below can be downloaded from the file src/tools.py in Ref. Decelle and Seoane (2019).

• Generation of a random sample $\bm{J}$: A random sample $\bm{J}$ is generated by assigning a random sign ($\pm 1$) to each of the $2L_{x}L_{y}$ couplings in the two-dimensional lattice system. The code to create a sample can be found in function createSample_2D.

In addition, we consider 4 possible transformations of these samples (all of them are illustrated in the first row of Fig. 5):

• Gauge fixing $\mathrm{G}(\bm{J})$: We map our sample $\bm{J}$ to its comb-gauge representative. To do so, we use the gauge transformation explained in Eq. (2) of the main-text. Specifically, we fix to one (black in our color code) all the couplings in the horizontal direction, as well as the couplings in the first vertical column. However, the last coupling along each direction cannot be fixed due to the boundary conditions. The code to fix the gauge can be found in function gauge_fixing_Comb.

• Random orbit $\mathrm{O}(\bm{J})$: We use Eq. (2) of the main-text to generate a random representative of the gauge orbit to which $\bm{J}$ belongs. Specifically, we generate $L_{x}L_{y}$ random signs $\epsilon_{\bm{x}}$ and set $J^{\prime}_{\bm{x}\bm{y}}=J_{\bm{x}\bm{y}}\epsilon_{\bm{x}}\epsilon_{\bm{y}}$. The code that performs the random-orbit transformation can be found in function getOrbit_2D.

• Random flip-coupling $\mathrm{R}_{q}(\bm{J})$: We invert the sign of a fraction $q$ of the $2L_{x}L_{y}$ couplings in the system. The corresponding code can be found in function getRandom_2D.

• Line transformation $\mathrm{L}(\bm{J})$: We invert the sign of a horizontal or vertical line of non-connected couplings (see Fig. 5). The code to generate this transformation is in function getLine_2D.

One can consider more general transformations, like flipping a random connected line (not necessarily straight) or a random loop of couplings in the system (codes can be found in the getRandomLine and getLoop functions). All of them can be decomposed as a combination of the previous 4 transformations. We did not find any particular advantage in including them in the dataset for the learning, but we checked that our trained machine classifies them correctly.

In order to distinguish transformations that conserve the gauge orbit [here, $\mathrm{G}(\bm{J})$ and $\mathrm{O}(\bm{J})$] from those that modify the orbit [namely, $\mathrm{R}_{q}(\bm{J})$ and $\mathrm{L}(\bm{J})$], one needs to compute the Wilson loops, as shown in Fig. 5. We note that the $\mathrm{L}(\bm{J})$ transformation is particularly difficult to detect, since it conserves all the plaquettes and the broken loops can only be detected through the Polyakov loops.
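The role of the two kinds of Wilson loops can be checked directly. The sketch below computes every plaquette and every straight Polyakov loop of a periodic square lattice, then applies a horizontal line transformation and a single-coupling flip. The data layout (`Jh`, `Jv`) and the function names are illustrative assumptions and do not reproduce the repository code.

```python
import random

def plaquettes(Jh, Jv):
    """Product of the four couplings around each elementary square (periodic lattice).

    Assumed layout: Jh[x][y] couples (x, y)-(x+1, y), Jv[x][y] couples (x, y)-(x, y+1).
    """
    L = len(Jh)
    return [[Jh[x][y] * Jv[(x + 1) % L][y] * Jh[x][(y + 1) % L] * Jv[x][y]
             for y in range(L)] for x in range(L)]

def polyakov_loops(Jh, Jv):
    """Products of couplings along every straight loop that wraps the system."""
    L = len(Jh)
    horizontal, vertical = [], []
    for y in range(L):
        p = 1
        for x in range(L):
            p *= Jh[x][y]
        horizontal.append(p)
    for x in range(L):
        p = 1
        for y in range(L):
            p *= Jv[x][y]
        vertical.append(p)
    return horizontal, vertical

rng = random.Random(0)
L = 5
Jh = [[rng.choice((-1, 1)) for _ in range(L)] for _ in range(L)]
Jv = [[rng.choice((-1, 1)) for _ in range(L)] for _ in range(L)]

# Line transformation: flip the horizontal bonds joining (0, y) and (1, y) for every y.
Lh = [row[:] for row in Jh]
for y in range(L):
    Lh[0][y] *= -1
same_plaquettes = plaquettes(Jh, Jv) == plaquettes(Lh, Jv)
hor_before, ver_before = polyakov_loops(Jh, Jv)
hor_after, ver_after = polyakov_loops(Lh, Jv)

# Single-coupling flip: exactly the two plaquettes sharing that bond change sign.
Rh = [row[:] for row in Jh]
Rh[2][3] *= -1
before, after = plaquettes(Jh, Jv), plaquettes(Rh, Jv)
n_broken = sum(before[x][y] != after[x][y] for x in range(L) for y in range(L))
```

A detector that only inspects plaquettes therefore catches the single flip but is blind to the line transformation, which shows up only in the horizontal Polyakov loops.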

Appendix B Additional details on the classifier DCNN gauge/not gauge

The classifier aims to determine whether or not pairs of samples $\{\bm{J},\bm{J^{\prime}}\}$ belong to the same gauge orbit. We begin with the construction of our dataset.

B.1 Dataset

We consider $N_{\text{s}}=2M$ pairs $\{\bm{J},\bm{J^{\prime}}\}$. In all cases, the original sample $\bm{J}$ is chosen randomly (with uniform probability). We refer to Section A for the definition of the transformations.

• Class 1: $M$ pairs are taken from the same gauge orbit, $\bm{J^{\prime}}=\mathrm{O}(\bm{J})$.

• Class 2: $M$ pairs of samples $\{\bm{J},\bm{J^{\prime}}\}$ belonging to two different gauge orbits. This dataset is constructed as follows:

– Quite different orbits G1: $M/3$ pairs with $\bm{J^{\prime}}=\mathrm{O}\left(\mathrm{R}_{q}(\bm{J})\right)$ with $q\in[1/(2L_{x}L_{y}),0.25]$. This class ranges from samples with just one link flipped to samples $\bm{J^{\prime}}$ where (almost) every plaquette has a chance to flip ($q=0.25$).

– Extremely similar orbits G2: $M/3$ pairs with $\bm{J^{\prime}}=\mathrm{O}\left(\mathrm{R}_{q}(\bm{J})\right)$ and $q\in[1/(2L_{x}L_{y}),5/(2L_{x}L_{y})]$, so that only 1 to 5 links were inverted. Our motivation for introducing this group was to force the machine to check every plaquette in the system.

– Broken lines G3: $M/3$ pairs with $\bm{J^{\prime}}=\mathrm{O}\left(\mathrm{L}(\bm{J})\right)$. The line is horizontal or vertical with 50% probability each.

An example of the generation of this dataset can be found in the notebook DCNN_simple.ipynb in Ref. Decelle and Seoane (2019).

B.2 Network

The structure of the neural network is illustrated in Fig. 2. We include the technical details of the network used in Table 2. We use the same architecture for all the $L$ and $N_{\mathrm{s}}$ discussed in the main-text. We include an example of the program used in DCNN_simple.ipynb in Ref. Decelle and Seoane (2019).

In order to avoid overfitting, and also to avoid getting stuck in suboptimal minima during the learning process, we found it useful to alternate between two optimizers, namely stochastic gradient descent and Adam Kingma and Ba (2014). An example of the strategy followed can be found in Ref. Decelle and Seoane (2019).

B.3 Tests on the different groups of the dataset

Fig. 3 shows the overall accuracy of the DCNN classifier, making no distinction between the G1, G2 and G3 groups of Section B.1. We provide the accuracy for each group separately, as obtained from pairs of samples in the test dataset, in Fig. 6. In particular, a comparison of Fig. 6–right (which corresponds to the line-transformed samples in group G3) with Fig. 3 in the main-text will convince the reader that the global accuracy of the machine is dominated by this group.

Appendix C Additional details on the autoencoder DCNN

The autoencoder aims to find a latent representation of the gauge orbit by relating any sample to a unique representative of its gauge orbit (namely the comb-gauge representative). With this purpose in mind, we built our dataset as explained in the next paragraph.

C.1 Dataset

We will consider separately $N_{\mathrm{g}}$ distinct gauge orbits, identified by one orbit representative. We construct the orbits in the following way:

• $N_{\mathrm{g}}/2$ orbits are generated as random samples $\bm{J}$ (the probability that two random samples belong to the same orbit is negligible). We call this set $\mathcal{R}_{\mathrm{g}}$.

• $N_{\mathrm{g}}/4$ orbits were constructed by randomly selecting one $\bm{J}$ from set $\mathcal{R}_{\mathrm{g}}$, and then setting as orbit representative $\mathrm{R}_{q}(\bm{J})$, with $q$ a uniform random number $q\in[1/(2L_{x}L_{y}),0.25]$.

• $N_{\mathrm{g}}/4$ orbits were constructed by randomly selecting one $\bm{J}$ from set $\mathcal{R}_{\mathrm{g}}$, and then setting as orbit representative $\mathrm{L}(\bm{J})$.

We extract $N_{\mathrm{s}}$ distinct samples from each orbit by using the $O$ transformation, recall Section A. An example of the generation of this dataset can be found in the notebook AutoEncoder.ipynb in Ref. Decelle and Seoane (2019).

C.2 Network

The encoder is typically built upon the model from the main-text, see Fig. 2. The number of filters used for the convolutional layers does not need to be very high. For instance, $16$ filters are enough for a small lattice size (e.g. $L=5$). The results of the three parallel CNNs are concatenated and then connected to a dense network of size $L\times L\times N_{\rm latent}$, where $N_{\rm latent}$ is adjusted depending on the system size (we recall that the input of the encoder is of size $2L\times 2L$ because of the chess transformation). The decoder is then made of, first, a CNN and an upsampling layer in order to go back to the correct lattice size. Then again, our three parallel CNNs are stacked (square, vertical and horizontal kernel), taking as input the output of the upsampling layer. Their outputs are concatenated before a last CNN with a larger kernel (typically half of the system size). All the parameters here can, of course, be adjusted to obtain the best result possible for a given $L$. However, given the wide variety of possible working parameters, we stuck to the above ones because changing parameters did not result in a great improvement. In Table 3 we show an example of the architecture used for the $L=5$ case. An example of this neural network can be found in the notebook AutoEncoder.ipynb in Ref. Decelle and Seoane (2019).

C.3 Learning

The learning procedure was performed by using a linear activation for the last layer, together with a mean squared error (MSE) loss function on all the nodes of the system. The MSE is computed between the output of the autoencoder for the input $\bm{J}$ and its comb-gauge representative $\mathrm{G}(\bm{J})$. In principle, it would be possible to use a binary cross entropy as loss function, together with a $\tanh$ for the activation function, taking advantage of the binary nature of the couplings. However, we did not find any improvement when using these parameters w.r.t. the others. We note as well that, because we use the chess transformation, the loss is defined on all the pixels, including the dummy ones. Neglecting dummy pixels, however, did not result in any improvement.
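Written out, the training objective described above takes the following form. The notation is introduced here only for illustration: $C(\cdot)$ denotes the chess transformation to a $2L\times 2L$ image, $p$ runs over its pixels (dummy pixels included), and $f_{\mathcal{E}}$, $f_{\mathcal{D}}$ are the encoder and decoder of the main-text.

```latex
\mathcal{L}_{\mathrm{MSE}}
= \frac{1}{N_{\mathrm{batch}}\,(2L)^{2}}
  \sum_{i=1}^{N_{\mathrm{batch}}} \sum_{p=1}^{(2L)^{2}}
  \Big[ f_{\mathcal{D}}\big(f_{\mathcal{E}}(C(\bm{J}_{i}))\big)_{p}
        - C\big(\mathrm{G}(\bm{J}_{i})\big)_{p} \Big]^{2}
```

The only difference from a conventional autoencoder loss is the target: the network is compared with the comb-gauge representative $\mathrm{G}(\bm{J}_{i})$ rather than with its own input, so all samples of one orbit share the same target. The batch size $N_{\mathrm{batch}}$ is a placeholder, since the text does not specify it.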

C.4 Tests

It is known that DNNs are prone to overfitting the dataset. Hence, to be sure that the autoencoder has learned a general property, we perform several checks on a test set (i.e. a set of orbits not used to train the network) with our well-trained machine. In general, we compare the outputs of the autoencoder (the reconstructed comb gauges) for two distinct input samples $\{\bm{J},\bm{J^{\prime}}\}$ . The comparison is done by counting the number of couplings that differ. We consider four different situations:

1. The two samples are from the same gauge orbit, i.e. $\bm{J^{\prime}}=\mathrm{O}(\bm{J})$ .

2. Two samples separated by a line and a gauge transformation, i.e. $\bm{J^{\prime}}=\mathrm{O}(\mathrm{L}(\bm{J}))$ .

3. Two samples separated by a random-link and a gauge transformation, i.e. $\bm{J^{\prime}}=\mathrm{O}(\mathrm{R}_{q}(\bm{J}))$ .

4. Two random samples.
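The four situations above can be sketched on a toy lattice. This is a minimal illustration in plain Python, not the authors' code: the storage layout `J[direction][x][y]`, all function names, and in particular the concrete forms chosen for `line_flip` (standing in for $\mathrm{L}$ ) and `random_link_flip` (standing in for $\mathrm{R}_{q}$ ) are assumptions. In the actual test, each configuration would first be passed through the trained autoencoder, and `fraction_different` would then be applied to the two outputs.

```python
import random


def random_couplings(L, rng):
    """Random +/-1 couplings on an L x L periodic lattice.
    J[0][x][y] links (x, y)-(x+1, y); J[1][x][y] links (x, y)-(x, y+1)."""
    return [[[rng.choice((-1, 1)) for _ in range(L)] for _ in range(L)]
            for _ in range(2)]


def gauge_transform(J, rng):
    """O(J): J_ij -> eps_i * J_ij * eps_j with a random eps_i = +/-1 per site."""
    L = len(J[0])
    eps = [[rng.choice((-1, 1)) for _ in range(L)] for _ in range(L)]
    out = [[[0] * L for _ in range(L)] for _ in range(2)]
    for x in range(L):
        for y in range(L):
            out[0][x][y] = eps[x][y] * J[0][x][y] * eps[(x + 1) % L][y]
            out[1][x][y] = eps[x][y] * J[1][x][y] * eps[x][(y + 1) % L]
    return out


def line_flip(J, x0=0):
    """Assumed form of L(J): flip every horizontal link crossing one vertical line."""
    out = [[row[:] for row in layer] for layer in J]
    for y in range(len(J[0])):
        out[0][x0][y] = -out[0][x0][y]
    return out


def random_link_flip(J, q, rng):
    """Assumed form of R_q(J): flip each link independently with probability q."""
    return [[[-j if rng.random() < q else j for j in row] for row in layer]
            for layer in J]


def fraction_different(A, B):
    """Fraction of links on which two coupling configurations disagree."""
    pairs = [(a, b) for la, lb in zip(A, B)
             for ra, rb in zip(la, lb) for a, b in zip(ra, rb)]
    return sum(a != b for a, b in pairs) / len(pairs)


def plaquettes(J):
    """Product of the four links around each elementary square."""
    L = len(J[0])
    return [[J[0][x][y] * J[1][(x + 1) % L][y] * J[0][x][(y + 1) % L] * J[1][x][y]
             for y in range(L)] for x in range(L)]


def polyakov_x(J):
    """Product of the horizontal links along each row (system-wrapping loops)."""
    L = len(J[0])
    loops = []
    for y in range(L):
        p = 1
        for x in range(L):
            p *= J[0][x][y]
        loops.append(p)
    return loops


rng = random.Random(0)
J = random_couplings(5, rng)
pairs = {
    "same orbit": gauge_transform(J, rng),
    "line": gauge_transform(line_flip(J), rng),
    "random links": gauge_transform(random_link_flip(J, 0.1, rng), rng),
    "random": random_couplings(5, rng),
}
```

Under these assumptions, the gauge transformation leaves every plaquette and every system-wrapping loop unchanged, whereas the line flip preserves the plaquettes but reverses the sign of the loops that cross the line.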

Table 4 shows the results of these comparisons, averaged over 1000 pairs for each situation. Outputs from samples in the same orbit are essentially equal (only $\sim 3\%$ of the couplings differ); if the gauge-fixing were perfect, they would be strictly equal. A much larger difference is observed in the outputs of the remaining cases. Nevertheless, we stress that a large number of samples was needed to distinguish case 1 from case 2. With fewer samples, the outputs of test 2 were essentially equal.

We perform an additional test on the trained network. We want to understand whether the network manages to learn an (almost) unique representation for a given orbit. To do so, we use the t-SNE representation to project the high-dimensional latent space onto two dimensions. If the network clusters samples from distinct orbits well (that is, if the network has learned the gauge symmetry), the t-SNE images of different orbits should be well separated. Fig. 7 illustrates the clustering generated by our trained autoencoder (for $L=5$ ), using as input the following three groups of test sets of $N_{\mathrm{s}}=200000$ samples each:

1. We generate $N_{\mathrm{g}}=200$ random orbits $\bm{J}$ , and take $100$ gauge transformations from each, $\bm{J^{\prime}}=\mathrm{O}(\bm{J})$ .

2. We generate $N_{\mathrm{g}}=100$ random orbits $\bm{J}$ , and another $100$ orbits constructed by applying the $\mathrm{R}_{q=0.1}(\bm{J})$ transformation to the $100$ random ones. Again, we take $100$ gauge transformations from each orbit.

3. We generate $N_{\mathrm{g}}=100$ random orbits $\bm{J}$ , and another $100$ orbits constructed by applying the $\mathrm{L}(\bm{J})$ transformation to the $100$ random ones. Again, we take $100$ gauge transformations from each orbit.

We show the two-dimensional t-SNE representations of these three groups in Fig. 7. We clearly see very good clustering properties for all the groups, although the third case is sometimes difficult.

C.5 Classifier based on the latent representations

Unsurprisingly, one can also train a neural network to tell whether two latent representations (generated by our trained autoencoder from two different samples) belong to the same gauge orbit, thus doing the job of our previous classifier (discussed in Section B). To do so, we concatenate the two latent representations and feed them to several CNNs followed by a classification layer. Various architectures worked; one example is given in the notebook AutoEncoder.ipynb in Ref. Decelle and Seoane (2019), and its details are reproduced in Table 5. When the autoencoder is well trained, the classifier quickly reaches an accuracy above $98\%$ .

Bibliography

1. Silver et al. (2016): D. Silver, A. Huang, C. J. Maddison, A. Guez, L. Sifre, G. van den Driessche, J. Schrittwieser, I. Antonoglou, V. Panneershelvam, M. Lanctot, S. Dieleman, D. Grewe, J. Nham, N. Kalchbrenner, I. Sutskever, T. Lillicrap, M. Leach, K. Kavukcuoglu, T. Graepel, and D. Hassabis, Nature 529, 484 (2016).
2. Le Cun et al. (2015): Y. Le Cun, Y. Bengio, and G. Hinton, Nature 521, 436 (2015).
3. Schmidhuber (2015): J. Schmidhuber, Neural Networks 61, 85 (2015).
4. Torlai and Melko (2016): G. Torlai and R. G. Melko, Phys. Rev. B 94, 165134 (2016).
5. Carrasquilla and Melko (2017): J. Carrasquilla and R. G. Melko, Nature Physics 13, 431 (2017).
6. Wetzel and Scherzer (2017): S. J. Wetzel and M. Scherzer, Phys. Rev. B 96, 184410 (2017).
7. van Nieuwenburg et al. (2017): E. van Nieuwenburg, Y.-H. Liu, and S. Huber, Nature Physics 13, 435 (2017).
8. Wang (2016): L. Wang, Phys. Rev. B 94, 195105 (2016).