
TL;DR
This paper introduces a deep Boltzmann machine model for the AdS/CFT correspondence, linking neural networks with bulk spacetime geometry and providing a new computational framework for holography.
Contribution
It presents a novel neural network architecture that models the bulk spacetime in AdS/CFT, including black hole horizons and Einstein action regularization, bridging holography and machine learning.
Findings
DBM models bulk scalar fields in curved geometries
Training weights encode the bulk metric
Holographic renormalization implemented in autoencoder
Abstract
We provide a deep Boltzmann machine (DBM) for the AdS/CFT correspondence. Under the philosophy that the bulk spacetime is a neural network, we give a dictionary between those, and obtain a restricted DBM as a discretized bulk scalar field theory in curved geometries. The probability distribution as training data is the generating functional of the boundary quantum field theory, and it trains neural network weights which are the metric of the bulk geometry. The deepest layer implements black hole horizons, and an employed regularization for the weights is an Einstein action. A large limit in holography reduces the DBM to a folded feed-forward architecture. We also neurally implement holographic renormalization into an autoencoder. The DBM for the AdS/CFT may serve as a platform for studying mechanisms of spacetime emergence in holography.
| AdS/CFT | Deep Boltzmann machine |
|---|---|
| Bulk coordinate | Hidden layer label |
| QFT source | Input value |
| Bulk field | Hidden variables |
| QFT generating function | Probability distribution |
| Bulk action | Energy function |
AdS/CFT as a deep Boltzmann machine
Koji Hashimoto
Department of Physics, Osaka University, Toyonaka, Osaka 560-0043, Japan
Preprint: OU-HET-1003
I Introduction
Deep Boltzmann machines [dbm] are a particular type of neural network in deep learning [Hinton; Bengio; LeCun] for modeling the probability distributions of data sets. They are equipped with deep layers of units in their neural network architecture, and are a generalization of Boltzmann machines [BMref], which are one of the fundamental models of neural networks. Deepening the architecture enlarges the representation power of the models, and recent advances in training deep models in machine learning were initiated by analogues of the deep Boltzmann machines.
The neural network of a deep Boltzmann machine consists of visible units and hidden units. Binary variables live on those units, and they interact with each other under a Hamiltonian called an energy function. Thus the deep Boltzmann machine is basically an Ising model in which only the spins at a boundary layer are visible (observable), and whose Hamiltonian allows inhomogeneity and nonlocality. For a given probability distribution of the observed spin configurations at the boundary layer, the Ising bond strengths (called “weights”) in the model Hamiltonian are trained to approximate the given distribution; this is the deep learning of the deep Boltzmann machine. The training determines the weights automatically, and a structure of the Hamiltonian emerges. Efficient algorithms for the training [EfficientDBM] accelerated the progress in deep learning.
In this paper we study a relation between deep Boltzmann machines and the AdS/CFT correspondence [Maldacena:1997re; Gubser:1998bc; Witten:1998qj] in quantum gravity. The AdS/CFT correspondence is a holographic duality between a $(d+1)$-dimensional quantum gravity and a $d$-dimensional quantum field theory (QFT) without gravity. The latter lives at the boundary of the gravitational spacetime of the former. From the viewpoint of the QFT, the direction perpendicular to the boundary surface is an “emergent” space direction. Therefore, the aforementioned structure of the deep Boltzmann machines suits the scheme of the AdS/CFT, once we identify their visible layers with the QFT and the hidden layers with the bulk spacetime. See Fig. 1. The trained weights are interpreted as the metric function of the bulk geometry. We detail the relation between the two schemes, each of which is renowned independently in a different science.
A motivation to bring them together also comes from recent progress in the discretization of the AdS/CFT. Popular toy models of the AdS/CFT in the style of quantum information use MERA [Swingle:2009bg] and other tensor networks [Pastawski:2015qua]. More fundamentally, quantum gravity has a long history of Regge calculus [Regge:1961px] and dynamical triangulation [Ambjorn:1998xu], where spacetimes are approximated by networks. To formulate quantum gravity, we need a dynamical network whose structure is determined in a self-organized manner. In that view, a neural network architecture may provide a novel platform for quantum gravity and the emergent spacetime.
We show that the AdS/CFT correspondence naturally fits the scheme of the deep Boltzmann machines, where the bulk spacetime geometry is reinterpreted as a sparse neural network. We explicitly construct a deep Boltzmann machine architecture which represents an example of the AdS/CFT correspondence. Previously, in [Gan:2017nyt; Howard], a possibility of relating hidden variables of Boltzmann machines to bulk fields was mentioned (see also the essay [Lee:2017skk]). In [You:2017guh], the entanglement feature of a free fermion chain was trained on a random tensor network as a deep Boltzmann machine. The holographic interpretation of feed-forward deep neural networks was proposed and studied in [Hashimoto:2018ftp; Hashimoto:2018bnb] for training QFT and QCD linear response functions, and the obtained emergent bulk spacetime for large-$N_c$ QCD exhibits interesting physical properties and computes other observables as predictions. While our work here naturally relates to these works, in this paper we concentrate on deep Boltzmann machines as an AdS/CFT correspondence.
Although at first sight the two schemes look similar, in detail they possess different characteristics. For example, since deep Boltzmann machines have constraints on their architecture and trainability, we need a careful discretization of bulk field theories. In addition, the AdS/CFT correspondence is well understood in the large-$N$ limit, while that limit has not been studied in the Boltzmann machines. Furthermore, generalization in deep learning relies on degenerate sets of trained weights, while in the AdS/CFT such degeneracy is not expected. In this paper we address these basic questions raised in relating the two schemes. We provide a concrete expression of a deep Boltzmann machine which satisfies the standard constraints, and find that the large-$N$ limit brings the Boltzmann machine to a folded feed-forward architecture. We propose an Einstein action as a regularization of the training, to single out the sets of weights that can be interpreted as a smooth spacetime.
The organization of this paper is as follows. In Sec. II, we briefly review deep Boltzmann machines. In Sec. III, we provide a dictionary between the deep Boltzmann machines and the AdS/CFT, and construct a deep Boltzmann machine for a bulk scalar field theory in a generic curved geometry. The discretization of the fields and the spacetime, and the properties at the deepest layer, are also studied. In Sec. IV, we apply the standard large-$N$ limit (saddle point approximation) to the deep Boltzmann machine and check the consistency with the holographic linear response. In Sec. V, we propose how to identify weights that can be interpreted as a spacetime, through a regularization using the Einstein action. Sec. VI is devoted to our summary and discussions. In Appendix A, we provide an autoencoder-like neural network architecture for holographic renormalization.
II Brief review of Boltzmann machine
Boltzmann machines in machine learning are a network model that gives a probability distribution $P(v)$ of the input variables $v_i$, where the index $i$ labels the units. The probability of the Boltzmann machine is defined by
[TABLE]
with an energy function given by
[TABLE]
Here the coefficients of the linear and the quadratic terms are real parameters, called the bias and the weight, respectively. The structure of the Boltzmann machine is specified by a network graph (see Fig. 2 Left), in which the units, each taking a binary value, are denoted by circles and the weights by lines connecting the circles. In physics, $P(v)$ is the Boltzmann distribution of a canonical ensemble of a classical Ising model, from which the Boltzmann machine takes its name.
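As an illustration of the definition above, the following minimal sketch enumerates all binary configurations of a three-unit Boltzmann machine and normalizes the Boltzmann factors. The quadratic form of the energy with plus signs, the names `a` and `w`, and the numerical values are assumptions made for this example and are not taken from the text.

```python
import itertools
import math

def energy(v, a, w):
    """Assumed quadratic energy: E(v) = sum_i a_i v_i + sum_{i<j} w_ij v_i v_j."""
    n = len(v)
    linear = sum(a[i] * v[i] for i in range(n))
    pairs = sum(w[i][j] * v[i] * v[j] for i in range(n) for j in range(i + 1, n))
    return linear + pairs

def boltzmann_distribution(a, w):
    """Return {v: P(v)} with P(v) = exp(-E(v)) / Z over all binary configurations."""
    states = list(itertools.product((0, 1), repeat=len(a)))
    factors = {v: math.exp(-energy(v, a, w)) for v in states}
    z = sum(factors.values())  # partition function
    return {v: f / z for v, f in factors.items()}

# Hypothetical biases and symmetric weights for three units.
a = [0.5, -0.3, 0.1]
w = [[0.0, 1.0, -0.5],
     [1.0, 0.0, 0.2],
     [-0.5, 0.2, 0.0]]
P = boltzmann_distribution(a, w)
```

Brute-force enumeration scales as $2^n$ in the number of units, so it is only meant to make the definition concrete for very small networks.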
As (2) is quadratic in $v_i$, even if one varies the biases and the weights, the class of the probability distributions obtained by (2) is quite limited. Making the network deep enlarges the representation power of the architecture. Adding hidden variables $h_j$, we have
[TABLE]
with the energy function
[TABLE]
The layer consisting of $v_i$ ($h_j$) is called the visible (hidden) layer. Here the weights connecting units in the same layer are set to zero, so the network graph has only a restricted set of connection lines (see Fig. 2 Center). The model (4) is called a restricted Boltzmann machine.
Suppose one has measured events and obtained many sets of $v_i$. From those one can calculate an empirical probability distribution of $v_i$. In machine learning, one trains the machine so that $P(v)$ mimics this empirical distribution. In the function $P(v)$ of the Boltzmann machine, the weights and biases are the trained parameters. The difference measure between the two probability distributions, called the error function, is the relative entropy (also called the Kullback-Leibler (KL) divergence in machine learning)
[TABLE]
and one tries to minimize it by changing the parameters. When this divergence is minimized, the machine is well-trained.
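The error function described above can be sketched as follows. This is a minimal illustration using the standard definition of the KL divergence between two discrete distributions; the dictionaries `p_data` and `p_model` and their values are hypothetical.

```python
import math

def kl_divergence(p_data, p_model):
    """KL(p_data || p_model) = sum_v p_data(v) * log(p_data(v) / p_model(v))."""
    total = 0.0
    for state, p in p_data.items():
        if p > 0.0:  # states with p_data(v) = 0 contribute nothing
            total += p * math.log(p / p_model[state])
    return total

# Hypothetical target and model distributions over a single binary variable.
p_data = {0: 0.5, 1: 0.5}
p_model = {0: 0.25, 1: 0.75}
kl = kl_divergence(p_data, p_model)
```

The divergence vanishes only when the two distributions agree, which is why its minimization serves as the training criterion.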
The restriction on the graph of the Boltzmann machine is important because of conditional independence. In (4), when $v_i$ is given, the energy is linear in $h_j$, so the conditional probability factorizes into a product over the individual units $h_j$, and the training requires far less computational resource. Note that, without losing this conditional independence, we can add a term proportional to $\delta_{ij} h_i h_j$; the Kronecker delta means that this is a self-interaction within the same unit. What conditional independence does not allow is a term like $v_i v_j$ or $h_i h_j$ with $i \neq j$ in the same layer.
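The factorization can be checked numerically on a small example. The sketch below assumes a restricted Boltzmann machine energy of the form $E = \sum_i a_i v_i + \sum_j b_j h_j + \sum_{ij} w_{ij} v_i h_j$ with binary units taking values 0 or 1; the sign convention, the names, and the parameter values are assumptions for illustration only.

```python
import itertools
import math

def rbm_energy(v, h, a, b, w):
    """Assumed RBM energy: linear in v, linear in h, bilinear in (v, h)."""
    return (sum(ai * vi for ai, vi in zip(a, v))
            + sum(bj * hj for bj, hj in zip(b, h))
            + sum(w[i][j] * v[i] * h[j]
                  for i in range(len(v)) for j in range(len(h))))

def conditional_exact(v, a, b, w):
    """P(h | v) by brute-force enumeration of all hidden configurations."""
    hs = list(itertools.product((0, 1), repeat=len(b)))
    factors = [math.exp(-rbm_energy(v, h, a, b, w)) for h in hs]
    z = sum(factors)
    return {h: f / z for h, f in zip(hs, factors)}

def conditional_factorized(v, b, w):
    """P(h | v) as a product of independent single-unit probabilities."""
    p_on = []
    for j in range(len(b)):
        field = b[j] + sum(w[i][j] * v[i] for i in range(len(v)))
        p_on.append(1.0 / (1.0 + math.exp(field)))  # P(h_j = 1 | v)
    result = {}
    for h in itertools.product((0, 1), repeat=len(b)):
        p = 1.0
        for hj, pj in zip(h, p_on):
            p *= pj if hj == 1 else 1.0 - pj
        result[h] = p
    return result

# Hypothetical parameters: two visible units, three hidden units.
a = [0.2, -0.4]
b = [0.1, 0.3, -0.2]
w = [[0.5, -1.0, 0.3],
     [0.7, 0.2, -0.6]]
v = (1, 0)
exact = conditional_exact(v, a, b, w)
factorized = conditional_factorized(v, b, w)
```

The two dictionaries agree entry by entry, whereas a coupling between two different hidden units would break the agreement.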
Due to the hidden variables, the representation power of the restricted Boltzmann machines is greater. It has been proven that, with a sufficiently large number of hidden units, any probability distribution can be approximated well [LeRoux]; this is called a universal approximation theorem for the restricted Boltzmann machines. Adding more hidden layers can help the representation power (indeed, folding a deep Boltzmann machine leads to an RBM), and the following is called a deep Boltzmann machine [dbm],
[TABLE]
where the index $k$ labels the hidden layers. The hidden variables $h_i^{(k)}$ in the hidden layers, taking binary values, are summed as in (3):
[TABLE]
The visible layer, consisting of the units whose values are the input $v_i$, may be thought of as the layer $k=0$, which means $v_i = h_i^{(0)}$. The weights are again restricted as in the restricted Boltzmann machines; see Fig. 2 Right.
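A layered energy of this kind, together with the sum over all hidden variables, can be sketched as follows. The energy used here (a bias term for every unit plus weights only between adjacent layers) is an assumed standard form, and the function names and parameter values are hypothetical.

```python
import itertools
import math

def dbm_energy(layers, biases, weights):
    """layers[k] holds the unit values of layer k (k = 0 is the visible layer).

    weights[k][i][j] couples unit i of layer k to unit j of layer k + 1;
    there are no couplings within a layer (assumed restricted form).
    """
    e = 0.0
    for k, layer in enumerate(layers):
        e += sum(b * x for b, x in zip(biases[k], layer))
    for k in range(len(layers) - 1):
        for i, x in enumerate(layers[k]):
            for j, y in enumerate(layers[k + 1]):
                e += weights[k][i][j] * x * y
    return e

def dbm_visible_probability(biases, weights):
    """P(v): sum exp(-E) over all hidden configurations, then normalize."""
    sizes = [len(b) for b in biases]
    visible_states = list(itertools.product((0, 1), repeat=sizes[0]))
    hidden_states = list(itertools.product(
        *[itertools.product((0, 1), repeat=s) for s in sizes[1:]]))
    unnormalized = {}
    for v in visible_states:
        unnormalized[v] = sum(
            math.exp(-dbm_energy((v,) + hs, biases, weights))
            for hs in hidden_states)
    z = sum(unnormalized.values())
    return {v: x / z for v, x in unnormalized.items()}

# Hypothetical machine: one visible and two hidden layers, two units each.
biases = [[0.1, -0.2], [0.3, 0.0], [-0.1, 0.2]]
weights = [[[0.5, -0.3], [0.2, 0.4]],
           [[-0.6, 0.1], [0.3, -0.2]]]
P_v = dbm_visible_probability(biases, weights)
```

With all biases and weights set to zero, the same function returns the uniform distribution over the visible configurations, which is a convenient sanity check.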
Although the number of units in each hidden layer need not be the same, in this paper we consider the case of having the same number of units in each layer, for structural simplicity.
III AdS/CFT as a Boltzmann machine
III.1 Dictionary
Let us first describe the similarity between the AdS/CFT correspondence and the deep Boltzmann machine, and construct a dictionary between the two schemes. In the AdS/CFT correspondence [Maldacena:1997re], the fundamental formula relating the boundary and the bulk is the GKP-W relation [Gubser:1998bc; Witten:1998qj], which is
[TABLE]
This expression is for the large-$N$ limit of the QFT with its generating functional $Z[J]$, while for finite $N$ this expression should be replaced by the following (we omit various details in this formula, such as the conformal dimension dependence):
[TABLE]
Here $z$ is the emergent bulk coordinate, and $z=0$ is the boundary of the asymptotically AdS bulk, where the boundary condition is imposed on the bulk field $\phi$.
The deep Boltzmann machine approximates a given probability distribution by the formula (7) with the energy function (6). The similarity between the quantum gravity version of the GKP-W relation (9) and the defining equation of the deep Boltzmann machine is evident. The identification rules are as follows: the source function $J$ is the input value $v$ of the visible layer, the bulk field $\phi$ is the hidden variables $h$, the emergent bulk coordinate $z$ is the label $k$ of the hidden layers, the generating function $Z[J]$ of the QFT is the probability distribution $P(v)$, and the bulk action $S$ is the energy function $E$. See Table III.1 for a summary of the correspondence. The path integral over the bulk field $\phi$ is replaced by the summation over the hidden variables $h$, so in general the quantum bulk of the AdS/CFT correspondence is a deep Boltzmann machine.
The resemblance is basically the fact that the deep Boltzmann machine tries to reproduce the probability distribution whose input is the values at the visible layer, and in the AdS/CFT in the same manner, the bulk path-integration tries to reproduce the generating functional of the boundary QFT where the input is the boundary value of the bulk field.
To give the QFT generating functional a probability interpretation, we normalize it as
[TABLE]
Then, using the deep Boltzmann machine representation of the bulk, the bulk theory can be trained to reduce the error function, which is given by the Kullback-Leibler divergence between the QFT partition function and the model probability of the Boltzmann machine,
[TABLE]
As the deep Boltzmann machine allows arbitrary architecture for its neural network, it is naturally expected that the AdS/CFT correspondence may be included as an example of the Boltzmann machine. Below we shall demonstrate that a typical AdS/CFT model allows a deep Boltzmann machine architecture.
III.2 Bulk as a neural network
The simplest bulk action is for a free massive scalar field in an asymptotically AdS$_{d+1}$ bulk geometry,
[TABLE]
We chose the Euclideanized signature so that the AdS/CFT correspondence can fit the scheme of Boltzmann machines; one of the boundary coordinates is the Euclideanized time coordinate. Local interaction terms of the bulk field can be treated similarly below, but in this paper we consider only the free case.
We assume for simplicity that the metric is diagonal and depends only on the bulk emergent direction $z$, and we also assume that the spacetime is homogeneous in the boundary directions. We find
[TABLE]
There exists a relation among them,
[TABLE]
In the standard Poincaré coordinate system, the asymptotically AdS$_{d+1}$ geometry is
[TABLE]
with the AdS radius $L$, so we have the condition
[TABLE]
near the AdS boundary $z = 0$.
Let us discretize the action (12) so that it takes the form of the energy function of the deep Boltzmann machine (a continuum limit of the deep layers was studied in a different context [deepest]). First, the bulk geometry is discretized to a regular lattice whose sites are labeled by integers. One of the labels, $k$, refers to the discretized bulk emergent direction $z$,
[TABLE]
where $\Delta z$ is the lattice spacing. In the same manner, we discretize the Euclidean time and the spatial boundary coordinates, each with its own lattice spacing and integer label. This simplest regularization scheme replaces the integration over the bulk coordinates by a sum over the lattice sites.
The bulk field at the sites is written as
[TABLE]
Thus the values of the bulk scalar field are the variables in the hidden units. Naturally, we identify the label $k$ of the bulk direction with the label for the layers of a deep Boltzmann machine. We define our visible layer as the AdS boundary value of the scalar field, i.e. its first component in the $k$ direction,
[TABLE]
The $z$-derivative term in the bulk Lagrangian is replaced by
[TABLE]
For the derivative terms along the boundary directions (the Euclidean time and, similarly, the spatial directions), we choose
[TABLE]
Note how this expression depends on the site labels; the reason we chose this discretization will become clear below.
The background metric functions are discretized in the same manner,
[TABLE]
Then the bulk action is written as
[TABLE]
This is recast to the following Boltzmann machine form:
[TABLE]
where the weights are given as
[TABLE]
These weights are symmetric.
The path integral over the bulk field $\phi$ is equivalent to the integration over all the hidden variables; therefore the GKP-W relation (9) is written as
[TABLE]
where the energy function $E$ is defined by (27). Through (22) we have
[TABLE]
This is a deep Boltzmann machine representation of the AdS/CFT correspondence. See Fig. 3 for our architecture.
The background metric appears as the weights of the Boltzmann machine. As is understood from (28) and (29), the weights are not all independent. They form quite a sparse neural network. The trained variables are the discretized metric functions, subject to the constraint (17). The bulk scalar field appears as the hidden variables to be summed over, and its boundary value is identified with the visible units.
Note that, because we chose the discretization scheme (24), the weights connecting the units in the same layer are completely diagonal, as seen in (29). As explained earlier, this does not violate the conditional independence of the units in the same layer, which is important for the training of the Boltzmann machine.
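To make the passage from a discretized action to sparse weights concrete, here is a minimal sketch for a field that depends only on the bulk direction. It assumes a lattice action of the form $\sum_k \Delta z\,[A_k((\phi_{k+1}-\phi_k)/\Delta z)^2 + M_k\phi_k^2]$; the profile names `kin` ($A_k$) and `mass` ($M_k$), the forward-difference stencil, and the sample profiles are hypothetical stand-ins for the discretized metric functions, not the expressions of the text.

```python
def action_direct(phi, kin, mass, dz):
    """Lattice action evaluated term by term (assumed form, see lead-in)."""
    n = len(phi)
    kinetic = sum(kin[k] * ((phi[k + 1] - phi[k]) / dz) ** 2 for k in range(n - 1))
    potential = sum(mass[k] * phi[k] ** 2 for k in range(n))
    return dz * (kinetic + potential)

def weights_from_metric(kin, mass, dz):
    """Symmetric weight matrix W with action = sum_{k,l} W[k][l] phi_k phi_l."""
    n = len(mass)
    w = [[0.0] * n for _ in range(n)]
    for k in range(n):
        w[k][k] += dz * mass[k]          # on-site (self-coupling) term
    for k in range(n - 1):
        c = kin[k] / dz
        w[k][k] += c
        w[k + 1][k + 1] += c
        w[k][k + 1] -= c                 # coupling between adjacent layers
        w[k + 1][k] -= c
    return w

def energy_from_weights(phi, w):
    n = len(phi)
    return sum(w[k][l] * phi[k] * phi[l] for k in range(n) for l in range(n))

# Hypothetical profiles on five layers, z_k = (k + 1) * dz.
dz = 0.2
z = [(k + 1) * dz for k in range(5)]
kin = [1.0 / zk ** 2 for zk in z[:-1]]
mass = [0.5 / zk ** 2 for zk in z]
phi = [1.0, 0.8, 0.5, 0.3, 0.1]
W = weights_from_metric(kin, mass, dz)
```

The resulting matrix is tridiagonal: each layer couples only to itself and to its two neighbors, which is the sparsity referred to above.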
III.3 Discretized values of the bulk field
Standard Boltzmann machines allow binary values for the variables, while the AdS/CFT correspondence requires continuous values for the bulk field $\phi$. To bridge these two, we also need to discretize the space of field values. Suppose that the typical values necessary for training the Boltzmann machines lie in a finite range. Then a natural discretization of the values is given as
[TABLE]
In this discretization, the field takes a finite number of different values. To map them to a set of binary-valued variables, we introduce binary variables as
[TABLE]
Here we divided each single entry into several binary entries. In effect, each unit of the Boltzmann machine is split into several binary units. All of those split units need to share the same weight for every original connection to a different unit. In this manner, binary-valued Boltzmann machines can be constructed from the continuous-valued Boltzmann machines.
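One way to realize such a splitting is sketched below. It assumes that a field value in the range $[-M, M]$ is written as $\phi = -M + \Delta \sum_s h^{(s)}$ with $n$ binary units $h^{(s)}$ and $\Delta = 2M/n$; this particular encoding, and the names `decode` and `split_coupling`, are assumptions for illustration and may differ from the construction in the text.

```python
def decode(bits, M):
    """Field value represented by n binary units: phi = -M + (2M/n) * sum(bits)."""
    n = len(bits)
    return -M + (2.0 * M / n) * sum(bits)

def split_coupling(w, M, n):
    """Rewrite w * phi_i * phi_j in terms of the binary units.

    Returns (const, bias, shared) such that
      w * phi_i * phi_j = const + bias * (sum_s h_i^s + sum_t h_j^t)
                          + shared * sum_{s,t} h_i^s h_j^t.
    Every pair (s, t) carries the same weight `shared`.
    """
    delta = 2.0 * M / n
    return w * M * M, -w * M * delta, w * delta * delta

# Hypothetical example: two units, each split into four binary units.
M, w = 2.0, 0.7
bits_i = (1, 0, 1, 1)
bits_j = (0, 0, 1, 0)
const, bias, shared = split_coupling(w, M, len(bits_i))
lhs = w * decode(bits_i, M) * decode(bits_j, M)
rhs = const + bias * (sum(bits_i) + sum(bits_j)) + shared * sum(bits_i) * sum(bits_j)
```

In this encoding, $n$ binary units represent $n+1$ equally spaced field values, and the single original weight is shared by all $n^2$ connections between the split units.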
III.4 The deepest layer is the end of space
In the AdS/CFT correspondence, the IR end of the geometry is important, as it directly reflects the properties of the allowed spectra of the QFT. Popular holographic geometries are confining geometries and black holes, and they have specific boundary conditions at the IR end of the geometry. Except for the cases of conformal field theories as the boundary QFT, the bulk geometry naturally terminates at some IR scale in the $z$ direction. In the terminology of the deep Boltzmann machines, this means that the layers terminate at a finite depth. Let us rephrase those geometric boundary conditions as conditions around the deepest layer of the deep Boltzmann machine.
First of all, the layers actually terminate at the deepest layer, and there is no additional layer beyond it. In terms of the weights, this condition means that the weights connecting the deepest layer to a would-be next layer vanish, which is
[TABLE]
The confining geometry refers to the Dirichlet boundary condition for the bulk field $\phi$, as it simply means that the bulk field needs to vanish in the region beyond the IR end of the spacetime. This location is called a “hard wall” in holography. In general, the condition of the hard wall means that the metric function which the scalar field feels has a special behavior there. In fact, to impose the vanishing of the field we just need the mass at the deepest layer to diverge. So, in this case we can rephrase the Dirichlet boundary condition in terms of the metric function:
[TABLE]
Next, consider the black hole horizon condition instead. At the black hole horizon the radial ($zz$) component of the metric diverges, while the temporal component vanishes. Thus some of the discretized metric functions at the deepest layer vanish or diverge, while the others remain finite and nonzero. Therefore, the black hole boundary condition is
[TABLE]
with an infinitesimally small parameter.
The confining condition (35) and the horizon condition (36) are examples of more general constraints. We can impose other boundary conditions if they are consistent with the large-$N$ limit of the AdS/CFT, as we shall study in the next section.
For the pure AdS geometry, there is no IR end of the space, and the $z$ direction extends to infinity. So, to host all possible asymptotically AdS spacetimes in our Boltzmann machine architecture, we need to prepare infinitely deep Boltzmann machines. Note, however, that the limit of infinite $z$ corresponds to strictly vanishing energy in the QFT, which is not the main constituent of the partition function except for IR regularization ambiguities.
IV Saddle point of Boltzmann machine
The AdS/CFT correspondence has been studied in the large-$N$ limit of the QFT, because this is the classical limit of the bulk, which is the only reliable gravity calculation in the absence of a satisfactory formulation of quantum gravity. The large-$N$ limit, or the classical limit of the gravity theory, is equivalent to the zero temperature limit of the Boltzmann machine. In this limit, the gravity theory is well approximated by saddle points, that is, the solutions of the classical equations of motion, and the on-shell action is simply substituted into the right-hand side of (8).
The zero-temperature limit of Boltzmann machines has not been studied extensively, because the hidden/visible variables in ordinary Boltzmann machines take only binary values and the saddle approximation is not effective. In our case, as described in Sec. III.3, we consider a certain limit of binary-valued Boltzmann machines to acquire continuous-valued variables. There the equations of motion, and the saddle points, make sense. In this section, we study consistency conditions of the classical limit (equivalently, the zero temperature limit, or the saddle point approximation) of the deep Boltzmann machine given in the previous section. For simplicity we treat the variables as continuous variables.
First, let us consider the standard restricted Boltzmann machine (4) with continuous-valued variables, and how the classical limit causes an inconsistency. The saddle point equation is
[TABLE]
Since the biases and the weights are parameters to be fixed after the training with various sets of the input $v_i$, this equation cannot be satisfied for generic inputs. Therefore, restricted Boltzmann machines with continuous hidden variables do not allow the saddle point approximation, contrary to physical intuition.
Adding more hidden layers can resolve the issue. Suppose we add another hidden layer to the restricted Boltzmann machine,
[TABLE]
Then the saddle point equation is
[TABLE]
The first equation determines the variables of the added hidden layer for any given training value of the input $v_i$, so it gives a consistent saddle point equation. The second equation simply shows that the middle layer variable takes a fixed value. Substituting these into the original energy function (38), we obtain the saddle point approximation of the restricted Boltzmann machine,
[TABLE]
Here it should be noted that the obtained energy function is linear in $v_i$, so it does not have the form of the standard Boltzmann machines, whose energy functions are bilinear in $v_i$. The reason for the linearity is that the saddle point equations for the odd-numbered and the even-numbered layers decouple from each other.
Instead of adding more layers, we can introduce a self-coupling as described earlier in Sec. II. For the case with just a single hidden layer with a uniform self-coupling weight, we have
[TABLE]
The saddle point equation is
[TABLE]
which determines the value of the hidden unit in terms of the input $v_i$, so it gives a consistent solution. The on-shell value of the energy function is
[TABLE]
where
[TABLE]
Thus, as expected, the effective energy function is bilinear in $v_i$.
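The mechanism can be seen in the simplest setting of one visible and one hidden variable. The sketch below assumes the energy $E(v,h) = a v + b h + w v h + c h^2$ with $c > 0$ and continuous $h$; the parameter names and values are hypothetical and the single-unit reduction is chosen only for illustration.

```python
def energy(v, h, a, b, w, c):
    """Assumed single-unit energy with a self-coupling c * h^2."""
    return a * v + b * h + w * v * h + c * h * h

def saddle_hidden(v, b, w, c):
    """Solution of dE/dh = b + w * v + 2 * c * h = 0."""
    return -(b + w * v) / (2.0 * c)

def on_shell_energy(v, a, b, w, c):
    """Energy evaluated at the saddle: a * v - (b + w * v)^2 / (4 * c)."""
    return a * v - (b + w * v) ** 2 / (4.0 * c)

# Hypothetical parameter values.
a, b, w, c = 0.3, -0.2, 1.5, 2.0
v = 0.7
h_star = saddle_hidden(v, b, w, c)
e_star = on_shell_energy(v, a, b, w, c)
```

The on-shell energy contains a term $-w^2 v^2/(4c)$, so integrating out the hidden unit at the saddle generates an effective quadratic coupling for the visible variable.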
Keeping these results in mind, we consider the deep Boltzmann machine which we defined in the previous section. The saddle point condition is
[TABLE]
So the variables at layer $k$ are related to those of layers $k-1$ and $k+1$. The equation has the properties of both cases, (38) and (44).
Let us study the consistency with the IR boundary condition at the deepest layer. For simplicity, we consider the case of a field that is homogeneous in the boundary directions, which is equivalent to ignoring the terms with derivatives along those directions. The architecture of the deep Boltzmann machine is shown in Fig. 4. At the deepest layer, the saddle point equation gives
[TABLE]
where the relation symbol denotes a linear relation whose coefficients are given by the weights. Similarly, the saddle point equation at the next-to-deepest layer gives
[TABLE]
where we omit the coefficients. Together, they give a linear relation between the variables of the next-to-deepest layer and those of the layer above it. Repeating this backwards in layers, we finally obtain
[TABLE]
which can also be written as
[TABLE]
In the continuum limit, this relation is
[TABLE]
In the boundary QFT of the AdS/CFT correspondence, this relation is equivalent to the linear response relation [Klebanov:1999tb],
[TABLE]
Thus, the deep Boltzmann machine is found to be consistent with the standard analysis in the classical bulk side of the AdS/CFT correspondence.
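The backward recursion can be sketched for a chain of layers with one variable per layer. The code assumes a quadratic energy with a symmetric tridiagonal weight matrix, so that the saddle point equation at layer $k$ reads $W_{k,k-1}\phi_{k-1} + W_{kk}\phi_k + W_{k,k+1}\phi_{k+1} = 0$, with no layer beyond the deepest one. The names `diag` and `off` and the numerical values are hypothetical.

```python
def backward_ratios(diag, off):
    """Ratios r[k] with phi[k] = r[k] * phi[k - 1] at the saddle (r[0] unused).

    diag[k] = W_kk and off[k] = W_{k,k+1} for a symmetric tridiagonal energy.
    The recursion starts at the deepest layer and proceeds toward the boundary.
    """
    n = len(diag)
    r = [0.0] * n
    r[n - 1] = -off[n - 2] / diag[n - 1]
    for k in range(n - 2, 0, -1):
        r[k] = -off[k - 1] / (diag[k] + off[k] * r[k + 1])
    return r

def saddle_profile(phi0, diag, off):
    """Saddle point values of all layers for a given boundary value phi0."""
    r = backward_ratios(diag, off)
    phi = [phi0]
    for k in range(1, len(diag)):
        phi.append(r[k] * phi[-1])
    return phi

# Hypothetical weights for five layers (layer 0 is the visible layer).
diag = [3.0, 3.0, 3.0, 3.0, 3.0]
off = [-1.0, -1.0, -1.0, -1.0]
phi = saddle_profile(2.0, diag, off)
phi_doubled = saddle_profile(4.0, diag, off)
```

The value at the first hidden layer is proportional to the boundary value, with a coefficient fixed entirely by the weights; doubling the boundary value doubles the whole profile, which is the discrete counterpart of a linear response.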
It is intriguing that the saddle point approximation explicitly provides the relation between the variables at adjacent layers. This relation is expected for neural networks of the feed-forward type. So, we find that the saddle point approximation of the deep Boltzmann machine provides a feed-forward architecture. A subtle difference from the standard feed-forward network is that the linear relation starts at the deepest layer, not at the visible layer. In fact, looking at only the first hidden layer we find that the relation is just like (52), so it is not a linear relation between just the two adjacent layers. Indeed, the scalar field equation is a second-order differential equation, so there is a backward wave in addition to the forward wave. These two waves satisfy the consistency condition at the deepest layer. Therefore, the saddle point approximation provides a “folded feed-forward” structure. Unfolding the folded structure is possible, and in Appendix A we provide an architecture of the unfolded type, which looks like an autoencoder.
V Regularization and Einstein action
In this section we study the condition for the trained weights to be interpreted as a bulk spacetime. The training should be performed in the following manner. First, prepare a quantum field theory for which one wants to know whether a gravity dual exists or not. Then calculate its generating functional $Z[J]$ and the probability interpretation by (10); for this, one may need some approximation and some assumption on superselection sectors to perform the path integrations. Next, prepare the deep Boltzmann machine architecture given in Sec. III.2, and update the weights to reduce the KL divergence (11). Once the KL divergence has decreased to sufficient accuracy, we say that the bulk is learned.
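The weight update can be illustrated on a toy problem. The sketch below trains a fully visible Boltzmann machine with two binary units by plain gradient descent on the KL divergence, using the standard identity that the gradient with respect to a parameter is the difference between the data average and the model average of the corresponding term in the energy. The target distribution, the learning rate, and the number of steps are hypothetical, and the toy model stands in for the much larger architecture of Sec. III.2.

```python
import itertools
import math

STATES = list(itertools.product((0, 1), repeat=2))  # (v0, v1)

def model_distribution(a0, a1, w):
    """P(v) = exp(-E) / Z with assumed energy E = a0*v0 + a1*v1 + w*v0*v1."""
    factors = [math.exp(-(a0 * v0 + a1 * v1 + w * v0 * v1)) for v0, v1 in STATES]
    z = sum(factors)
    return [f / z for f in factors]

def kl_divergence(p_data, p_model):
    return sum(p * math.log(p / q) for p, q in zip(p_data, p_model) if p > 0.0)

def train(p_data, steps=3000, lr=0.5):
    """Gradient descent: dKL/dtheta = <dE/dtheta>_data - <dE/dtheta>_model."""
    a0 = a1 = w = 0.0
    for _ in range(steps):
        p_model = model_distribution(a0, a1, w)
        diff = [p - q for p, q in zip(p_data, p_model)]
        grad_a0 = sum(d * v0 for d, (v0, v1) in zip(diff, STATES))
        grad_a1 = sum(d * v1 for d, (v0, v1) in zip(diff, STATES))
        grad_w = sum(d * v0 * v1 for d, (v0, v1) in zip(diff, STATES))
        a0 -= lr * grad_a0
        a1 -= lr * grad_a1
        w -= lr * grad_w
    return a0, a1, w

# Hypothetical target distribution over (0,0), (0,1), (1,0), (1,1).
p_data = [0.1, 0.2, 0.3, 0.4]
params = train(p_data)
kl_initial = kl_divergence(p_data, model_distribution(0.0, 0.0, 0.0))
kl_final = kl_divergence(p_data, model_distribution(*params))
```

For realistic architectures the model averages cannot be computed by enumeration and are instead estimated by sampling, but the structure of the update is the same.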
The metric function is encoded in the sparse weights of the deep Boltzmann machine, given in (28) and (29). Although it can easily be reconstructed, there is one issue: generically, the training ends up with various different sets of weights, because the error function may have many almost degenerate local minima. Each of the local minima can approximate the target distribution very well, which is related to the notion of “generalization” in machine learning.
We are looking for a gravity dual. For the trained Boltzmann machine weights to be interpreted as a bulk spacetime, we need a criterion to pick out a certain set of the weights among the degenerate local minima. The criterion is simple: use an Einstein action as a regularization of the deep Boltzmann machine. (The word “regularization” here refers to Tikhonov regularization, not to the lattice regularization.)
Generic trained weights take quite scattered values, and they are not a smooth function of $z$ in the continuum limit. For those configurations of weights, the Einstein action takes a large value. On the other hand, smooth metric functions of $z$, and hence sets of weights whose values do not vary drastically as one sweeps the depth of the layers, have lower values of the Einstein action. Therefore, the Einstein action can be used for selecting a proper set of weights which has a bulk spacetime interpretation.
A proposed regularization term is a discretization of the Einstein action with a negative cosmological term,
[TABLE]
To obtain the explicit discretization, for simplicity we consider a conformal spacetime
[TABLE]
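For a particular choice of the conformal factor, the metric reduces to the pure AdS metric, so asymptotically AdS spacetimes require the conformal factor to approach that form near the boundary. In terms of the metric functions introduced previously, this ansatz leads to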
[TABLE]
So, the discretization of the $z$ direction provides a lattice on which the conformal factor is defined at each layer. For a specific value of $d$ as an example, the Einstein action becomes
[TABLE]
Discretizing this action, we obtain the regularization term in the error function
[TABLE]
Here the overall coefficient is a positive constant, and we ignored an additive constant term which is irrelevant for the training. The additive constant comes from the first term in (60), that is, the cosmological constant for the pure AdS spacetime.
It is easy to see that this regularization in fact favors a smoother distribution of the weights, due to the second term in (61). With this regularization, during the training the Boltzmann machine also tries to minimize the Einstein action at the same time. When the error decreases to a satisfactorily small value, the weights can be interpreted as an Einstein spacetime. Note also that the black hole horizon behavior of the weights, (36), is consistent with this regularization.
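The effect of a smoothness-favoring term can be sketched with a simple stand-in. The penalty below, a sum of squared differences between neighboring layers divided by the lattice spacing, is an assumed proxy for a discretized derivative term and is not the discretized Einstein action of the text; the function names, the coefficient `coeff`, and the sample profiles are hypothetical.

```python
def smoothness_penalty(profile, dz, coeff=1.0):
    """Stand-in regularizer: coeff * sum_k (f[k+1] - f[k])^2 / dz."""
    return coeff * sum((profile[k + 1] - profile[k]) ** 2
                       for k in range(len(profile) - 1)) / dz

def regularized_error(kl, profile, dz, coeff=1.0):
    """Total error: KL divergence plus the smoothness penalty."""
    return kl + smoothness_penalty(profile, dz, coeff)

# Two hypothetical layer profiles with the same set of values.
dz = 0.1
smooth = [1.0, 1.1, 1.2, 1.3]
scattered = [1.0, 1.3, 1.1, 1.2]
penalty_smooth = smoothness_penalty(smooth, dz)
penalty_scattered = smoothness_penalty(scattered, dz)
```

For equal KL divergence, the profile that varies smoothly with depth has the lower total error, so the regularized training prefers it among otherwise degenerate minima.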
When , we have a pure AdS geometry. In generic AdS/CFT correspondence, the bulk action can take various forms; it may have more supergravity fields, and it may suffer from higher derivative terms coming from quantum gravity corrections or stringy corrections. Therefore, in general, the regularization needs to allow more generic actions, such as
[TABLE]
or more general terms with tensorial structures. This can be discretized by the same method, and we obtain a more general Einstein regularization. Note that the coefficients in this expression are trained variables. In general we do not know the bulk gravity action, so we need to allow a general action. When we say that the bulk is a spacetime, it means that the trained weights reduce the value of this general action. When the powers of the Riemann tensors stop at some fixed value, a low energy effective spacetime interpretation is possible.
VI Summary and discussions
In this paper, we have shown that the standard AdS/CFT correspondence can be regarded as a deep Boltzmann machine. The neural network architecture, once properly defined, is interpreted as a bulk spacetime geometry. The network depth is the emergent direction in the bulk, and the network weights are metric components. Hidden variables correspond to discretized fields in the bulk, and the probability distribution given by the Boltzmann machine is the generating functional of the QFT dual to the bulk gravity.
For the mapping we used a bulk scalar field theory in curved geometries. The IR boundary conditions of the bulk, such as the black hole horizon or the hard wall, can be implemented as the behavior of the weights around the deepest layer of the Boltzmann machine. The large-$N$ limit of the AdS/CFT is discussed in the scheme of the Boltzmann machine, consistently giving an organized set of linear equations among the variables, with coefficients given by the weights.
Among many degenerate vacua of the deep Boltzmann machine, a set of weights which allows a spacetime interpretation is selected by a regularization in the error function in addition to the KL divergence. We have introduced a natural regularization based on the Einstein action and its generalization.
Our study provides a relation between the AdS/CFT correspondence and the deep Boltzmann machine. In view of the history of quantum gravity, introducing a discretization of the spacetime is natural, and we hope that more concepts from Boltzmann machines and deep learning can be imported into quantum gravity, so that they may shed light on the mystery of the bulk emergence in the holographic principle.
Several clarifications and comments are in order. First, the discretization of the spacetime used in this paper favors a certain coordinate system, and thus the general coordinate transformation of the gravity theory is not seen in our framework. Furthermore, even the isometry transformation, which is the scale transformation in the QFT, is difficult to implement in our formulation. A hyperbolic network (as used in You:2017guh ) is more consistent with the isometry, but it is difficult to find its continuum limit. It would be interesting to seek a more desirable discretization scheme. In fact, a well-known approach to quantum gravity uses dynamical triangulation Regge:1961px , in which the connection bond topology (which is the axon topology in neural networks) is dynamical. On the other hand, standard neural networks have a fixed architecture while the weights are variable. We may need a refined discretization architecture to obtain a more unified view of quantum gravity and deep learning. Generic quantum gravity may even include a non-geometric landscape, to which machine learning has been applied He:2017aed ; He:2017dia ; Liu:2017dzi ; Carifio:2017bov ; Ruehle:2017mzq ; Faraggi:2017cnh ; Carifio:2017nyb , and a possible relation to our approach of regarding a neural network as a spacetime would be interesting.
Second, we have introduced the saddle point approximation in the evaluation of the deep Boltzmann machine, based on the standard large $N_c$ argument of the AdS/CFT correspondence. In this limit, linear relations among weights at layers close to each other are derived, and the information at the visible layer is processed through the bulk, as if it propagates. This means that the saddle point of the deep Boltzmann machine reduces it to a folded feed-forward type deep neural network. The AdS/CFT interpretation of a feed-forward neural network was studied in Hashimoto:2018ftp ; Hashimoto:2018bnb and the trained weights exhibit an interesting physical picture.
On the other hand, at finite $N_c$ (beyond the classical limit of the bulk), the relation between the AdS/CFT and the deep Boltzmann machine is a little ambiguous: in the bulk, only the scalar field (hidden variables) is path-integrated while the metric (weights) is not. This situation can be interpreted as the scalar field being that of a probe brane in the bulk. What is the metric path-integral in the deep Boltzmann machine? It is a statistical summation over the network weights, which has been studied as statistical neural networks Amari1 . It would be interesting to see more connections between the holographic principle and statistical neural networks. In fact, in Amari2 a conformal transformation of the data space was found in layer-to-layer propagation, and it may allow a holographic interpretation.
Finally, we comment on a relation to quantum information. It is known that the AdS/CFT correspondence has a close relation to quantum information; in particular, AdS/CFT toy models based on tensor networks have been studied. The structure of MERA Swingle:2009bg has a bulk hyperbolic space interpretation, and tensor networks using perfect tensors Pastawski:2015qua provide a quantum correspondence between the bulk and the boundary in the AdS/CFT. Since it is known Gao that any quantum code allows a deep Boltzmann machine interpretation, the AdS tensor networks can be mapped to deep Boltzmann machines. In general, the resulting machine architecture tends to be complicated (since the number of quantum gates necessary to reproduce -leg tensor is ), so a continuum limit yielding a continuum field theory in the bulk, which we studied in this paper, is difficult to take. Further studies bridging the holographic principle and deep learning are desired.
Note added: while this manuscript was prepared, we noticed that Ref. Hu:2019nea which interprets the bulk as a deep generative model was submitted to arXiv recently.
Acknowledgements.
The author would like to thank Shun-ichi Amari, Masatoshi Imada, Akinori Tanaka, Akio Tomiya and Yi-Zhuang You for valuable discussions. The work of K.H. was supported in part by MEXT/JSPS KAKENHI Grants No. JP15H03658, No. JP15K13483 and No. JP17H06462.
Appendix A Holographic autoencoder
In this appendix we describe an implementation of the AdS/CFT correspondence as a feed-forward deep neural network with an autoencoder-like architecture. We discuss only the classical limit of the bulk (the large $N_c$ limit), so that the feed-forward structure is clear in the AdS/CFT.
In Hashimoto:2018ftp ; Hashimoto:2018bnb , a deep neural network employs and as the input at the initial layer while the black hole horizon boundary condition is used as an output at the final layer. Another natural implementation is to use (the non-normalizable mode of the AdS scalar field) as an input and (normalizable mode) as the output data. In this case the data first propagates from the AdS boundary toward the black hole horizon, then bounces off the horizon according to the boundary condition, and then propagates back to the AdS boundary. Therefore the neural network is of hourglass type, and is naturally interpreted as an autoencoder in machine learning, see Fig. 5. The neural network looks similar to the two-sided black hole geometry which is often used in finite temperature holography Maldacena:2001kr ; Hartman:2013qma ; Matsueda:2012xm ; Mollabashi:2013lya .
The important point for the construction is to use the holographic renormalization group deHaro:2000vlm to divide the second-order differential equation into a set of two first-order equations (for the non-normalizable and normalizable modes). Generic autoencoders have two important features: their weights are left-right symmetric, and they reduce the dimensions of the data space at the neck of the network. In our holographic autoencoder, the left and right sides are governed by the same metric, and the convolution near the neck is red-shifted so that the weights are effectively reduced at the neck.
Here we present details of the construction of the holographic autoencoder. First, we explain how the second-order differential equation of the scalar field in the bulk can be equivalently replaced by a set of two first-order differential equations. This is important for implementing the scalar system as a deep neural network, because neural networks typically have an inter-layer propagation structure that is naturally interpreted as a first-order differential equation when the continuum limit of the layers is taken. (Footnote 9: Highway networks highway may be another way.)
The decoupling between the non-normalizable and the normalizable modes in the AdS/CFT correspondence with the bulk scalar field works only for the free case, when the interaction vanishes, . This is possible at the linear response level of the AdS/CFT correspondence. In this appendix we use the coordinate system where the AdS radial coordinate is a proper coordinate,
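The correspondence between a first-order equation and layer-to-layer propagation can be sketched as follows. This is an illustration under stated assumptions, not the network of the paper: it takes a single homogeneous field obeying a first-order equation of the form dphi/d(eta) = W(eta) * phi and discretizes it with a forward Euler step, so that each layer multiplies the field by `1 + d_eta * W`. The function name `propagate` and its arguments are hypothetical.

```python
import numpy as np

def propagate(phi0, W, eta_max, n_layers):
    """Propagate phi through n_layers layers of width d_eta = eta_max / n_layers.

    Each layer applies the linear map phi -> (1 + d_eta * W(eta)) * phi,
    i.e. one forward Euler step of dphi/d(eta) = W(eta) * phi, with a
    trivial (identity) activation function.
    """
    d_eta = eta_max / n_layers
    phi = phi0
    for k in range(n_layers):
        eta = k * d_eta
        phi = (1.0 + d_eta * W(eta)) * phi
    return phi

# With a constant W = -1 the continuum solution is exp(-eta); the
# discretized network approaches it as the number of layers grows.
approx = propagate(1.0, lambda eta: -1.0, eta_max=1.0, n_layers=1000)
exact = np.exp(-1.0)
```

The per-layer weight in this sketch has the form "1 +" a term proportional to the layer spacing, so the network becomes an identity map wherever `W` vanishes.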
[TABLE]
and define . Then the free stationary scalar field equation is
[TABLE]
where is a covariant Laplacian. As opposed to the case in Hashimoto:2018ftp ; Hashimoto:2018bnb , here we included the spatial dependence of the external field and the response. According to the holographic renormalization group deHaro:2000vlm the equation (64) can be rewritten as
[TABLE]
with
[TABLE]
These constraint equations are simply
[TABLE]
This generically allows two solutions , and with each of them, the scalar field equation reduces to a first order differential equation in ,
[TABLE]
These two equations with govern the non-normalizable and the normalizable modes, respectively. Therefore, once a spacetime bulk metric is given, we find two functions , and use them to define the neural network by discretizing the direction as . Noting that the discretization of gives
[TABLE]
and the discretized spatial dependence is interpreted as a convolution in the neural network,
[TABLE]
then we find that the neural network is defined
[TABLE]
where the network weights are
[TABLE]
Note that we also discretize the -dimensional space of the boundary QFT, as in (71), where the covariant Laplacian can be identified as a convolution in the neural network. Generically, spatial derivatives in field equations are identified as a combination of weights connecting nearby units. The locality of the bulk field theory is a constraint on the weights of the neural network. In this way, we can always include the spatial dependence of the external field and the response, as a convolutional neural network. So, (73) defines a convolutional neural network equivalent to (64), with a trivial activation function.
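The identification of a spatial Laplacian with a convolution can be sketched in one spatial dimension. This is a minimal illustration, assuming a flat lattice with spacing `dx` and the standard three-point stencil; it is not the covariant Laplacian of (71), and the layer coefficients `a` and `b` are hypothetical placeholders for metric-dependent functions.

```python
import numpy as np

def laplacian_1d(phi, dx):
    """Three-point finite-difference Laplacian written as a convolution.

    The kernel [1, -2, 1] / dx^2 connects each unit only to its nearest
    neighbours, which is how locality constrains the weights. With
    mode="same" the array is zero-padded, so only interior points are
    meaningful.
    """
    kernel = np.array([1.0, -2.0, 1.0]) / dx**2
    return np.convolve(phi, kernel, mode="same")

def conv_layer(phi, d_eta, a, b, dx):
    """One layer: phi -> phi + d_eta * (a * phi + b * Laplacian(phi)).

    Assumption: a and b stand in for the layer-dependent coefficients of a
    low-momentum expansion; the activation function is trivial.
    """
    return phi + d_eta * (a * phi + b * laplacian_1d(phi, dx))

# Check on phi(x) = x^2, whose second derivative is 2 everywhere.
x = np.linspace(0.0, 1.0, 11)
dx = x[1] - x[0]
lap = laplacian_1d(x**2, dx)
```

Because the kernel has finite support, the resulting weight matrix between adjacent layers is banded, which is the network-side statement of bulk locality.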
The left hand side of the neural network is governed by the propagation weight (73) for the non-normalizable mode. The input data is placed at the initial layer . It propagates with toward the black hole horizon , and is transformed to a data . At the black hole horizon , we need to impose the boundary condition . Generically does not satisfy it, so we need to complement it as
[TABLE]
This defines the initial condition for the right hand side of the neural network, , which propagates toward with . Then we identify the output as . We call this whole network, shown in Fig. 5, a holographic autoencoder.
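The data flow of the whole network can be sketched schematically. This is only an architectural illustration under strong simplifying assumptions: the field is a single homogeneous number, the per-layer weights are given as plain lists, and the horizon condition is passed in as a user-supplied function `bounce` because its explicit form is not reproduced here. All names are hypothetical.

```python
import numpy as np

def holographic_autoencoder(source, w_down, w_up, bounce):
    """Forward pass of an hourglass-type linear network.

    source : input placed at the boundary layer
    w_down : per-layer weights from the boundary toward the horizon
    w_up   : per-layer weights from the horizon back to the boundary
    bounce : placeholder for the horizon boundary condition (assumption)
    """
    phi = source
    for w in w_down:      # left half: boundary -> horizon
        phi = w * phi
    phi = bounce(phi)     # neck of the network: condition at the horizon
    for w in w_up:        # right half: horizon -> boundary
        phi = w * phi
    return phi            # read off as the response at the boundary

# Toy usage: the two halves share the same number of layers, and an
# identity bounce simply hands the horizon value to the right half.
w_down = np.array([0.5, 0.5, 0.5])
w_up = np.array([2.0, 2.0, 2.0])
response = holographic_autoencoder(1.0, w_down, w_up, bounce=lambda p: p)
```

In a trainable version both weight lists would be computed from one shared set of metric functions, reflecting the statement that the left and right halves are governed by the same metric.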
In reality for the training, we may focus on slowly varying external field and use the low momentum expansion , then the weights are given as
[TABLE]
We train the coefficient function and with a constraint that both consistently solve (68).
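How a constraint of this kind yields two branches can be illustrated in a simplified setting. The sketch below is not equation (68): it assumes a spatially homogeneous field obeying phi'' + h * phi' - m2 * phi = 0 with constant `h` and `m2` (hypothetical symbols for a friction coefficient and a mass squared). The ansatz phi' = W * phi then reduces the second-order equation to the algebraic condition W^2 + h * W - m2 = 0, whose two roots define two first-order equations.

```python
import numpy as np

def first_order_weights(h, m2):
    """Return the two roots (W_plus, W_minus) of W^2 + h*W - m2 = 0.

    Assumption: constant h and m2. Real roots require h^2 + 4*m2 >= 0.
    """
    disc = h * h + 4.0 * m2
    if disc < 0:
        raise ValueError("complex roots: h^2 + 4*m2 must be non-negative")
    root = np.sqrt(disc)
    return (-h + root) / 2.0, (-h - root) / 2.0

def residual(W, h, m2):
    """Residual of the second-order equation for phi = exp(W * eta).

    Substituting phi = exp(W * eta) gives (W^2 + h*W - m2) * phi, so the
    equation holds exactly when this prefactor vanishes.
    """
    return W * W + h * W - m2

# Example with h = 4 and m2 = -3: the roots are -1 and -3, i.e. two
# exponentials exp(-eta) and exp(-3*eta) with different fall-off rates.
W_plus, W_minus = first_order_weights(4.0, -3.0)
```

When `h` depends on the radial coordinate, the algebraic condition acquires a derivative term and becomes a Riccati-type equation, but it still admits two branches in the same way.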
Using the horizon behavior and const., we find that (68) has a universal solution
[TABLE]
This means that effectively the weight near the black hole horizon vanishes (except for the trivial “1+” part), due to the red shift factor via . Therefore, the effective dimensions of the data space around the central part of the holographic autoencoder decrease, which is suitable for the name “autoencoder” usually used in machine learning.
References
- (1) R. Salakhutdinov and G. Hinton, “Deep Boltzmann machines,” Proceedings of the International Conference on Artificial Intelligence and Statistics, Vol. 12, 448 (2009).
- (2) G. E. Hinton and R. R. Salakhutdinov, “Reducing the dimensionality of data with neural networks,” Science 313, 504 (2006).
- (3) Y. Bengio and Y. LeCun, “Scaling learning algorithms towards AI,” Large-scale kernel machines 34 (2007).
- (4) Y. LeCun, Y. Bengio and G. Hinton, “Deep learning,” Nature 521, 436 (2015).
- (5) D. H. Ackley, G. E. Hinton and T. J. Sejnowski, “A Learning Algorithm for Boltzmann Machines,” Cognitive Science 9, 147-169 (1985).
- (6) R. Salakhutdinov and H. Larochelle, “Efficient learning of deep Boltzmann machines,” Proceedings of the Thirteenth International Conference on Artificial Intelligence and Statistics, 693 (2010).
- (7) J. M. Maldacena, “The Large N limit of superconformal field theories and supergravity,” Int. J. Theor. Phys. 38, 1113 (1999) [Adv. Theor. Math. Phys. 2, 231 (1998)] [hep-th/9711200].
- (8) S. S. Gubser, I. R. Klebanov and A. M. Polyakov, “Gauge theory correlators from noncritical string theory,” Phys. Lett. B 428, 105 (1998) [hep-th/9802109].
