Attention in a family of Boltzmann machines emerging from modern Hopfield networks
Toshihiro Ota, Ryo Karakida

TL;DR
This paper explores the properties and trainability of a new Boltzmann machine variant inspired by modern Hopfield networks, revealing connections to existing models and demonstrating its tractability and ease of training.
Contribution
It introduces the attentional Boltzmann machine (AttnBM), a novel model derived from modern Hopfield networks, and analyzes its properties and relationships to other energy-based models.
Findings
AttnBM has a tractable likelihood function and gradient for certain special cases.
AttnBM is easy to train compared to traditional BMs.
Connections are established between AttnBM and models like Gaussian-Bernoulli RBMs and denoising autoencoders.
Abstract
Hopfield networks and Boltzmann machines (BMs) are fundamental energy-based neural network models. Recent studies on modern Hopfield networks have broadened the class of energy functions and led to a unified perspective on general Hopfield networks, including an attention module. In this letter, we consider the BM counterparts of modern Hopfield networks using the associated energy functions, and study their salient properties from a trainability perspective. In particular, the energy function corresponding to the attention module naturally introduces a novel BM, which we refer to as the attentional BM (AttnBM). We verify that AttnBM has a tractable likelihood function and gradient for certain special cases and is easy to train. Moreover, we reveal the hidden connections between AttnBM and some single-layer models, namely the Gaussian-Bernoulli restricted BM and the denoising autoencoder…
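As background for the "attention module" mentioned in the abstract, the following is a minimal sketch of the log-sum-exp energy and softmax retrieval update commonly used in the modern Hopfield network literature. It is not the AttnBM from this paper, and the notation may differ from the authors'. The function names, the stored `patterns`, the query `xi`, and the inverse temperature `beta` are all illustrative assumptions.

```python
import math

def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))

def energy(patterns, xi, beta=1.0):
    """Assumed energy: E(xi) = -(1/beta) * log sum_i exp(beta * x_i . xi) + 0.5 * ||xi||^2."""
    scores = [beta * dot(x, xi) for x in patterns]
    m = max(scores)  # subtract the max for a numerically stable log-sum-exp
    lse = (m + math.log(sum(math.exp(s - m) for s in scores))) / beta
    return -lse + 0.5 * dot(xi, xi)

def update(patterns, xi, beta=1.0):
    """One retrieval step: xi_new = sum_i softmax(beta * x_i . xi)_i * x_i."""
    scores = [beta * dot(x, xi) for x in patterns]
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    weights = [e / z for e in exps]  # attention weights over stored patterns
    return [sum(w * x[d] for w, x in zip(weights, patterns)) for d in range(len(xi))]

# Hypothetical toy data: three stored patterns and a noisy query near the first one.
patterns = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]
xi = [0.8, 0.3]
xi_new = update(patterns, xi, beta=4.0)
```

In this toy setting, one update moves the query toward the closest stored pattern and does not increase the energy. The update has the same form as a softmax attention read-out, with the stored patterns acting as both keys and values.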
Taxonomy
Topics: Neural Networks and Applications · Generative Adversarial Networks and Image Synthesis · Image and Signal Denoising Methods
Methods: Denoising Autoencoder · Softmax
