Hyperbolic Attention Networks
Caglar Gulcehre, Misha Denil, Mateusz Malinowski, Ali Razavi, Razvan Pascanu, Karl Moritz Hermann, Peter Battaglia, Victor Bapst, David Raposo, Adam Santoro, Nando de Freitas

TL;DR
This paper introduces hyperbolic attention networks, which impose hyperbolic geometry on neural network activations so that models can better represent data with hierarchical and power-law structure.
Contribution
It extends the use of hyperbolic geometry from the parameters of shallow networks to the activations of neural networks, which makes it possible to reason about embeddings produced by deep networks in hyperbolic space.
Findings
Improved generalization in neural machine translation.
Enhanced learning on graph data.
Better performance in visual question answering.
Abstract
We introduce hyperbolic attention networks to endow neural networks with enough capacity to match the complexity of data with hierarchical and power-law structure. A few recent approaches have successfully demonstrated the benefits of imposing hyperbolic geometry on the parameters of shallow networks. We extend this line of work by imposing hyperbolic geometry on the activations of neural networks. This allows us to exploit hyperbolic geometry to reason about embeddings produced by deep networks. We achieve this by re-expressing the ubiquitous mechanism of soft attention in terms of operations defined for hyperboloid and Klein models. Our method shows improvements in terms of generalization on neural machine translation, learning on graphs and visual question answering tasks while keeping the neural representations compact.
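The abstract's description of soft attention re-expressed with hyperboloid and Klein operations can be illustrated with a minimal sketch. Everything below is an assumption made for illustration rather than the authors' implementation: inputs are taken to be points in the Klein ball (norm below 1), attention scores are the negative hyperboloid distance scaled by a hypothetical `beta` and shifted by a hypothetical `c`, normalization uses a softmax, and values are aggregated with the Einstein midpoint. All function names are invented for this example.

```python
import math

def lorentz_factor(v):
    # gamma(v) = 1 / sqrt(1 - |v|^2) for a point v inside the Klein unit ball
    return 1.0 / math.sqrt(1.0 - sum(x * x for x in v))

def klein_to_hyperboloid(v):
    # Lift a Klein point to the hyperboloid: (gamma, gamma * v)
    g = lorentz_factor(v)
    return [g] + [g * x for x in v]

def hyperboloid_distance(p, q):
    # d(p, q) = arccosh(-<p, q>_M), with the Minkowski inner product
    # <p, q>_M = -p0*q0 + sum_i p_i*q_i
    inner = p[0] * q[0] - sum(a * b for a, b in zip(p[1:], q[1:]))
    return math.acosh(max(inner, 1.0))  # clamp guards against rounding below 1

def einstein_midpoint(weights, values):
    # Weighted aggregation in the Klein model:
    # m = sum_i w_i * gamma_i * v_i / sum_i w_i * gamma_i
    gammas = [lorentz_factor(v) for v in values]
    denom = sum(w * g for w, g in zip(weights, gammas))
    dim = len(values[0])
    return [
        sum(w * g * v[i] for w, g, v in zip(weights, gammas, values)) / denom
        for i in range(dim)
    ]

def hyperbolic_attention(query, keys, values, beta=1.0, c=0.0):
    # Assumed scoring rule: closer keys (in hyperbolic distance) score higher.
    q = klein_to_hyperboloid(query)
    scores = [
        -beta * hyperboloid_distance(q, klein_to_hyperboloid(k)) - c
        for k in keys
    ]
    # Numerically stable softmax (assumed normalization for this sketch)
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    return weights, einstein_midpoint(weights, values)

# Example: one query attending over two key/value pairs in a 2-D Klein ball
keys = [[0.1, 0.2], [-0.5, 0.4]]
values = [[0.3, 0.0], [0.0, -0.6]]
weights, output = hyperbolic_attention([0.1, 0.2], keys, values)
```

Because the Einstein midpoint is a weighted average with positive coefficients, the aggregated output stays inside the unit ball, so it can be passed to further hyperbolic operations without reprojection.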
Taxonomy
Topics: Multimodal Machine Learning Applications · Topic Modeling · Advanced Graph Neural Networks
