Recasting Self-Attention with Holographic Reduced Representations
Mohammad Mahmudul Alam, Edward Raff, Stella Biderman, Tim Oates, James Holt

TL;DR
This paper introduces Hrrformer, a holographic reduced representation-based self-attention model that significantly reduces computational costs and training epochs, enabling efficient processing of extremely long sequences in malware detection and sequence modeling tasks.
Contribution
The paper presents a novel holographic reduced representation approach to self-attention, achieving lower complexity and faster training while maintaining competitive accuracy.
Findings
Hrrformer reduces training epochs by 10x
Achieves near state-of-the-art accuracy on benchmarks
Up to 280x faster training on Long Range Arena
Abstract
In recent years, self-attention has become the dominant paradigm for sequence modeling in a variety of domains. However, in domains with very long sequence lengths the memory and compute costs can make using transformers infeasible. Motivated by problems in malware detection, where extremely long sequence lengths are a roadblock to deep learning, we re-cast self-attention using the neuro-symbolic approach of Holographic Reduced Representations (HRR). In doing so we follow the same high-level strategy as standard self-attention: a set of queries matching against a set of keys, and returning a weighted response of the values for each key. Implemented as a "Hrrformer" we obtain several benefits including lower time complexity, lower space complexity, and convergence in 10x fewer epochs.…
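The query/key/value strategy described in the abstract can be illustrated with a small NumPy sketch. This is a minimal illustration of the general HRR idea under stated assumptions, not the authors' implementation. It assumes binding is circular convolution computed via FFT, unbinding uses the approximate (involution) inverse, and attention weights come from a softmax over the cosine similarity between each value and its retrieved estimate. The function names (bind, approx_inverse, unbind, hrr_attention) and the shapes T (sequence length) and H (feature size) are hypothetical choices made for this example.

```python
import numpy as np

def bind(a, b):
    """HRR binding: circular convolution along the last axis, computed via FFT."""
    n = a.shape[-1]
    fa = np.fft.rfft(a, axis=-1)
    fb = np.fft.rfft(b, axis=-1)
    return np.fft.irfft(fa * fb, n=n, axis=-1)

def approx_inverse(a):
    """Approximate HRR inverse (involution): [a0, a_{n-1}, ..., a1]."""
    return np.roll(a[..., ::-1], 1, axis=-1)

def unbind(s, a):
    """Retrieve a noisy estimate of whatever was bound to `a` inside `s`."""
    return bind(s, approx_inverse(a))

def hrr_attention(q, k, v, eps=1e-8):
    """Sketch of HRR-style attention for q, k, v of shape (T, H).

    Assumption: weights are a softmax over cosine similarities between
    each value and the estimate retrieved with the matching query.
    """
    # Superpose all key-value bindings into a single vector of size H.
    beta = bind(k, v).sum(axis=0, keepdims=True)          # (1, H)
    # Each query unbinds its own noisy estimate of a value.
    v_hat = unbind(beta, q)                               # (T, H)
    # Cosine similarity between true values and retrieved estimates.
    num = (v * v_hat).sum(axis=-1)
    den = np.linalg.norm(v, axis=-1) * np.linalg.norm(v_hat, axis=-1) + eps
    scores = num / den                                    # (T,)
    # Softmax over the time axis gives one weight per position.
    w = np.exp(scores - scores.max())
    w = w / w.sum()
    # Weighted response of the values.
    return w[:, None] * v, w

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    T, H = 8, 1024
    # HRR vectors are commonly drawn i.i.d. from N(0, 1/H).
    q, k, v = (rng.normal(0.0, 1.0 / np.sqrt(H), size=(T, H)) for _ in range(3))
    out, w = hrr_attention(q, k, v)
    print(out.shape, w.shape)
```

In this sketch no T-by-T score matrix is ever built: all key-value pairs are summed into one vector of size H, and each position performs a fixed number of length-H FFTs, so the cost grows linearly with the sequence length T.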
Taxonomy
Topics: Cell Image Analysis Techniques · Advanced Malware Detection Techniques · Advanced Electron Microscopy Techniques and Applications
Methods: Attention Is All You Need · Dropout · Residual Connection · Linear Layer · Layer Normalization · Byte Pair Encoding · Softmax · Label Smoothing · Absolute Position Encodings · Multi-Head Attention
