Recasting Self-Attention with Holographic Reduced Representations

Mohammad Mahmudul Alam; Edward Raff; Stella Biderman; Tim Oates; James Holt

arXiv:2305.19534 · cs.LG · June 1, 2023 · 1 citation

TL;DR

This paper introduces Hrrformer, which recasts self-attention using Holographic Reduced Representations (HRR). It replaces the $O(T^{2} H)$ compute and $O(T^{2})$ memory of standard attention with $O(T H \log H)$ time and $O(T H)$ space and converges in 10x fewer epochs, targeting very long sequences ($T \geq 100{,}000$) such as those found in malware detection.

Contribution

The paper recasts self-attention with Holographic Reduced Representations, keeping the query-key-value matching strategy of standard attention while reducing its cost to $O(T H \log H)$ time and $O(T H)$ space and training in fewer epochs with competitive accuracy.

Findings

01

Hrrformer converges in 10x fewer training epochs

02

Achieves near state-of-the-art accuracy on the Long Range Arena benchmarks

03

Up to 280x faster training on Long Range Arena

Abstract

In recent years, self-attention has become the dominant paradigm for sequence modeling in a variety of domains. However, in domains with very long sequence lengths the $O(T^{2})$ memory and $O(T^{2} H)$ compute costs can make using transformers infeasible. Motivated by problems in malware detection, where sequence lengths of $T \geq 100{,}000$ are a roadblock to deep learning, we re-cast self-attention using the neuro-symbolic approach of Holographic Reduced Representations (HRR). In doing so we perform the same high-level strategy of the standard self-attention: a set of queries matching against a set of keys, and returning a weighted response of the values for each key. Implemented as a "Hrrformer" we obtain several benefits including $O(T H \log H)$ time complexity, $O(T H)$ space complexity, and convergence in $10\times$ fewer epochs.…
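
The query-key-value strategy described in the abstract can be illustrated with a small NumPy sketch. It is a reading of the general HRR idea, not the official implementation (which is written in JAX): keys are bound to values by circular convolution, the bindings are summed into a single $H$-dimensional vector, each query unbinds an estimate of its value, and a softmax over cosine similarities weights the values. The function names `bind`, `approx_inverse`, and `hrr_attention` are hypothetical, and the involution-based approximate inverse, the single head, and the absence of learned projections are simplifying assumptions made here.

```python
import numpy as np


def bind(a, b):
    """HRR binding: circular convolution along the last axis, via the FFT."""
    n = a.shape[-1]
    spectrum = np.fft.rfft(a, axis=-1) * np.fft.rfft(b, axis=-1)
    return np.fft.irfft(spectrum, n=n, axis=-1)


def approx_inverse(a):
    """HRR involution a*[i] = a[-i mod H], an approximate inverse under bind.

    Assumption: the paper may use a different (e.g. exact) inverse.
    """
    return np.roll(a[..., ::-1], 1, axis=-1)


def hrr_attention(q, k, v):
    """Sketch of HRR-style attention for q, k, v of shape (T, H).

    No T x T matrix is formed: the largest intermediates have shape (T, H).
    """
    # Superpose every key-value binding into one H-dimensional vector.
    beta = bind(k, v).sum(axis=0, keepdims=True)
    # Unbinding with each query yields a noisy estimate of its matching value.
    v_hat = bind(approx_inverse(q), beta)
    # Cosine similarity between each true value and its retrieved estimate.
    num = (v * v_hat).sum(axis=-1)
    den = np.linalg.norm(v, axis=-1) * np.linalg.norm(v_hat, axis=-1) + 1e-8
    scores = num / den
    # Softmax over the T positions (shifted by the max for numerical stability).
    weights = np.exp(scores - scores.max())
    weights = weights / weights.sum()
    # Weighted response of the values.
    return weights[:, None] * v


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    T, H = 1000, 64  # hypothetical sizes chosen for the demo
    q, k, v = (rng.normal(0.0, 1.0 / np.sqrt(H), size=(T, H)) for _ in range(3))
    print(hrr_attention(q, k, v).shape)
```

In this sketch the FFTs dominate the cost, giving roughly $O(T H \log H)$ time, and every intermediate array has at most $T \times H$ entries, which matches the complexities quoted in the abstract.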

Code & Models

Repositories

neuromorphiccomputationresearchprogram/hrrformer
JAX · Official

Taxonomy

Topics: Cell Image Analysis Techniques · Advanced Malware Detection Techniques · Advanced Electron Microscopy Techniques and Applications

Methods: Attention Is All You Need · Dropout · Residual Connection · Linear Layer · Layer Normalization · Byte Pair Encoding · Softmax · Label Smoothing · Absolute Position Encodings · Multi-Head Attention