MANAR: Memory-augmented Attention with Navigational Abstract Conceptual Representation

Zuher Jahshan; Ben Ben Ishay; Leonid Yavits

arXiv:2603.18676·cs.AI·March 20, 2026

TL;DR

MANAR introduces a memory-augmented attention mechanism inspired by cognitive theories, enabling efficient, scalable, and expressive contextualization in language, vision, and speech tasks by mimicking global workspace functions.

Contribution

It proposes a novel GWT-inspired attention architecture with a trainable memory and ACR, achieving linear-time complexity and enabling knowledge transfer from pretrained models.

Findings

01

Matches or exceeds baseline performance in language, vision, and speech tasks.

02

Achieves linear-time scaling, reducing quadratic complexity of standard attention.

03

Enables non-convex contextualization, allowing creative representation synthesis.

Abstract

The MANAR (Memory-augmented Attention with Navigational Abstract Conceptual Representation) contextualization layer generalizes standard multi-head attention (MHA) by instantiating the principles of Global Workspace Theory (GWT). While MHA enables unconstrained all-to-all communication, it lacks the functional bottleneck and global integration mechanisms hypothesized in cognitive models of consciousness. MANAR addresses this by implementing a central workspace through a trainable memory of abstract concepts and an Abstract Conceptual Representation (ACR). The architecture follows a two-stage logic that maps directly to GWT mechanics: (i) an integration phase, where retrieved memory concepts converge to form a collective "mental image" (the ACR) based on input stimuli; and (ii) a broadcasting phase, where this global state navigates and informs the contextualization of individual local…
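The two-stage logic described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the abstract does not give the equations, so the mean-pooled stimulus, the softmax retrieval over memory slots, the sigmoid gate, and every name (`integrate`, `broadcast`, `manar_like_layer`) are assumptions chosen only to make the idea concrete. Plain Python lists are used so that the example runs without third-party libraries.

```python
import math
import random


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def weighted_sum(weights, vectors):
    """Sum of vectors scaled by their weights."""
    dim = len(vectors[0])
    return [sum(w * v[i] for w, v in zip(weights, vectors)) for i in range(dim)]


def integrate(tokens, memory):
    """Integration phase (assumed form): pool the input into one stimulus,
    softly retrieve memory concepts, and merge them into a single ACR vector."""
    n, d = len(tokens), len(tokens[0])
    stimulus = [sum(t[i] for t in tokens) / n for i in range(d)]  # mean pooling (assumption)
    scores = [dot(stimulus, slot) / math.sqrt(d) for slot in memory]
    return weighted_sum(softmax(scores), memory)


def broadcast(tokens, acr):
    """Broadcasting phase (assumed form): every token is updated from the
    shared ACR through a per-token sigmoid gate, with no token-to-token pairs."""
    d = len(acr)
    out = []
    for t in tokens:
        gate = 1.0 / (1.0 + math.exp(-dot(t, acr) / math.sqrt(d)))
        out.append([ti + gate * ai for ti, ai in zip(t, acr)])
    return out


def manar_like_layer(tokens, memory):
    """Hypothetical two-stage layer: integrate, then broadcast."""
    acr = integrate(tokens, memory)
    return broadcast(tokens, acr), acr


random.seed(0)
N, M, D = 6, 4, 8  # tokens, memory slots, feature size (illustrative values)
tokens = [[random.gauss(0, 1) for _ in range(D)] for _ in range(N)]
memory = [[random.gauss(0, 1) for _ in range(D)] for _ in range(M)]  # trainable in a real model
contextualized, acr = manar_like_layer(tokens, memory)
print(len(contextualized), len(contextualized[0]), len(acr))
```

In this sketch the tokens never attend to each other directly. All communication passes through the ACR, so the work grows with the number of tokens plus the number of memory slots rather than with the square of the sequence length, which is the kind of bottleneck the abstract attributes to a global workspace.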

Peer Reviews

Decision·Submitted to ICLR 2026

Reviewer 01 · Rating 0 · Confidence 4

Strengths

N/A, see ethics comment & 'Weaknesses' section

Weaknesses

As pointed out at the beginning of the reviewing phase, the margins of the paper unfortunately appear to have been significantly altered, which allows more space than the original template. I therefore have to recommend desk rejection / rejection due to misuse of the format.

Reviewer 02 · Rating 0 · Confidence 1

Strengths

-

Weaknesses

-

Reviewer 03 · Rating 2 · Confidence 2

Strengths

- The idea is clearly present: a unification of retrieved global context (ACR) with local attention to avoid all-pairs attention.
- Efficiency: Substantial wall-clock and HBM savings in microbenchmarks and end-to-end DeiT-S at large resolutions, with improvements growing with sequence length.
- MANAR enables quick adoption and a large reduction in trainable parameters/steps while retaining accuracy on vision and speech.

Weaknesses

- **Modest accuracy gains:** On ImageNet-1K, improvements over DeiT-B are small (82.3% vs. 81.8%). For ASR, the paper claims SOTA, but test-clean 2.9 trails data2vec (2.8) and test-other is tied at 6.8.
- **Related work gaps:** While Linformer/Performer and long-sequence families (Mamba/RetNet, KV-cache management) are cited, several key lines are missing or under-discussed: sparse attention baselines, Swin/local-window ViTs, Transformer-XL/Compressive Transformer, standard retrieval-augmented m

Taxonomy

Topics: Action Observation and Synchronization · Neurobiology of Language and Bilingualism · Embodied and Extended Cognition