Breaking the KV Cache Bottleneck: Fan Duality Model Achieves O(1) Decode Memory with Superior Associative Recall

Yasong Fan

arXiv:2604.07716 · cs.LG · April 14, 2026

TL;DR

FDM introduces a sequence model that achieves constant decode memory and superior associative recall by separating sequence processing into wave and particle components, with a novel training strategy and holographic decoding interpretation.

Contribution

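The paper presents FDM, a linear sequence architecture with fixed O(1) decode memory, together with a two-phase training strategy (Freeze-Scan) and a holographic interpretation of decoding. It reports lower decode memory than a Transformer at long prompt lengths and higher associative-recall accuracy on MQAR.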

Findings

01

FDM's decode memory stays fixed at 867 MB, a 4.9x reduction relative to the Transformer's 4,247 MB at N=8,192 tokens.

02

Joint training of wave and particle components leads to suboptimal convergence, addressed by Freeze-Scan.

03

FDM achieves 0.966 accuracy on MQAR (multi-query associative recall), which the paper reports as significantly higher than the Transformer baseline.
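
The 4.9x figure in the first finding can be checked from the memory numbers quoted in the abstract. The short Python sketch below does that arithmetic. It assumes the endpoints of the Transformer range (853 and 4,247 MB) correspond to N=128 and N=8,192 respectively; the names `FDM_MB`, `TRANSFORMER_MB`, and `reduction` are illustrative only.

```python
# Decode-memory figures quoted in the abstract, in MB.
FDM_MB = 867  # reported as fixed across prompt lengths 128-8,192

# Assumption: the ends of the quoted 853-4,247 MB range map to
# the shortest and longest prompt lengths.
TRANSFORMER_MB = {128: 853, 8192: 4247}


def reduction(n_tokens):
    """Ratio of Transformer decode memory to FDM decode memory."""
    return TRANSFORMER_MB[n_tokens] / FDM_MB


for n in sorted(TRANSFORMER_MB):
    print(f"N={n}: Transformer/FDM = {reduction(n):.2f}x")
```

Under that assumption the ratio is slightly below 1 at N=128, so the memory advantage applies to long prompts rather than short ones.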

Abstract

We present FDM (Fan Duality Model), a linear sequence architecture that resolves the fundamental tension between memory efficiency and associative recall in sequence modeling. FDM separates sequence processing into two components: a wave component (recurrent scan via phase-preserving Givens rotations) that compresses long-range patterns into a fixed-size complex hidden state, and a particle component (local-global cache) that retrieves specific tokens via learned associative addressing with W+K=272 slots independent of sequence length N. This yields strictly O(1) decode memory: 867 MB fixed across all prompt lengths 128-8,192 tokens, versus Transformer's 853-4,247 MB (4.9x reduction at N=8,192). Beyond the architecture, we discover that jointly training the wave and particle components leads to suboptimal convergence. We propose Freeze-Scan, a two-phase training strategy that freezes…
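
To make the O(1) claim concrete, here is a toy Python sketch of a decode loop whose memory does not grow with N. It is not the paper's implementation. The state size `STATE_DIM`, the rotation angles, the additive input update, and the first-in-first-out eviction are all hypothetical stand-ins. In particular, a plain FIFO buffer replaces the learned associative addressing described in the abstract. Only the slot count of 272 comes from the source. Each 2D Givens rotation is written as multiplication of a complex number by a unit phase, which leaves its magnitude unchanged.

```python
import cmath
import math
from collections import deque

STATE_DIM = 4  # hypothetical size of the complex hidden state
SLOTS = 272    # W + K from the abstract; the W/K split is not modeled


def rotate(state, thetas):
    """Rotate each complex entry by its angle (a 2D Givens rotation)."""
    return [h * cmath.exp(1j * t) for h, t in zip(state, thetas)]


def decode_step(state, cache, token_vec, thetas):
    """One step: rotate the fixed-size state, add the input, cache the token."""
    state = [h + x for h, x in zip(rotate(state, thetas), token_vec)]
    cache.append(token_vec)  # deque(maxlen=...) evicts the oldest entry
    return state


def run(n_tokens):
    """Process n_tokens toy inputs; return (state size, cache size)."""
    state = [0j] * STATE_DIM
    cache = deque(maxlen=SLOTS)
    thetas = [0.1 * (k + 1) for k in range(STATE_DIM)]  # made-up angles
    for t in range(n_tokens):
        token_vec = [complex(t % 7, 1.0)] * STATE_DIM  # made-up inputs
        state = decode_step(state, cache, token_vec, thetas)
    return len(state), len(cache)


# The rotation alone preserves magnitude: |3+4j| stays 5.
assert math.isclose(abs(rotate([3 + 4j], [1.2])[0]), 5.0)

for n in (128, 8192):
    print(n, run(n))
```

The state length stays at `STATE_DIM` and the cache never exceeds `SLOTS` entries, whatever the prompt length. A key-value cache, by contrast, stores one entry per token.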

Code & Models

Repositories

YasongFan/FDM (GitHub)
