PRISM: Breaking the O(n) Memory Wall in Long-Context LLM Inference via O(1) Photonic Block Selection

Hyoseok Park; Yeonsang Park

arXiv:2603.21576·physics.optics·March 26, 2026

Open Access

TL;DR

PRISM moves the coarse block-selection step of long-context LLM inference onto photonic hardware, replacing an O(n) memory-bound scan of the KV cache with O(1) block selection and cutting both memory traffic and energy.

Contribution

The authors present this as the first work to apply the photonic broadcast-and-weight paradigm to coarse block selection in long-context LLM inference, breaking the O(n) memory wall.

Findings

01

Achieves 100% accuracy from 4K to 64K tokens at k=32.

02

Reduces memory traffic by 16x at 64K context length.

03

Provides a four-order-of-magnitude energy advantage over GPU baselines.
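
As a rough sanity check on the traffic figure, the sketch below counts only the KV tokens fetched after block selection and ignores any signature traffic. The block size of 128 tokens is an assumption, not a value stated on this page; it is chosen because, together with k=32, it reproduces a 16x ratio at 64K tokens. The function name `kv_traffic_reduction` is hypothetical.

```python
def kv_traffic_reduction(context_tokens, block_size, k):
    """Ratio of tokens scanned by full attention to tokens fetched
    when only k blocks of block_size tokens are read."""
    # Never fetch more tokens than the context actually holds.
    fetched = min(k * block_size, context_tokens)
    return context_tokens / fetched


# block_size=128 is an assumed value, not taken from the paper.
print(kv_traffic_reduction(64 * 1024, block_size=128, k=32))  # 16.0
```

Under the same assumption the ratio shrinks as the context gets shorter, since a fixed k covers a larger share of the cache.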

Abstract

Long-context LLM inference is bottlenecked not by compute but by the O(n) memory bandwidth cost of scanning the KV cache at every decode step -- a wall that no amount of arithmetic scaling can break. Recent photonic accelerators have demonstrated impressive throughput for dense attention computation; however, these approaches inherit the same O(n) memory scaling as electronic attention when applied to long contexts. We observe that the real leverage point is the coarse block-selection step: a memory-bound similarity search that determines which KV blocks to fetch. We identify, for the first time, that this task is structurally matched to the photonic broadcast-and-weight paradigm -- the query fans out to all candidates via passive splitting, signatures are quasi-static (matching electro-optic MRR programming), and only rank order matters (relaxing precision to 4-6 bits). Crucially, the…
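
The block-selection step described in the abstract can be illustrated in software. The following is a minimal electronic sketch, not the authors' photonic implementation: it assumes each block's signature is the mean of its key vectors, scores signatures by dot product with the query, and quantizes signatures uniformly to a few bits to mimic reduced precision. All function names (`block_signatures`, `quantize`, `select_blocks`) and the mean-key signature are assumptions made for illustration.

```python
import heapq
import random


def block_signatures(keys, block_size):
    """Mean key vector per block (assumed signature; a trailing partial block is dropped)."""
    sigs = []
    for start in range(0, len(keys) - block_size + 1, block_size):
        block = keys[start:start + block_size]
        dim = len(block[0])
        sigs.append([sum(row[j] for row in block) / block_size for j in range(dim)])
    return sigs


def quantize(sigs, bits):
    """Uniformly quantize all signature entries to 2**bits levels over their global range."""
    flat = [v for s in sigs for v in s]
    lo, hi = min(flat), max(flat)
    if hi == lo:
        return [list(s) for s in sigs]
    step = (hi - lo) / (2 ** bits - 1)
    return [[lo + round((v - lo) / step) * step for v in s] for s in sigs]


def select_blocks(query, sigs, k):
    """Return the sorted indices of the k blocks whose signatures score highest."""
    scores = [sum(q * v for q, v in zip(query, s)) for s in sigs]
    top = heapq.nlargest(k, range(len(scores)), key=scores.__getitem__)
    return sorted(top)


if __name__ == "__main__":
    rng = random.Random(0)
    dim, block_size = 8, 16
    keys = [[rng.gauss(0, 0.1) for _ in range(dim)] for _ in range(16 * block_size)]
    query = [1.0] * dim
    sigs = block_signatures(keys, block_size)
    # Compare the selection at full precision and at 4-bit signatures.
    print(select_blocks(query, sigs, k=4))
    print(select_blocks(query, quantize(sigs, 4), k=4))
```

Only the ranking of scores affects which blocks are returned, which is why coarse quantization of the signatures can leave the selection largely unchanged when the top blocks are well separated from the rest.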

Taxonomy

Topics: Neural Networks and Reservoir Computing · Optical Network Technologies · Photonic and Optical Devices