ShardMemo: Masked MoE Routing for Sharded Agentic LLM Memory

Yang Zhao; Chengxiao Dai; Yue Xiu; Mengying Kou; Yuliang Zheng; Dusit Niyato

arXiv:2601.21545·cs.AI·January 30, 2026

Open Access

TL;DR

ShardMemo introduces a masked MoE routing approach for sharded agentic LLM memory, improving retrieval efficiency and accuracy in multi-agent and long-horizon tasks through structured eligibility constraints and cost-aware gating.

Contribution

The paper proposes a tiered memory system that uses masked MoE routing for efficient shard selection and reports gains over baseline methods on benchmarks including LoCoMo.

Findings

01 Improves F1 scores on LoCoMo by +5.11 to +6.82

02 Reduces retrieval work by 20.5% and latency by 20 ms

03 Achieves high precision and step reduction on ToolBench

Abstract

Agentic large language model (LLM) systems rely on external memory for long-horizon state and concurrent multi-agent execution, but centralized indexes and heuristic partitions become bottlenecks as memory volume and parallel access grow. We present ShardMemo, a budgeted tiered memory service with Tier A per-agent working state, Tier B sharded evidence with shard-local approximate nearest neighbor (ANN) indexes, and Tier C, a versioned skill library. Tier B enforces scope-before-routing: structured eligibility constraints mask ineligible shards before routing or ANN search. We cast shard probing as masked mixture-of-experts (MoE) routing over eligible shards, probing up to $B_{\text{probe}}$ shards via Top-$B_{\text{probe}}$ or adaptive Top-$P$, and use cost-aware gating over profile/observation/session shard families; the router is trained from evidence-to-shard supervision. On…
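The scope-before-routing idea in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the function name `masked_route`, the plain softmax over router scores, and the exact Top-P stopping rule are assumptions made for illustration, and the learned router and cost-aware gating are not modeled. It only shows ineligible shards being masked out before any scoring, followed by selection of at most `b_probe` shards, either by rank or by cumulative probability mass.

```python
import math

def masked_route(scores, eligible, b_probe, top_p=None):
    """Return shard indices to probe, highest routing probability first.

    scores   : per-shard router scores (hypothetical values, not from the paper)
    eligible : per-shard booleans produced by the eligibility constraints
    b_probe  : hard cap on the number of shards probed
    top_p    : if set, stop early once cumulative probability reaches it
    """
    # Scope before routing: ineligible shards never enter the softmax.
    candidates = [i for i, ok in enumerate(eligible) if ok]
    if not candidates or b_probe <= 0:
        return []

    # Numerically stable softmax restricted to eligible shards.
    peak = max(scores[i] for i in candidates)
    weights = {i: math.exp(scores[i] - peak) for i in candidates}
    total = sum(weights.values())
    probs = {i: w / total for i, w in weights.items()}

    ranked = sorted(candidates, key=lambda i: probs[i], reverse=True)

    # Plain Top-B: take the best b_probe eligible shards.
    if top_p is None:
        return ranked[:b_probe]

    # Adaptive Top-P (assumed rule): add shards until the mass is covered,
    # never exceeding the b_probe budget.
    chosen, mass = [], 0.0
    for i in ranked:
        if len(chosen) >= b_probe:
            break
        chosen.append(i)
        mass += probs[i]
        if mass >= top_p:
            break
    return chosen


scores = [2.0, 5.0, 1.0, 0.5]
eligible = [True, False, True, True]   # shard 1 scores highest but is out of scope
print(masked_route(scores, eligible, b_probe=2))             # [0, 2]
print(masked_route(scores, eligible, b_probe=3, top_p=0.6))  # [0]
```

In this toy run, shard 1 is never probed despite its high score because the mask is applied first, and the Top-P variant probes a single shard when one eligible shard already carries most of the probability mass.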

Taxonomy

Topics: Multimodal Machine Learning Applications · Topic Modeling · Natural Language Processing Techniques