TL;DR
PaceLLM adds two brain-inspired mechanisms to large language models to improve long-context understanding, reporting gains on long-context benchmarks and context-length extension to 200K tokens.
Contribution
The paper proposes two brain-inspired techniques: a Persistent Activity mechanism for better context retention, and Cortical Expert clustering for semantic organization of feed-forward network (FFN) weights.
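As a rough illustration of what "reorganizing FFN weights into semantic modules" could look like, the sketch below groups FFN neurons by the similarity of their weight vectors and returns an ordering that places each group contiguously. Everything here is an assumption made for illustration: plain k-means stands in for whatever clustering the paper actually uses, and `cluster_neurons` and its arguments are hypothetical names, not the authors' API.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def cluster_neurons(weight_rows, k, iterations=10):
    """Group neurons (one weight row each) into k clusters with plain k-means.

    Assumption: k-means is only a stand-in for the paper's clustering step.
    Returns (labels, order), where `order` lists neuron indices so that
    neurons of the same cluster sit next to each other.
    """
    # Deterministic initialisation: the first k rows act as starting centroids.
    centroids = [list(row) for row in weight_rows[:k]]
    labels = [0] * len(weight_rows)

    for _ in range(iterations):
        # Assignment step: each neuron joins its nearest centroid.
        labels = [
            min(range(k), key=lambda c: squared_distance(row, centroids[c]))
            for row in weight_rows
        ]
        # Update step: each centroid becomes the mean of its members.
        for c in range(k):
            members = [row for row, lab in zip(weight_rows, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]

    # Stable sort by label keeps the original order inside each cluster.
    order = sorted(range(len(weight_rows)), key=lambda i: labels[i])
    return labels, order


# Toy FFN input projection: four neurons, two input features each.
toy_rows = [[1.0, 0.0], [0.0, 1.0], [0.9, 0.1], [0.1, 0.9]]
labels, order = cluster_neurons(toy_rows, k=2)
```

If such an ordering were applied to a real FFN, the same permutation would have to be applied to the rows of the input projection and the matching columns of the output projection; with an elementwise activation, that leaves the layer's output unchanged while making each cluster a contiguous block.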
Findings
Achieves 6% improvement on Multi-document QA
Gains 12.5-17.5% on Infinite-Bench tasks
Extends measurable context length to 200K tokens in Needle-In-A-Haystack (NIAH) tests
Abstract
While Large Language Models (LLMs) demonstrate strong performance across domains, their long-context capabilities are limited by transient neural activations causing information decay and unstructured feed-forward network (FFN) weights leading to semantic fragmentation. Inspired by the brain's working memory and cortical modularity, we propose PaceLLM, featuring two innovations: (1) a Persistent Activity (PA) Mechanism that mimics prefrontal cortex (PFC) neurons' persistent firing by introducing an activation-level memory bank to dynamically retrieve, reuse, and update critical FFN states, addressing contextual decay; and (2) Cortical Expert (CE) Clustering that emulates task-adaptive neural specialization to reorganize FFN weights into semantic modules, establishing cross-token dependencies and mitigating fragmentation. Extensive evaluations show that PaceLLM achieves 6% improvement on…
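The retrieve, reuse, and update cycle that the abstract attributes to the Persistent Activity mechanism can be sketched as a small activation-level memory bank. This is a minimal sketch under stated assumptions, not the paper's implementation: the cosine-similarity lookup, the fixed threshold, the linear blend, the oldest-first eviction, and every name (`ActivationMemoryBank`, `step`, `capacity`, `threshold`, `blend`) are hypothetical choices made for illustration.

```python
import math


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class ActivationMemoryBank:
    """Toy memory bank over FFN activation vectors (all design choices assumed)."""

    def __init__(self, capacity=4, threshold=0.9, blend=0.5):
        self.capacity = capacity    # maximum number of stored activations
        self.threshold = threshold  # minimum similarity that counts as a hit
        self.blend = blend          # weight given to the stored activation
        self.slots = []             # stored activations, oldest first

    def step(self, activation):
        """Retrieve the closest stored state, reuse it, and update the bank."""
        # Retrieve: find the most similar stored activation.
        best_idx, best_sim = -1, -1.0
        for i, slot in enumerate(self.slots):
            sim = cosine(activation, slot)
            if sim > best_sim:
                best_idx, best_sim = i, sim

        if best_idx >= 0 and best_sim >= self.threshold:
            # Reuse: blend the stored state into the current activation.
            stored = self.slots[best_idx]
            fused = [
                self.blend * m + (1.0 - self.blend) * a
                for m, a in zip(stored, activation)
            ]
            # Update: the fused state replaces the stored one.
            self.slots[best_idx] = fused
            return fused

        # Miss: store the new activation, evicting the oldest slot if full.
        if len(self.slots) >= self.capacity:
            self.slots.pop(0)
        self.slots.append(list(activation))
        return list(activation)


bank = ActivationMemoryBank(capacity=2, threshold=0.9, blend=0.5)
first = bank.step([1.0, 0.0])    # miss: stored unchanged
second = bank.step([1.0, 0.1])   # hit: blended with the stored state
third = bank.step([0.0, 1.0])    # miss: stored in a second slot
```

In this toy version a hit pulls the current activation toward a previously seen state, which is one simple way to keep earlier context "active"; how PaceLLM selects, weights, and refreshes its stored FFN states is not specified in the excerpt above.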