Sparse Brains are Also Adaptive Brains: Cognitive-Load-Aware Dynamic Activation for LLMs
Yiheng Yang, Yujie Wang, Chi Ma, Lei Yu, Emmanuele Chersoni, Chu-Ren Huang

TL;DR
This paper introduces CLADA, a cognitive-load-aware dynamic activation framework for large language models that improves efficiency by adaptively activating parameters based on input complexity, inspired by human brain mechanisms.
Contribution
The authors present CLADA as the first method to integrate neurolinguistic insights with LLM sparsity, achieving significant speedup without retraining or architecture changes.
Findings
~20% average speedup with <2% accuracy drop
Outperforms existing sparsity methods like Griffin and TT
Establishes a link between ERP (event-related potential) components such as N400 and P600 and LLM efficiency
Abstract
Dense large language models (LLMs) face critical efficiency bottlenecks as they rigidly activate all parameters regardless of input complexity. While existing sparsity methods (static pruning or dynamic activation) address this partially, they either lack adaptivity to contextual or model structural demands or incur prohibitive computational overhead. Inspired by the human brain's dual-process mechanisms - predictive coding (N400) for backbone sparsity and structural reanalysis (P600) for complex context - we propose CLADA, a Cognitive-Load-Aware Dynamic Activation framework that synergizes statistical sparsity with semantic adaptability. Our key insight is that LLM activations exhibit two complementary patterns: 1) global statistical sparsity driven by sequence-level prefix information, and 2) local semantic adaptability…
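To make the first pattern concrete, the following is a toy sketch of prefix-driven dynamic activation in a single feed-forward block: hidden neurons are ranked by their summed activation over the prefix tokens, and only the top fraction is evaluated for later tokens. This is a generic illustration and not CLADA's actual algorithm, which the truncated abstract does not specify; the function names (`select_active_neurons`, `ffn`), the ReLU activation, and the `keep_ratio` parameter are all assumptions made for this example.

```python
def relu(value):
    return value if value > 0.0 else 0.0


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def select_active_neurons(prefix, w_in, keep_ratio):
    """Rank hidden neurons by total activation over the prefix tokens and
    return the sorted indices of the top `keep_ratio` fraction.

    Hypothetical helper: the selection rule is an assumption, not the
    criterion used in the paper.
    """
    scores = [0.0] * len(w_in)
    for token in prefix:
        for j, row in enumerate(w_in):
            scores[j] += relu(dot(token, row))
    keep = max(1, round(keep_ratio * len(w_in)))
    ranked = sorted(range(len(w_in)), key=lambda j: scores[j], reverse=True)
    return sorted(ranked[:keep])


def ffn(token, w_in, w_out, active=None):
    """Feed-forward block with one row of `w_in` and `w_out` per hidden neuron.

    With `active=None` every hidden neuron is computed (dense); otherwise
    only the listed neurons are evaluated and the rest are skipped.
    """
    indices = range(len(w_in)) if active is None else active
    out = [0.0] * len(w_out[0])
    for j in indices:
        h = relu(dot(token, w_in[j]))
        for k, w in enumerate(w_out[j]):
            out[k] += h * w
    return out


if __name__ == "__main__":
    # Four hidden neurons over a 2-dimensional input (toy weights).
    w_in = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0], [0.0, -1.0]]
    w_out = [[1.0, 0.0], [0.0, 1.0], [5.0, 5.0], [7.0, 7.0]]
    prefix = [[1.0, 0.5], [2.0, 0.5]]

    active = select_active_neurons(prefix, w_in, keep_ratio=0.5)
    print("active neurons:", active)
    print("dense :", ffn(prefix[-1], w_in, w_out))
    print("sparse:", ffn(prefix[-1], w_in, w_out, active))
```

In this sketch the sparse output matches the dense one only while later tokens keep exciting the neurons the prefix selected. A token that activates a skipped neuron produces a different result, which is the kind of gap an adaptive, context-sensitive component would need to close.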
Taxonomy
Topics: Topic Modeling · Neurobiology of Language and Bilingualism · Artificial Intelligence in Healthcare and Education
