MeKi: Memory-based Expert Knowledge Injection for Efficient LLM Scaling
Ning Ding, Fangcheng Liu, Kyungrae Kim, Linji Hao, Kyeng-Hun Lee, Hyeonmok Ko, Yehui Tang

TL;DR
MeKi introduces a memory-based approach to scale large language models by storing knowledge in ROM, enabling efficient on-device deployment without increasing inference latency or computational costs.
Contribution
The paper presents a novel memory-based scaling method for LLMs that decouples model capacity from inference cost, suitable for edge devices.
Findings
MeKi outperforms dense LLM baselines at the same inference speed.
Memory-based knowledge injection improves model performance.
The proposed re-parameterization adds zero latency overhead during inference.
Abstract
Scaling Large Language Models (LLMs) typically relies on increasing the number of parameters or test-time computation to boost performance. However, these strategies are impractical for edge device deployment due to limited RAM and NPU resources. Despite hardware constraints, deploying performant LLMs on edge devices such as smartphones remains crucial for user experience. To address this, we propose MeKi (Memory-based Expert Knowledge Injection), a novel system that scales LLM capacity via storage space rather than FLOPs. MeKi equips each Transformer layer with token-level memory experts that inject pre-stored semantic knowledge into the generation process. To bridge the gap between training capacity and inference efficiency, we employ a re-parameterization strategy to fold parameter matrices used during training into a compact static lookup table. By offloading the knowledge to ROM,…
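The folding step described in the abstract can be illustrated with a minimal sketch. The abstract does not specify the internals of the memory experts, so everything here is an assumption made for illustration: a hypothetical per-token memory embedding `E` followed by a hypothetical linear projection `W`, with made-up sizes. Because the expert's output depends only on the token id, the training-time matrices can be multiplied out once into a static table, and inference reduces to a row lookup with no matrix multiply.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes, chosen only for illustration.
vocab_size, d_mem, d_model = 50, 8, 16

# Assumed training-time parameters of one layer's token-level memory expert:
# a per-token memory embedding and a projection into the hidden size.
E = rng.standard_normal((vocab_size, d_mem))
W = rng.standard_normal((d_mem, d_model))


def train_time_injection(token_ids: np.ndarray) -> np.ndarray:
    """Training path: embed each token, then project (costs a matmul)."""
    return E[token_ids] @ W


# Re-parameterization: fold both matrices into one static lookup table.
# This is computed once, offline, and could then be stored read-only.
table = E @ W  # shape: (vocab_size, d_model)


def inference_injection(token_ids: np.ndarray) -> np.ndarray:
    """Inference path: a pure table lookup, no matrix multiply."""
    return table[token_ids]


token_ids = np.array([3, 17, 42, 3])
train_out = train_time_injection(token_ids)
infer_out = inference_injection(token_ids)

# Both paths produce the same knowledge vectors for the same tokens.
assert np.allclose(train_out, infer_out)
```

In this sketch the table has one row per vocabulary entry, so its size grows with storage rather than with per-token compute. How MeKi actually combines the looked-up vector with the layer's hidden state is not stated in the text above and is left out here.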
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Machine Learning in Materials Science
