Mixture of Lookup Key-Value Experts
Zongcheng Wang

TL;DR
The paper introduces MoLKV, an extension of MoLE that structures each expert as a key-value pair so that expert selection can depend on context, while remaining suitable for resource-constrained devices.
Contribution
MoLKV extends MoLE with context-aware expert selection through key-value interactions, addressing the limitation that MoLE selects experts solely from input token ids.
Findings
MoLKV achieves lower validation loss than MoLE in experiments.
Context-aware expert selection improves model performance.
MoLKV maintains low communication overhead for resource-limited devices.
Abstract
Recent research has developed several LLM architectures suitable for inference on end-user devices, such as the Mixture of Lookup Experts (MoLE)~\parencite{jie_mixture_2025}. A key feature of MoLE is that each token id is associated with a dedicated group of experts. For a given input, only the experts corresponding to the input token id will be activated. Since the communication overhead of loading this small number of activated experts into RAM during inference is negligible, expert parameters can be offloaded to storage, making MoLE suitable for resource-constrained devices. However, MoLE's context-independent expert selection mechanism, based solely on input ids, may limit model performance. To address this, we propose the Mixture of Lookup Key-Value Experts (MoLKV) model. In MoLKV, each expert is structured as a key-value pair.…
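The token-id lookup that the abstract attributes to MoLE can be illustrated with a minimal sketch. This is not the authors' implementation: the table sizes, the function names (`build_expert_table`, `lookup_experts`, `combine`), and the uniform combination weights are all hypothetical placeholders, and the in-memory dictionary merely stands in for expert parameters offloaded to storage. The MoLKV key-value mechanism is not sketched because the abstract is truncated before describing it.

```python
import random

VOCAB_SIZE = 8    # hypothetical toy vocabulary size
NUM_EXPERTS = 4   # hypothetical number of experts per token id
HIDDEN_DIM = 3    # hypothetical width of each expert vector


def build_expert_table(seed=0):
    """Stand-in for the offloaded table: token id -> its group of expert vectors."""
    rng = random.Random(seed)
    return {
        tok: [[rng.uniform(-1.0, 1.0) for _ in range(HIDDEN_DIM)]
              for _ in range(NUM_EXPERTS)]
        for tok in range(VOCAB_SIZE)
    }


def lookup_experts(table, token_ids):
    """Fetch only the expert groups for token ids that occur in the input."""
    return {tok: table[tok] for tok in sorted(set(token_ids))}


def combine(experts, weights):
    """Weighted sum of one token's expert vectors (weights are an assumption)."""
    return [sum(w * e[d] for w, e in zip(weights, experts))
            for d in range(HIDDEN_DIM)]


table = build_expert_table()
token_ids = [5, 2, 5, 7]
loaded = lookup_experts(table, token_ids)

# Uniform weights are a placeholder; a real model would compute its own.
uniform = [1.0 / NUM_EXPERTS] * NUM_EXPERTS
outputs = [combine(loaded[t], uniform) for t in token_ids]
```

In this sketch only three of the eight expert groups are fetched, and the two occurrences of token id 5 yield identical outputs regardless of their neighbours, which is the context independence the abstract identifies as a possible limitation.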
Taxonomy
Topics: Mobile Crowdsensing and Crowdsourcing · Context-Aware Activity Recognition Systems · IoT and Edge/Fog Computing
