Unveiling and Consulting Core Experts in Retrieval-Augmented MoE-based LLMs
Xin Zhou, Ping Nie, Yiwen Guo, Haojie Wei, Zhanqiu Zhang, Pasquale Minervini, Ruotian Ma, Tao Gui, Qi Zhang, Xuanjing Huang

TL;DR
This paper investigates internal expert mechanisms in MoE-based LLMs to improve Retrieval-Augmented Generation by identifying core experts that influence knowledge utilization and document assessment, leading to enhanced RAG performance.
Contribution
It uncovers core expert groups responsible for RAG behaviors in MoE-LLMs and proposes strategies to leverage expert activations for better retrieval-augmented generation.
Findings
Core experts indicate model's knowledge sufficiency
Expert activations assess retrieved document quality
Strategies improve RAG efficiency and effectiveness
Abstract
Retrieval-Augmented Generation (RAG) has significantly improved the ability of Large Language Models (LLMs) to solve knowledge-intensive tasks. While existing research seeks to enhance RAG performance by retrieving higher-quality documents or designing RAG-specific LLMs, the internal mechanisms within LLMs that contribute to the effectiveness of RAG systems remain underexplored. In this paper, we investigate these internal mechanisms within popular Mixture-of-Experts (MoE)-based LLMs and demonstrate how to improve RAG by examining expert activations in these LLMs. Our controlled experiments reveal that several core groups of experts are primarily responsible for RAG-related behaviors. The activation of these core experts can signify the model's inclination towards external or internal knowledge and can be used to adjust its behavior. For instance, we identify core experts that can (1) indicate the…
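The idea of examining expert activations can be illustrated with a small sketch. This is not the authors' procedure; it is a minimal, assumed version of a contrastive analysis: count how often each expert is selected under top-k routing in two conditions, then rank experts by the difference in activation frequency. The function names, the toy router logits, and the "with context" versus "without context" split are all hypothetical.

```python
from collections import Counter


def top_k_experts(router_logits, k=2):
    """Return indices of the k experts with the largest router logits for one token."""
    ranked = sorted(range(len(router_logits)),
                    key=lambda i: router_logits[i], reverse=True)
    return ranked[:k]


def activation_frequency(token_logits, num_experts, k=2):
    """Fraction of tokens routed to each expert under top-k routing."""
    counts = Counter()
    for logits in token_logits:
        counts.update(top_k_experts(logits, k))
    total = max(len(token_logits), 1)  # avoid division by zero on empty input
    return [counts[e] / total for e in range(num_experts)]


def contrast_experts(logits_a, logits_b, num_experts, k=2, top_n=1):
    """Rank experts by how much more often they fire in condition A than in B."""
    freq_a = activation_frequency(logits_a, num_experts, k)
    freq_b = activation_frequency(logits_b, num_experts, k)
    diffs = sorted(((freq_a[e] - freq_b[e], e) for e in range(num_experts)),
                   reverse=True)
    return [e for _, e in diffs[:top_n]]


# Toy router logits (assumed data): one row per token, one column per expert.
with_context = [[0.1, 2.0, 0.3, 1.5], [0.2, 1.8, 0.1, 1.1]]
without_context = [[2.0, 0.1, 1.5, 0.3], [1.9, 0.2, 0.1, 1.2]]

core = contrast_experts(with_context, without_context, num_experts=4, k=2, top_n=1)
print(core)
```

In a real MoE model the logits would come from each layer's router, and the analysis would be repeated per layer over many prompts. The toy version only shows the counting and contrast step.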
Taxonomy
Topics: Recommender Systems and Techniques · Semantic Web and Ontologies · Data Mining Algorithms and Applications
Methods: Adam · Linear Layer · Dropout · Byte Pair Encoding · Layer Normalization · Residual Connection · Linear Warmup With Linear Decay · Attention Is All You Need · Dense Connections
