Unveiling and Consulting Core Experts in Retrieval-Augmented MoE-based LLMs

Xin Zhou; Ping Nie; Yiwen Guo; Haojie Wei; Zhanqiu Zhang; Pasquale Minervini; Ruotian Ma; Tao Gui; Qi Zhang; Xuanjing Huang

arXiv:2410.15438·cs.AI·October 22, 2024

TL;DR

This paper investigates internal expert mechanisms in MoE-based LLMs to improve Retrieval-Augmented Generation by identifying core experts that influence knowledge utilization and document assessment, leading to enhanced RAG performance.

Contribution

It uncovers core expert groups responsible for RAG behaviors in MoE-LLMs and proposes strategies to leverage expert activations for better retrieval-augmented generation.

Findings

1. Core experts indicate whether the model's internal knowledge is sufficient.

2. Expert activations can be used to assess the quality of retrieved documents.

3. The proposed expert-activation strategies improve RAG efficiency and effectiveness.

Abstract

Retrieval-Augmented Generation (RAG) has significantly improved the ability of Large Language Models (LLMs) to solve knowledge-intensive tasks. While existing research seeks to enhance RAG performance by retrieving higher-quality documents or designing RAG-specific LLMs, the internal mechanisms within LLMs that contribute to the effectiveness of RAG systems remain underexplored. In this paper, we investigate these internal mechanisms within the popular Mixture-of-Experts (MoE)-based LLMs and demonstrate how to improve RAG by examining expert activations in these LLMs. Our controlled experiments reveal that several core groups of experts are primarily responsible for RAG-related behaviors. The activation of these core experts can signal the model's inclination towards external or internal knowledge and can be used to adjust its behavior. For instance, we identify core experts that can (1) indicate the…
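
To make "examining expert activations" concrete, the following is a minimal sketch, not the authors' procedure. It assumes per-token router logits are available as plain lists, assumes top-k routing with k=2, and ranks experts by how much more often they are selected in one scenario than in another. The function names (`top_k_experts`, `activation_frequency`, `contrastive_experts`) and the toy logits are hypothetical and chosen only for illustration.

```python
from collections import Counter


def top_k_experts(router_logits, k=2):
    """Return the indices of the k largest router logits for one token."""
    order = sorted(range(len(router_logits)),
                   key=lambda i: router_logits[i], reverse=True)
    return order[:k]


def activation_frequency(token_logits, num_experts, k=2):
    """Fraction of tokens routed to each expert under top-k routing."""
    counts = Counter()
    for logits in token_logits:
        counts.update(top_k_experts(logits, k))
    total = max(len(token_logits), 1)  # avoid division by zero on empty input
    return [counts[e] / total for e in range(num_experts)]


def contrastive_experts(logits_a, logits_b, num_experts, k=2, top_n=1):
    """Rank experts by activation frequency in scenario A minus scenario B."""
    freq_a = activation_frequency(logits_a, num_experts, k)
    freq_b = activation_frequency(logits_b, num_experts, k)
    diff = [a - b for a, b in zip(freq_a, freq_b)]
    ranked = sorted(range(num_experts), key=lambda e: diff[e], reverse=True)
    return ranked[:top_n]


# Toy router logits (made up): two tokens per scenario, four experts.
with_context = [[0.1, 2.0, 0.3, 1.5], [0.2, 1.8, 0.1, 1.9]]
without_context = [[2.0, 0.1, 1.5, 0.3], [1.7, 0.2, 0.3, 1.6]]

print(contrastive_experts(with_context, without_context, num_experts=4))
```

In this toy setup, an expert that ranks high is one selected far more often when retrieved context is present. Which scenarios to contrast, and which layers to inspect, are choices the abstract does not specify.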

Taxonomy

Topics: Recommender Systems and Techniques · Semantic Web and Ontologies · Data Mining Algorithms and Applications

Methods: Adam · Linear Layer · Dropout · Byte Pair Encoding · Layer Normalization · Residual Connection · Linear Warmup With Linear Decay · Attention Is All You Need · Dense Connections