Mixture of Experts for Biomedical Question Answering
Damai Dai, Wenbin Jiang, Jiyuan Zhang, Weihua Peng, Yajuan Lyu, Zhifang Sui, Baobao Chang, Yong Zhu

TL;DR
This paper introduces MoEBQA, a mixture-of-experts model for biomedical question answering that improves accuracy by routing questions to specialized experts, effectively handling diverse question types.
Contribution
The paper proposes an MoE-based approach for BQA that decouples the computation for different question types, improving performance and interpretability over a single homogeneous model.
Findings
MoEBQA achieves state-of-the-art results on three BQA datasets.
The model effectively groups questions into human-readable clusters.
MoEBQA significantly outperforms baseline models.
Abstract
Biomedical Question Answering (BQA) has attracted increasing attention in recent years because of its promising applications. The task is challenging because biomedical questions are specialized and vary widely. Existing question answering methods answer all questions with a single homogeneous model, so different types of questions compete for the same shared parameters, which confuses the model's decisions for each individual question type. To alleviate this parameter competition problem, we propose a Mixture-of-Experts (MoE) based question answering method called MoEBQA, which decouples the computation for different types of questions through sparse routing. Specifically, we split a pretrained Transformer model into bottom and top blocks. The bottom blocks are shared by all examples and aim to capture general features. The top blocks are…
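The sparse routing idea in the abstract can be illustrated with a minimal sketch. The abstract is truncated before it describes the top blocks, so everything below is an assumption made for illustration and not the authors' implementation: the class name `SparseExpertLayer`, top-1 routing by dot-product scores, one linear map per expert, and the gated residual connection are all hypothetical choices. The input vector stands in for a representation produced by the shared bottom blocks.

```python
import math
import random


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def matvec(matrix, vector):
    """Multiply a (rows x dim) matrix by a dim-sized vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


class SparseExpertLayer:
    """Hypothetical top-1 MoE layer (an illustration, not the paper's design)."""

    def __init__(self, num_experts, dim, seed=0):
        rng = random.Random(seed)
        # One routing key per expert; the router scores an input against each key.
        self.expert_keys = [
            [rng.gauss(0, 1) for _ in range(dim)] for _ in range(num_experts)
        ]
        # Each expert is a single dim x dim linear map, kept small for clarity.
        self.expert_weights = [
            [[rng.gauss(0, 0.1) for _ in range(dim)] for _ in range(dim)]
            for _ in range(num_experts)
        ]

    def route(self, hidden):
        """Return the index of the best-scoring expert and its gate value."""
        probs = softmax(matvec(self.expert_keys, hidden))
        expert_id = max(range(len(probs)), key=probs.__getitem__)
        return expert_id, probs[expert_id]

    def forward(self, hidden):
        """Run only the selected expert; the other experts do no computation."""
        expert_id, gate = self.route(hidden)
        expert_out = matvec(self.expert_weights[expert_id], hidden)
        # Gated residual: the shared representation plus the expert's update.
        return expert_id, [h + gate * o for h, o in zip(hidden, expert_out)]


# Usage: a stand-in for a representation coming out of the shared bottom blocks.
layer = SparseExpertLayer(num_experts=4, dim=8)
shared_features = [0.1 * i for i in range(8)]
chosen_expert, output = layer.forward(shared_features)
```

Because only one expert runs per input, inputs routed to different experts update disjoint expert parameters during training, which is the intuition behind reducing parameter competition between question types.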
Taxonomy
Topics: Expert finding and Q&A systems · Topic Modeling · Multimodal Machine Learning Applications
Methods: Attention Is All You Need · Linear Layer · Dropout · Adam · Multi-Head Attention · Residual Connection · Label Smoothing · Absolute Position Encodings · Byte Pair Encoding · Position-Wise Feed-Forward Layer
