Build a Robust QA System with Transformer-based Mixture of Experts

Yu Qing Zhou; Xixuan Julie Liu; Yuanzhe Dong

arXiv:2204.09598 · cs.CL · April 21, 2022 · 1 citation

TL;DR

This paper presents a robust question answering system that combines a transformer-based Mixture of Experts architecture with data augmentation, reaching an out-of-domain F1 score of 53.477, a 9.52% improvement over the baseline.

Contribution

It introduces a novel MoE-based QA model integrated into DistilBERT with simplified routing and demonstrates enhanced robustness through data augmentation.

Findings

01

Achieved 53.477 F1 score out-of-domain, a 9.52% improvement over baseline.

02

Demonstrated the effectiveness of MoE architecture in robust QA tasks.

03

Reported 59.506 F1 and 41.651 EM on the final test set.
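
The F1 and EM figures above are the two standard extractive QA metrics. The sketch below shows how they are commonly computed per example, assuming SQuAD-style answer normalization (lowercasing, stripping punctuation and English articles); this page does not state the exact evaluation script, so the normalization rules and the function names `normalize`, `exact_match`, and `f1_score` are illustrative assumptions, not the authors' code.

```python
import re
import string
from collections import Counter


def normalize(text):
    """Lowercase, drop punctuation and English articles, collapse whitespace.

    Assumption: SQuAD-style normalization; the paper's script may differ.
    """
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())


def exact_match(prediction, truth):
    """Return 1.0 if the normalized strings are identical, else 0.0."""
    return float(normalize(prediction) == normalize(truth))


def f1_score(prediction, truth):
    """Token-level F1 between the normalized prediction and reference."""
    pred_tokens = normalize(prediction).split()
    true_tokens = normalize(truth).split()
    if not pred_tokens or not true_tokens:
        # If either side is empty, score 1.0 only when both are empty.
        return float(pred_tokens == true_tokens)
    # Multiset intersection counts each shared token at most as often
    # as it appears on both sides.
    overlap = sum((Counter(pred_tokens) & Counter(true_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(true_tokens)
    return 2 * precision * recall / (precision + recall)


em = exact_match("The Eiffel Tower", "eiffel tower")
f1 = f1_score("in Paris France", "Paris")
```

Dataset-level scores such as those reported above are typically the mean of these per-example values, expressed on a 0 to 100 scale.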

Abstract

In this paper, we aim to build a robust question answering system that can adapt to out-of-domain datasets. A single network may overfit to superficial correlations in the training distribution. With a meaningful number of expert sub-networks, a gating network that selects a sparse combination of experts for each input, and careful balancing of the importance of the expert sub-networks, the Mixture-of-Experts (MoE) model allows us to train a multi-task learner that generalizes to out-of-domain datasets. We also explore moving the MoE layers up to the middle of DistilBERT and replacing the dense feed-forward network with sparsely-activated switch FFN layers, similar to the Switch Transformer architecture, which simplifies the MoE routing algorithm and reduces communication and computational costs. In addition to model architectures, we explore…
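
The switch FFN idea in the abstract can be illustrated with a minimal sketch of top-1 routing: a router scores every expert for a token, only the highest-scoring expert runs, and its output is scaled by the router probability. Everything here is an assumption made for illustration: the names `switch_ffn` and `make_expert`, the tiny dimensions, the random weights, the ReLU activation, and the use of plain Python lists instead of PyTorch tensors. It also leaves out the expert-balancing term the abstract mentions, and it is not taken from the authors' repository.

```python
import math
import random


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def matvec(matrix, vector):
    """Multiply a row-major matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def make_expert(dim, hidden, rng):
    """Build one two-layer feed-forward expert with random weights."""
    w_in = [[rng.uniform(-0.1, 0.1) for _ in range(dim)] for _ in range(hidden)]
    w_out = [[rng.uniform(-0.1, 0.1) for _ in range(hidden)] for _ in range(dim)]

    def expert(x):
        # ReLU is a simplification chosen for this sketch.
        h = [max(0.0, v) for v in matvec(w_in, x)]
        return matvec(w_out, h)

    return expert


def switch_ffn(token, router_weights, experts):
    """Route one token to its single highest-probability expert (top-1)."""
    probs = softmax(matvec(router_weights, token))
    chosen = max(range(len(experts)), key=probs.__getitem__)
    # Only the chosen expert is evaluated; scaling by the gate probability
    # is what lets gradients reach the router in a trained model.
    output = [probs[chosen] * v for v in experts[chosen](token)]
    return output, chosen, probs


rng = random.Random(0)
DIM, HIDDEN, NUM_EXPERTS = 4, 8, 3  # hypothetical toy sizes
experts = [make_expert(DIM, HIDDEN, rng) for _ in range(NUM_EXPERTS)]
router_weights = [[rng.uniform(-1, 1) for _ in range(DIM)] for _ in range(NUM_EXPERTS)]

token = [0.5, -1.0, 0.25, 2.0]
output, chosen, probs = switch_ffn(token, router_weights, experts)
```

Because each token activates one expert rather than a weighted combination of several, the per-token compute stays close to that of a single dense FFN regardless of how many experts exist, which is the cost reduction the abstract attributes to the Switch Transformer design.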

Code & Models

Repositories

yuanzhedong/cs224n_robustqa
PyTorch · Official

Taxonomy

Topics: Expert finding and Q&A systems · Topic Modeling · Domain Adaptation and Few-Shot Learning

Methods: Attention Is All You Need · Linear Layer · Attention Dropout · Linear Warmup With Linear Decay · Layer Normalization · Dropout · WordPiece · Weight Decay · Label Smoothing