Single-dataset Experts for Multi-dataset Question Answering

Dan Friedman; Ben Dodge; Danqi Chen

arXiv:2109.13880 · cs.CL · September 29, 2021

TL;DR

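This paper introduces Multi-Adapter Dataset Experts (MADE), a multi-dataset question answering approach that trains lightweight, dataset-specific adapters on top of a shared Transformer model. The authors report that MADE outperforms their baselines on in-distribution accuracy and transfers better to new datasets.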

Contribution

The paper proposes a multi-dataset QA model built from lightweight, dataset-specific adapters. It improves in-distribution accuracy and transfer performance over the usual approach of training a single network on all datasets simultaneously.

Findings

01. MADE outperforms baselines in in-distribution accuracy.

02. Parameter-averaging improves zero-shot and few-shot transfer.

03. Lightweight adapters enable effective multi-dataset question answering.
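
The parameter-averaging finding can be illustrated with a minimal sketch. It assumes a uniform, element-wise average over the weights of the dataset-specific adapters. This listing does not say which parameters are averaged or how they are weighted, so the `Adapter` type, the `average_adapters` helper, and the toy values below are hypothetical and are not taken from the paper or its repository.

```python
from typing import Dict, List

# Hypothetical representation: each adapter is a mapping from a parameter
# name to a flat list of float weights. Real adapters hold tensors.
Adapter = Dict[str, List[float]]


def average_adapters(adapters: List[Adapter]) -> Adapter:
    """Return the element-wise mean of adapters that share names and shapes."""
    if not adapters:
        raise ValueError("need at least one adapter")
    averaged: Adapter = {}
    for name in adapters[0]:
        # Group the i-th weight of every adapter together, then average.
        columns = zip(*(adapter[name] for adapter in adapters))
        averaged[name] = [sum(column) / len(adapters) for column in columns]
    return averaged


# Toy weights, made up purely for illustration.
experts = [
    {"down": [1.0, 2.0], "up": [0.0, 4.0]},
    {"down": [3.0, 6.0], "up": [2.0, 0.0]},
]
init = average_adapters(experts)
```

In this reading, the averaged adapter would serve as a starting point for a new dataset, either used directly (zero-shot) or fine-tuned on a few examples (few-shot).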

Abstract

Many datasets have been created for training reading comprehension models, and a natural question is whether we can combine them to build models that (1) perform better on all of the training datasets and (2) generalize and transfer better to new datasets. Prior work has addressed this goal by training one network simultaneously on multiple datasets, which works well on average but is prone to over- or under-fitting different sub-distributions and might transfer worse compared to source models with more overlap with the target dataset. Our approach is to model multi-dataset question answering with a collection of single-dataset experts, by training a collection of lightweight, dataset-specific adapter modules (Houlsby et al., 2019) that share an underlying Transformer model. We find that these Multi-Adapter Dataset Experts (MADE) outperform all our baselines in terms of in-distribution…
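
The idea of dataset-specific adapters sharing one underlying model can be sketched as follows. This is a simplified illustration under stated assumptions, not the authors' implementation: `shared_encoder` is a stand-in for the shared Transformer, the adapter follows the general bottleneck shape of Houlsby et al. (down-projection, nonlinearity, up-projection, residual connection), and the adapter is applied once after the encoder rather than inside each Transformer layer. All names and weights are hypothetical.

```python
from typing import Callable, Dict, List

Vector = List[float]
Matrix = List[Vector]


def shared_encoder(x: Vector) -> Vector:
    """Stand-in for the shared Transformer: the same function for every dataset."""
    return [2.0 * value for value in x]


def make_adapter(down: Matrix, up: Matrix) -> Callable[[Vector], Vector]:
    """Build a bottleneck adapter: project down, apply ReLU, project up, add residual."""
    def adapter(h: Vector) -> Vector:
        bottleneck = [max(0.0, sum(w * v for w, v in zip(row, h))) for row in down]
        update = [sum(w * v for w, v in zip(row, bottleneck)) for row in up]
        return [a + b for a, b in zip(h, update)]
    return adapter


# One small adapter per dataset; the weights are toy values for illustration.
adapters: Dict[str, Callable[[Vector], Vector]] = {
    "dataset_a": make_adapter(down=[[1.0, 0.0]], up=[[0.5], [0.0]]),
    "dataset_b": make_adapter(down=[[0.0, 1.0]], up=[[0.0], [-0.5]]),
}


def forward(x: Vector, dataset: str) -> Vector:
    """Run the shared encoder, then the adapter belonging to the given dataset."""
    return adapters[dataset](shared_encoder(x))


out_a = forward([1.0, 2.0], "dataset_a")
out_b = forward([1.0, 2.0], "dataset_b")
```

The same input produces different outputs depending on which dataset expert is selected, while the encoder weights stay common to all of them. Only the small adapter matrices are specific to a dataset.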

Code & Models

Repositories

princeton-nlp/made (PyTorch, official)

Taxonomy

Topics: Topic Modeling · Multimodal Machine Learning Applications · Domain Adaptation and Few-Shot Learning

Methods: Multi-Head Attention · Attention Is All You Need · Linear Layer · Dropout · Layer Normalization · Position-Wise Feed-Forward Layer · Adam · Dense Connections · Byte Pair Encoding · Label Smoothing