Chain-of-Skills: A Configurable Model for Open-domain Question Answering
Kaixin Ma, Hao Cheng, Yu Zhang, Xiaodong Liu, Eric Nyberg, Jianfeng Gao

TL;DR
This paper introduces a modular, skill-based retrieval model for open-domain question answering that enhances transferability, scalability, and performance through flexible configurations and self-supervised pretraining.
Contribution
It proposes a modular retriever whose modules correspond to reusable retrieval skills, with a parameterization inspired by sparse Transformers, improving zero-shot and fine-tuned ODQA performance across multiple datasets.
Findings
Outperforms recent self-supervised retrievers in zero-shot settings.
Achieves state-of-the-art results on NQ, HotpotQA, and OTT-QA.
Supports flexible skill configurations for different domains.
Abstract
The retrieval model is an indispensable component for real-world knowledge-intensive tasks, e.g., open-domain question answering (ODQA). As separate retrieval skills are annotated for different datasets, recent work focuses on customized methods, limiting the model transferability and scalability. In this work, we propose a modular retriever where individual modules correspond to key skills that can be reused across datasets. Our approach supports flexible skill configurations based on the target domain to boost performance. To mitigate task interference, we design a novel modularization parameterization inspired by sparse Transformer. We demonstrate that our model can benefit from self-supervised pretraining on Wikipedia and fine-tuning using multiple ODQA datasets, both in a multi-task fashion. Our approach outperforms recent self-supervised retrievers in zero-shot evaluations and…
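The idea of skill-specific modules can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes that each skill owns its own feed-forward "expert" and that the skill name selects which expert runs. The skill names, the `SkillRoutedFFN` class, and the `encode` helper are all hypothetical and exist only for illustration.

```python
import numpy as np

# Hypothetical skill names, chosen for illustration only.
SKILLS = ("retrieval", "linking", "reranking")


class SkillRoutedFFN:
    """One feed-forward block holding a separate expert per skill (assumed design)."""

    def __init__(self, d_model, d_hidden, skills=SKILLS, seed=0):
        rng = np.random.default_rng(seed)
        # Each skill gets its own pair of weight matrices, so gradient updates
        # for one skill would not overwrite another skill's parameters.
        self.experts = {
            skill: (
                rng.normal(0.0, 0.1, (d_model, d_hidden)),
                rng.normal(0.0, 0.1, (d_hidden, d_model)),
            )
            for skill in skills
        }

    def __call__(self, x, skill):
        # Routing is a dictionary lookup on the skill name; an unknown
        # skill raises KeyError.
        w_in, w_out = self.experts[skill]
        hidden = np.maximum(x @ w_in, 0.0)  # ReLU
        return x + hidden @ w_out  # residual connection


def encode(x, layers, skill):
    """Run token vectors through every layer with one skill, then mean-pool."""
    for layer in layers:
        x = layer(x, skill)
    return x.mean(axis=0)


layers = [SkillRoutedFFN(d_model=8, d_hidden=16, seed=i) for i in range(2)]
tokens = np.ones((4, 8))  # 4 tokens, 8 dimensions each
retrieval_vec = encode(tokens, layers, "retrieval")
reranking_vec = encode(tokens, layers, "reranking")
```

Under these assumptions, the same input yields a different vector for each skill, and supporting a new configuration amounts to choosing which skill names to pass at inference time.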
Taxonomy
Topics: Topic Modeling · Multimodal Machine Learning Applications · Domain Adaptation and Few-Shot Learning
Methods: Attention Is All You Need · Layer Normalization · Linear Layer · Label Smoothing · Dropout · Byte Pair Encoding · Multi-Head Attention · Absolute Position Encodings · Dense Connections · Adam
