Adapting Pre-trained Generative Models for Extractive Question Answering
Prabir Mallick, Tapas Nayak, Indrajit Bhattacharya

TL;DR
This paper explores adapting pre-trained generative models like BART and T5 for extractive question answering by generating indexes of answer spans, showing improved performance over traditional discriminative models.
Contribution
It introduces a novel method that leverages generative models to identify answer spans in extractive QA, addressing label sparsity issues and outperforming existing models.
Findings
Superior performance on multiple extractive QA datasets.
Effective handling of multi-span answer questions.
Demonstrated advantages over state-of-the-art discriminative models.
Abstract
Pre-trained generative models such as BART and T5 have gained prominence as a preferred method for text generation in various natural language processing tasks, including abstractive long-form question answering (QA) and summarization. However, the potential of generative models in extractive QA tasks, where discriminative models are commonly employed, remains largely unexplored. Discriminative models often encounter challenges associated with label sparsity, particularly when only a small portion of the context contains the answer. The challenge is more pronounced for multi-span answers. In this work, we introduce a novel approach that leverages pre-trained generative models to address extractive QA tasks by generating indexes corresponding to context tokens or sentences that form part of the answer. Through comprehensive evaluations on multiple extractive QA datasets,…
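The abstract describes the generation target only at a high level: the model emits indexes of context tokens (or sentences) rather than the answer words themselves. The sketch below shows one way answer spans could be serialized into such an index sequence for a BART/T5 decoder and parsed back into spans. It assumes token-level indexes, inclusive (start, end) spans, and a " ; " separator between spans; these conventions and the helper names `spans_to_index_target` and `index_output_to_spans` are hypothetical illustrations, not the paper's exact output format.

```python
from typing import List, Tuple

# Hypothetical convention: inclusive (start, end) positions of context tokens.
Span = Tuple[int, int]


def spans_to_index_target(spans: List[Span], sep: str = " ; ") -> str:
    """Encode answer spans as a target string of context-token indexes.

    Each span becomes a run of its token indexes; spans are ordered by
    position and joined by a separator so multi-span answers fit in one
    generated sequence.
    """
    parts = []
    for start, end in sorted(spans):
        parts.append(" ".join(str(i) for i in range(start, end + 1)))
    return sep.join(parts)


def index_output_to_spans(generated: str, num_tokens: int) -> List[Span]:
    """Parse a generated index string back into contiguous spans.

    Non-numeric and out-of-range outputs are discarded, so every returned
    span points at real context tokens. Adjacent indexes are merged.
    """
    indexes = set()
    for tok in generated.replace(";", " ").split():
        if tok.isdigit() and int(tok) < num_tokens:
            indexes.add(int(tok))
    spans: List[Span] = []
    for i in sorted(indexes):
        if spans and i == spans[-1][1] + 1:
            spans[-1] = (spans[-1][0], i)  # extend the current span
        else:
            spans.append((i, i))  # start a new span
    return spans


if __name__ == "__main__":
    context = "The capital of France is Paris and of Italy is Rome".split()
    gold = [(5, 5), (10, 10)]  # a two-span answer: "Paris", "Rome"
    target = spans_to_index_target(gold)
    print(target)  # 5 ; 10
    predicted = index_output_to_spans(target, len(context))
    print([" ".join(context[s:e + 1]) for s, e in predicted])  # ['Paris', 'Rome']
```

Because the decoder only has to produce short index strings, a multi-span answer is a single target sequence rather than several sparse start/end labels, which is the intuition the abstract gives for handling label sparsity. One limitation of this particular sketch is that two gold spans touching each other would be merged into one on decoding.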
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Multimodal Machine Learning Applications
Methods: Gated Linear Unit · Multi-Head Attention · Attention Is All You Need · Dense Connections · Adam · Layer Normalization · Attention Dropout · Adafactor · Linear Layer · SentencePiece
