Adapting Pre-trained Generative Models for Extractive Question Answering

Prabir Mallick; Tapas Nayak; Indrajit Bhattacharya

arXiv:2311.02961·cs.CL·November 7, 2023·1 cites

Open Access

TL;DR

This paper explores adapting pre-trained generative models like BART and T5 for extractive question answering by generating indexes of answer spans, showing improved performance over traditional discriminative models.

Contribution

The paper introduces a method in which a pre-trained generative model identifies answer spans in extractive QA by generating their indexes, which addresses the label sparsity that affects discriminative models and outperforms state-of-the-art discriminative models.

Findings

01. Superior performance on multiple extractive QA datasets.

02. Effective handling of multi-span answer questions.

03. Demonstrated advantages over state-of-the-art discriminative models.

Abstract

Pre-trained generative models such as BART and T5 have gained prominence as a preferred method for text generation in various natural language processing tasks, including abstractive long-form question answering (QA) and summarization. However, the potential of generative models in extractive QA tasks, where discriminative models are commonly employed, remains largely unexplored. Discriminative models often encounter challenges associated with label sparsity, particularly when only a small portion of the context contains the answer. The challenge is more pronounced for multi-span answers. In this work, we introduce a novel approach that uses the power of pre-trained generative models to address extractive QA tasks by generating indexes corresponding to context tokens or sentences that form part of the answer. Through comprehensive evaluations on multiple extractive QA datasets,…
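The index-generation idea in the abstract can be illustrated with a minimal sketch. The abstract does not give the exact input or output format, so everything here is an assumption made for illustration: the `question: ... context: ...` prompt layout, the `start end ; start end` target string, and the helper names `build_indexed_context`, `spans_to_target`, and `target_to_answers` are all hypothetical. The sketch only shows how answer spans, including multi-span answers, could be expressed as token indexes that a sequence-to-sequence model is trained to generate, and how a generated index string could be mapped back to text.

```python
from typing import List, Tuple


def build_indexed_context(question: str, context_tokens: List[str]) -> str:
    """Prefix every context token with its position (assumed input format)."""
    indexed = " ".join(f"{i} {tok}" for i, tok in enumerate(context_tokens))
    return f"question: {question} context: {indexed}"


def spans_to_target(spans: List[Tuple[int, int]]) -> str:
    """Encode inclusive (start, end) token spans as 'start end ; start end'."""
    return " ; ".join(f"{start} {end}" for start, end in spans)


def target_to_answers(target: str, context_tokens: List[str]) -> List[str]:
    """Map a generated index string back to answer text.

    Malformed or out-of-range pairs are skipped, since a generative
    model is not guaranteed to emit valid indexes.
    """
    answers = []
    for chunk in target.split(";"):
        parts = chunk.split()
        if len(parts) != 2:
            continue
        try:
            start, end = int(parts[0]), int(parts[1])
        except ValueError:
            continue
        if 0 <= start <= end < len(context_tokens):
            answers.append(" ".join(context_tokens[start:end + 1]))
    return answers


# Hypothetical multi-span example: two one-token answers.
tokens = "BART and T5 are pre-trained generative models".split()
source = build_indexed_context("Which models are mentioned?", tokens)
target = spans_to_target([(0, 0), (2, 2)])    # "0 0 ; 2 2"
answers = target_to_answers(target, tokens)   # ["BART", "T5"]
```

Under these assumptions the target sequence contains only a handful of index tokens regardless of context length, which is one way to read the abstract's point about label sparsity. A sentence-level variant would number sentences instead of tokens and generate sentence indexes.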

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Multimodal Machine Learning Applications

Methods: Gated Linear Unit · Multi-Head Attention · Attention Is All You Need · Dense Connections · Adam · Layer Normalization · Attention Dropout · Adafactor · Linear Layer · SentencePiece