Reference-based Weak Supervision for Answer Sentence Selection using Web Data

Vivek Krishnamurthy; Thuy Vu; Alessandro Moschitti

arXiv:2104.08943·cs.CL·April 20, 2021

TL;DR

This paper introduces Reference-based Weak Supervision (RWS), a fully automatic method that harvests weakly supervised answers from web data to improve answer sentence selection (AS2) models, achieving state-of-the-art results on WikiQA.

Contribution

The paper presents RWS, a large-scale data collection pipeline that takes only a question-reference pair as input and harvests weakly supervised answers from web data to improve AS2 models.

Findings

01

RWS improves TANDA's performance on WikiQA.

02

Achieves state-of-the-art P@1 (90.1%) and MAP (92.9%) on WikiQA.

03

The data produced by RWS consistently improves TANDA, indicating that the weak supervision approach is robust.

Abstract

Answer sentence selection (AS2) modeling requires annotated data, i.e., hand-labeled question-answer pairs. We present a strategy to collect weakly supervised answers for a question based on its reference to improve AS2 modeling. Specifically, we introduce Reference-based Weak Supervision (RWS), a fully automatic large-scale data pipeline that harvests high-quality weakly-supervised answers from abundant Web data requiring only a question-reference pair as input. We study the efficacy and robustness of RWS in the setting of TANDA, a recent state-of-the-art fine-tuning approach specialized for AS2. Our experiments indicate that the produced data consistently bolsters TANDA. We achieve the state of the art in terms of P@1, 90.1%, and MAP, 92.9%, on WikiQA.
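The two metrics reported in the abstract, P@1 and MAP, are standard ranking measures. The following is a minimal sketch of how they are commonly computed, not the authors' evaluation script. It assumes each question comes with a list of binary relevance labels (1 = correct answer sentence, 0 = incorrect) already sorted by model score, best first. The function names and the toy data are illustrative only, and a question with no correct candidate scores 0 here, whereas some evaluation setups filter such questions out beforehand.

```python
from typing import Sequence


def precision_at_1(ranked_labels: Sequence[Sequence[int]]) -> float:
    """Fraction of questions whose top-ranked candidate is correct."""
    hits = sum(1 for labels in ranked_labels if labels and labels[0] == 1)
    return hits / len(ranked_labels)


def average_precision(labels: Sequence[int]) -> float:
    """Average of precision values taken at the rank of each correct answer."""
    correct = 0
    total = 0.0
    for rank, label in enumerate(labels, start=1):
        if label == 1:
            correct += 1
            total += correct / rank  # precision at this rank
    # Assumption: a question with no correct candidate contributes 0.
    return total / correct if correct else 0.0


def mean_average_precision(ranked_labels: Sequence[Sequence[int]]) -> float:
    """Mean of the per-question average precision."""
    return sum(average_precision(l) for l in ranked_labels) / len(ranked_labels)


# Toy data (hypothetical): three questions, candidates sorted by model score.
toy = [
    [1, 0, 0],     # correct answer ranked first
    [0, 1, 0],     # correct answer ranked second
    [0, 0, 1, 1],  # correct answers ranked third and fourth
]

print(f"P@1 = {precision_at_1(toy):.3f}")
print(f"MAP = {mean_average_precision(toy):.3f}")
```

On the toy data only the first question has a correct top-ranked candidate, so P@1 is 1/3, while MAP also gives partial credit to correct answers ranked lower.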

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Multimodal Machine Learning Applications