Learning to Rank Question Answer Pairs with Bilateral Contrastive Data Augmentation
Yang Deng, Wenxuan Zhang, Wai Lam

TL;DR
This paper introduces Bilateral Generation (BiG), a data augmentation method using pre-trained models to generate pseudo-positive question-answer pairs, enhancing ranking performance with limited labeled data.
Contribution
The paper presents a novel contrastive data augmentation strategy, BiG, that synthesizes pseudo-positive QA pairs to improve ranking models for question-answering tasks.
Findings
Significant performance improvements on three benchmark datasets.
Effective utilization of limited labeled data through augmentation.
Easy applicability across different ranking models.
Abstract
In this work, we propose a novel and easy-to-apply data augmentation strategy, namely Bilateral Generation (BiG), with a contrastive training objective for improving the performance of ranking question answer pairs with existing labeled data. Specifically, we synthesize pseudo-positive QA pairs in contrast to the original negative QA pairs with two pre-trained generation models, one for question generation and the other for answer generation, which are fine-tuned on the limited positive QA pairs from the original dataset. With the augmented dataset, we design a contrastive training objective for learning to rank question answer pairs. Experimental results on three benchmark datasets show that our method significantly improves the performance of ranking models by making full use of existing labeled data and can be easily applied to different ranking models.
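The augmentation idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not give the exact pairing scheme or loss, so the code assumes that each negative pair yields two pseudo-positive pairs (a generated answer for the original question, and a generated question for the original negative answer) and that a simple margin-based hinge loss stands in for the contrastive objective. The names `augment_negative`, `hinge_contrastive_loss`, and the generator callables are hypothetical; in practice the generators would be fine-tuned pre-trained models.

```python
from typing import Callable, List, Sequence, Tuple

Pair = Tuple[str, str]  # (question, answer)


def augment_negative(
    question: str,
    negative_answer: str,
    generate_question: Callable[[str], str],
    generate_answer: Callable[[str], str],
) -> List[Pair]:
    """Build pseudo-positive pairs from one negative QA pair.

    Assumption: one pseudo-positive is formed on each side of the pair,
    which is one plausible reading of "bilateral" generation.
    """
    pseudo_answer = generate_answer(question)            # answer generated for the question
    pseudo_question = generate_question(negative_answer)  # question generated for the answer
    return [(question, pseudo_answer), (pseudo_question, negative_answer)]


def hinge_contrastive_loss(
    positive_scores: Sequence[float],
    negative_score: float,
    margin: float = 1.0,
) -> float:
    """Average hinge loss pushing each pseudo-positive score above the
    negative score by at least `margin`.

    Assumption: a stand-in objective, not the loss defined in the paper.
    """
    losses = [max(0.0, margin - (p - negative_score)) for p in positive_scores]
    return sum(losses) / len(losses)
```

With a ranking model that assigns a score to each pair, the loss is zero once every pseudo-positive pair outscores the original negative pair by the margin, so training pressure concentrates on pairs the ranker still confuses.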
Taxonomy
Topics: Topic Modeling · Expert finding and Q&A systems · Natural Language Processing Techniques
