Adversarial Domain Adaptation for Duplicate Question Detection
Darsh J Shah, Tao Lei, Alessandro Moschitti, Salvatore Romeo, Preslav Nakov

TL;DR
This paper explores adversarial domain adaptation to improve duplicate question detection in forums lacking annotated data, demonstrating significant performance gains across multiple domain pairs.
Contribution
It introduces an adversarial domain adaptation approach tailored for duplicate question detection and analyzes factors influencing its effectiveness.
Findings
Average improvement of 5.6% over the best baseline across multiple StackExchange domain pairs
Effectiveness depends on domain similarity and data properties
Provides insights into when adversarial adaptation works best
Abstract
We address the problem of detecting duplicate questions in forums, which is an important step towards automating the process of answering new questions. As finding and annotating such potential duplicates manually is very tedious and costly, automatic methods based on machine learning are a viable alternative. However, many forums do not have annotated data, i.e., questions labeled by experts as duplicates, and thus a promising solution is to use domain adaptation from another forum that has such annotations. Here we focus on adversarial domain adaptation, deriving important findings about when it performs well and what properties of the domains are important in this regard. Our experiments with StackExchange data show an average improvement of 5.6% over the best baseline across multiple pairs of domains.
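The abstract does not spell out the training objective, so the following is only a minimal sketch of the general idea behind adversarial domain adaptation, not the authors' implementation. It assumes a question encoder trained to minimize a duplicate-detection loss on the annotated source forum while maximizing the loss of a domain discriminator that tries to tell source questions from target questions. The function names, the cross-entropy form of both losses, and the trade-off weight `lam` are all hypothetical choices made for illustration.

```python
import math


def binary_cross_entropy(p, y, eps=1e-12):
    """Cross-entropy between a predicted probability p and a 0/1 label y."""
    p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
    return -(y * math.log(p) + (1 - y) * math.log(1.0 - p))


def mean(values):
    return sum(values) / len(values)


def encoder_objective(dup_probs, dup_labels, domain_probs, domain_labels, lam=0.1):
    """Loss minimized by the question encoder (assumed formulation).

    dup_probs / dup_labels: predicted duplicate probabilities and gold labels
        for question pairs from the annotated source forum.
    domain_probs / domain_labels: the discriminator's probability that each
        question comes from the source forum, and the true domain (1 = source).
    lam: hypothetical weight balancing the two terms.
    """
    task = mean([binary_cross_entropy(p, y) for p, y in zip(dup_probs, dup_labels)])
    domain = mean([binary_cross_entropy(p, y) for p, y in zip(domain_probs, domain_labels)])
    # The encoder wants a low task loss and a HIGH discriminator loss,
    # i.e. representations from which the forum cannot be recovered.
    return task - lam * domain, task, domain


# Toy numbers: a discriminator at chance level versus a confident one.
confused_total, task, confused_domain = encoder_objective(
    [0.9, 0.2], [1, 0], [0.5, 0.5], [1, 0]
)
confident_total, _, confident_domain = encoder_objective(
    [0.9, 0.2], [1, 0], [0.9, 0.1], [1, 0]
)
```

In this toy setting the encoder's objective is lower when the discriminator is confused than when it separates the two forums confidently, which is the pressure that pushes the encoder toward domain-invariant question representations. The discriminator itself would be trained in alternation to minimize the domain loss.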
Taxonomy
Topics: Topic Modeling · Expert finding and Q&A systems · Sentiment Analysis and Opinion Mining
