TL;DR
This paper introduces a multi-task learning approach using BERT models to assess the plausibility of question-answer pairs in noisy social media datasets and to extract the answer from each response, improving data cleaning and usability.
Contribution
It proposes a novel multi-task BERT-based framework that evaluates question-answer plausibility and extracts answers from free-form responses, with the aim of improving the quality of datasets collected from social media sources.
Findings
Question plausibility AUROC=0.75
Response plausibility AUROC=0.78
Answer extraction F1=0.665
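For readers unfamiliar with the two metrics above, here is a minimal sketch of how each can be computed. It assumes AUROC is the usual pairwise ranking statistic and that F1 is a token-overlap score between the predicted and reference answer; the summary does not state which F1 variant the paper reports, so `token_f1` is an illustrative assumption, and both function names are hypothetical.

```python
from collections import Counter


def auroc(labels, scores):
    """Chance that a random positive example outscores a random negative one.

    Ties count as half a win. Quadratic in the number of examples, which is
    fine for a sketch but not for large evaluation sets.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUROC needs at least one positive and one negative")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def token_f1(predicted, reference):
    """Token-overlap F1 between two answer strings (assumed variant)."""
    pred = predicted.lower().split()
    ref = reference.lower().split()
    if not pred or not ref:
        # Both empty counts as a match; one empty counts as a miss.
        return float(pred == ref)
    overlap = sum((Counter(pred) & Counter(ref)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred)
    recall = overlap / len(ref)
    return 2 * precision * recall / (precision + recall)
```

An AUROC of 0.5 corresponds to random ranking and 1.0 to perfect separation, so the plausibility scores of 0.75 and 0.78 sit between the two.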
Abstract
Datasets extracted from social networks and online forums are often prone to the pitfalls of natural language, namely the presence of unstructured and noisy data. In this work, we seek to enable the collection of high-quality question-answer datasets from social media by proposing a novel task for automated quality analysis and data cleaning: question-answer (QA) plausibility. Given a machine or user-generated question and a crowd-sourced response from a social media user, we determine if the question and response are valid; if so, we identify the answer within the free-form response. We design BERT-based models to perform the QA plausibility task, and we evaluate the ability of our models to generate a clean, usable question-answer dataset. Our highest-performing approach consists of a single-task model which determines the plausibility of the question, followed by a multi-task model…
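The two-stage flow described in the abstract can be sketched as a simple filter. This is a rough illustration, not the authors' implementation: the abstract is truncated before it says what the multi-task model outputs, so the sketch assumes it returns a response plausibility score together with an answer span. The names `clean_pair`, `question_model`, `response_model`, and the 0.5 threshold are all hypothetical, and the models are passed in as plain callables rather than real BERT classifiers.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Tuple

# Hypothetical interfaces standing in for the BERT-based models.
QuestionModel = Callable[[str], float]
ResponseModel = Callable[[str, str], Tuple[float, Optional[Tuple[int, int]]]]


@dataclass
class CleanedPair:
    question: str
    answer: str


def clean_pair(
    question: str,
    response: str,
    question_model: QuestionModel,
    response_model: ResponseModel,
    threshold: float = 0.5,  # assumed cut-off, not taken from the paper
) -> Optional[CleanedPair]:
    """Return a cleaned QA pair, or None if either stage rejects the input."""
    # Stage 1: single-task model scores the question on its own.
    if question_model(question) < threshold:
        return None
    # Stage 2 (assumed): one model scores the response and proposes a
    # character span for the answer inside the free-form response.
    response_score, span = response_model(question, response)
    if response_score < threshold or span is None:
        return None
    start, end = span
    return CleanedPair(question=question, answer=response[start:end])
```

With toy stand-ins such as `lambda q: 0.9` for the question model and `lambda q, r: (0.8, (0, 4))` for the response model, `clean_pair("What animal is this?", "deer in my yard!", ...)` would keep the pair with the answer `"deer"`, while a question scored below the threshold is dropped before the second model runs.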
