AnswerSumm: A Manually-Curated Dataset and Pipeline for Answer Summarization
Alexander R. Fabbri, Xiaojian Wu, Srini Iyer, Haoran Li, Mona Diab

TL;DR
This paper introduces AnswerSumm, a professionally curated dataset of community question answering (CQA) threads, together with a pipeline for answer summarization that covers data annotation, grouping of answer content by perspective, and model evaluation.
Contribution
It provides a novel, professionally curated dataset and a multi-stage pipeline for answer summarization, including new unsupervised data augmentation and reinforcement learning techniques.
Findings
State-of-the-art summarization models are benchmarked on the dataset.
Unsupervised data augmentation improves summarization performance.
Reinforcement learning rewards enhance factual consistency and coverage.
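This page does not spell out how the two rewards are computed or combined, so the following is only a minimal sketch under stated assumptions. It folds a factual-consistency score and a coverage score into one scalar with a weighted sum, and uses a self-critical baseline (reward of a sampled summary minus reward of the greedy summary) as the policy-gradient advantage. The scorers `consistency_score` and `coverage_score`, the weights `w_cons`/`w_cov`, and the example thread are all hypothetical; both scorers are toy token-overlap proxies standing in for the learned models a real system would use.

```python
# Hypothetical sketch: combining a consistency reward and a coverage reward.
# Both scorers are toy token-overlap proxies, NOT the paper's reward models.


def tokens(text: str) -> set[str]:
    """Lowercase word set with surrounding punctuation stripped."""
    words = (w.strip(".,!?;:").lower() for w in text.split())
    return {w for w in words if w}


def consistency_score(summary: str, source: str) -> float:
    """Toy proxy: fraction of summary tokens that also appear in the source."""
    summ = tokens(summary)
    if not summ:
        return 0.0
    return len(summ & tokens(source)) / len(summ)


def coverage_score(summary: str, perspectives: list[str]) -> float:
    """Toy proxy: fraction of perspectives sharing at least one token with the summary."""
    if not perspectives:
        return 0.0
    summ = tokens(summary)
    covered = sum(1 for p in perspectives if tokens(p) & summ)
    return covered / len(perspectives)


def combined_reward(summary: str, source: str, perspectives: list[str],
                    w_cons: float = 0.5, w_cov: float = 0.5) -> float:
    """Weighted sum of the two rewards (weights are assumptions)."""
    return (w_cons * consistency_score(summary, source)
            + w_cov * coverage_score(summary, perspectives))


def self_critical_advantage(sampled: str, greedy: str, source: str,
                            perspectives: list[str]) -> float:
    """Sampled-summary reward minus greedy-baseline reward (self-critical training)."""
    return (combined_reward(sampled, source, perspectives)
            - combined_reward(greedy, source, perspectives))


# Hypothetical example thread with two answer perspectives.
source = "vpn protects privacy. proxy costs less."
perspectives = ["vpn protects privacy", "proxy costs less"]
sampled = "vpn protects privacy"
greedy = "vpn protects privacy and proxy costs less"

reward_sampled = combined_reward(sampled, source, perspectives)
advantage = self_critical_advantage(sampled, greedy, source, perspectives)
print(reward_sampled, advantage)
```

Here the sampled summary is fully supported by the source but covers only one of the two perspectives, so its advantage over the greedy summary comes out negative. A coverage-aware reward is meant to push the model toward that more complete output.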
Abstract
Community Question Answering (CQA) fora such as Stack Overflow and Yahoo! Answers are a rich source of answers to a wide range of community-based questions. Each question thread can receive a large number of answers with different perspectives. One goal of answer summarization is to produce a summary that reflects the range of answer perspectives. A major obstacle for this task is the absence of a dataset to provide supervision for producing such summaries. Recent works propose heuristics to create such data, but these are often noisy and do not cover all answer perspectives present. This work introduces a novel dataset of 4,631 CQA threads for answer summarization curated by professional linguists. Our pipeline gathers annotations for all subtasks of answer summarization, including relevant answer sentence selection, grouping these sentences based on perspectives, summarizing…
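The abstract names the subtasks in order: select relevant answer sentences, group them by perspective, then summarize. The toy sketch below only illustrates that stage order on a single thread. In AnswerSumm these steps are annotated by linguists, whereas every heuristic here is an assumption: word overlap with the question for relevance, a Jaccard-similarity threshold for grouping, and the shortest sentence of each group as an extractive stand-in for the per-group summary. All function names, the stopword list, and the example thread are hypothetical.

```python
# Hypothetical illustration of the subtask order: select -> group -> summarize.
# The heuristics are assumptions, not AnswerSumm's annotation procedure.
import re

STOPWORDS = {"a", "an", "the", "is", "it", "to", "of", "and", "or",
             "i", "you", "for", "in", "on", "but"}


def content_words(text: str) -> set[str]:
    """Lowercased word set minus a small stopword list."""
    return {w for w in re.findall(r"[a-z0-9']+", text.lower()) if w not in STOPWORDS}


def split_sentences(answer: str) -> list[str]:
    """Naive sentence splitter on ., ! or ? followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", answer) if s.strip()]


def select_relevant(question: str, answers: list[str], min_overlap: int = 1) -> list[str]:
    """Stage 1: keep answer sentences sharing content words with the question."""
    q = content_words(question)
    return [s for a in answers for s in split_sentences(a)
            if len(content_words(s) & q) >= min_overlap]


def jaccard(a: set[str], b: set[str]) -> float:
    return len(a & b) / len(a | b) if a | b else 0.0


def group_by_perspective(sentences: list[str], threshold: float = 0.2) -> list[list[str]]:
    """Stage 2: greedy clustering against each cluster's first sentence."""
    clusters: list[list[str]] = []
    for s in sentences:
        words = content_words(s)
        for c in clusters:
            if jaccard(words, content_words(c[0])) >= threshold:
                c.append(s)
                break
        else:  # no existing cluster was similar enough
            clusters.append([s])
    return clusters


def summarize_groups(clusters: list[list[str]]) -> str:
    """Stage 3 placeholder: shortest sentence per cluster, one per perspective."""
    return " ".join(min(c, key=len) for c in clusters)


# Hypothetical thread.
question = "Should I use a VPN or a proxy for privacy?"
answers = [
    "A VPN encrypts all traffic. I had pizza today.",
    "A proxy is cheaper for privacy. A VPN encrypts traffic better.",
]
relevant = select_relevant(question, answers)
clusters = group_by_perspective(relevant)
summary = summarize_groups(clusters)
print(summary)
```

The off-topic sentence is dropped in stage 1, the two VPN sentences fall into one group, and the output keeps one sentence per perspective. A real system would replace stage 3 with an abstractive model that fuses each group.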
Taxonomy
Topics: Topic Modeling · Expert finding and Q&A systems · Natural Language Processing Techniques
