CQASUMM: Building References for Community Question Answering Summarization Corpora
Tanya Chowdhury, Tanmoy Chakraborty

TL;DR
This paper introduces CQASUMM, a large annotated dataset for community question answering summarization, and proposes OpinioSumm, a new multi-document summarization method tailored for opinion-rich CQA content.
Contribution
The paper creates the first large-scale CQA summarization dataset and develops OpinioSumm, a multi-document summarizer that handles opinionated and diverse CQA data effectively.
Findings
OpinioSumm outperforms the baseline methods by 4.6% in ROUGE-1 score.
Existing MDS methods struggle with opinionated and diverse CQA content.
The dataset enables new research in community question answering summarization.
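The headline result above is reported in ROUGE-1, which scores a system summary by its unigram overlap with a reference summary (in CQASUMM, the best answer acts as that reference). The sketch below shows the basic computation. It is a simplified illustration, assuming lowercased whitespace tokenization: the standard ROUGE toolkit supports extra preprocessing such as stemming and stopword removal, and this sketch does not claim to reproduce the exact evaluation configuration used in the paper. The function name `rouge_1` is hypothetical.

```python
from collections import Counter


def rouge_1(candidate: str, reference: str) -> dict:
    """Simplified ROUGE-1 (hypothetical helper): unigram precision, recall and F1.

    Assumes lowercased whitespace tokenization; no stemming or stopword removal.
    """
    cand = Counter(candidate.lower().split())
    ref = Counter(reference.lower().split())
    # Clipped overlap: a unigram counts at most min(count in candidate, count in reference) times.
    overlap = sum((cand & ref).values())
    # max(..., 1) guards against division by zero for empty inputs.
    precision = overlap / max(sum(cand.values()), 1)
    recall = overlap / max(sum(ref.values()), 1)
    f1 = 0.0 if overlap == 0 else 2 * precision * recall / (precision + recall)
    return {"precision": precision, "recall": recall, "f1": f1}


scores = rouge_1("the cat lay on the mat", "the cat sat on the mat")
print(scores)
```

Here five of the six candidate unigrams ("the" twice, "cat", "on", "mat") also appear in the reference, so precision, recall and F1 are all 5/6. Recall is often the figure emphasized in summarization work, because it measures how much of the reference content the system summary recovers.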
Abstract
Community Question Answering (CQA) forums such as Quora and Stackoverflow are rich knowledge resources, often catering to information on topics overlooked by major search engines. Answers submitted to these forums are often elaborate, contain spam, and are marred by slurs and business promotions. It is difficult for a reader to go through numerous such answers to gauge community opinion. As a result, summarization becomes a priority task for CQA forums. While a number of efforts have been made to summarize factoid CQA, little work exists on summarizing non-factoid CQA. We believe this is due to the lack of a sufficiently large annotated dataset for CQA summarization. We create CQASUMM, the first large-scale annotated CQA summarization dataset, by filtering the 4.4 million Yahoo! Answers L6 dataset. We sample threads where the best answer can double up as a reference summary and build hundred word…
Taxonomy
Topics: Topic Modeling · Sentiment Analysis and Opinion Mining · Expert finding and Q&A systems
