A Dataset and Benchmark for Consumer Healthcare Question Summarization
Abhishek Basu, Deepak Gupta, Dina Demner-Fushman, Shweta Yadav

TL;DR
This paper introduces CHQ-Sum, a new domain-expert annotated dataset of consumer health questions and summaries, to advance healthcare question summarization research.
Contribution
It provides the first large-scale, domain-expert annotated dataset for consumer healthcare question summarization, enabling better model development.
Findings
State-of-the-art models perform variably on the dataset
The dataset improves understanding of consumer health questions
Benchmark results highlight future research directions
Abstract
The quest for seeking health information has swamped the web with consumers' health-related questions. Generally, consumers use overly descriptive and peripheral information to express their medical condition or other healthcare needs, contributing to the challenges of natural language understanding. One way to address this challenge is to summarize the questions and distill the key information of the original question. Recently, large-scale datasets have significantly propelled the development of several summarization tasks, such as multi-document summarization and dialogue summarization. However, the lack of a domain-expert annotated dataset for the consumer healthcare question summarization task inhibits the development of an efficient summarization system. To address this issue, we introduce a new dataset, CHQ-Sum, that contains 1507 domain-expert annotated consumer health questions…
Taxonomy
Topics: Topic Modeling · Sentiment Analysis and Opinion Mining · Advanced Text Analysis Techniques
