Automatically Evaluating Opinion Prevalence in Opinion Summarization
Christopher Malon

TL;DR
This paper introduces an automatic metric for evaluating how well opinion summaries reflect the prevalence of opinions in the source reviews. It finds that existing summarization methods lag behind human performance but can be improved by preprocessing the reviews.
Contribution
The paper proposes a new opinion prevalence metric based on factual consistency scoring, and demonstrates its effectiveness in evaluating and improving opinion summarization methods.
Findings
Human summaries slightly outperform random review extracts.
Existing unsupervised summarization methods underperform compared to humans.
Preprocessing reviews by simplification can significantly improve summarization quality.
Abstract
When faced with a large number of product reviews, it is not clear that a human can remember all of them and weight opinions representatively to write a good reference summary. We propose an automatic metric to test the prevalence of the opinions that a summary expresses, based on counting the number of reviews that are consistent with each statement in the summary, while discrediting trivial or redundant statements. To formulate this opinion prevalence metric, we consider several existing methods to score the factual consistency of a summary statement with respect to each individual source review. On a corpus of Amazon product reviews, we gather multiple human judgments of the opinion consistency, to determine which automatic metric best expresses consistency in product reviews. Using the resulting opinion prevalence metric, we show that a human authored summary has only slightly…
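The counting idea described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the paper's implementation: the function names, the averaging over statements, the zero-credit rule for trivial or redundant statements, and the word-overlap consistency check are all assumptions made here. The paper instead compares existing factual consistency scorers against human judgments.

```python
from typing import Callable, Sequence

# Hypothetical signature: does `source` support `statement`?
Supports = Callable[[str, str], bool]


def opinion_prevalence(
    statements: Sequence[str],
    reviews: Sequence[str],
    supports: Supports,
    is_trivial: Callable[[str], bool] = lambda s: False,
) -> float:
    """Average, over summary statements, of the fraction of reviews
    consistent with each statement. Trivial statements, and statements
    already supported by an earlier summary statement, get zero credit.
    (Assumed formulation; the source text does not give the exact one.)"""
    if not statements or not reviews:
        return 0.0
    total = 0.0
    for i, statement in enumerate(statements):
        redundant = any(supports(prev, statement) for prev in statements[:i])
        if is_trivial(statement) or redundant:
            continue  # no credit for trivial or redundant statements
        total += sum(supports(r, statement) for r in reviews) / len(reviews)
    return total / len(statements)


def toy_supports(source: str, statement: str) -> bool:
    """Stand-in consistency check (assumption): every word of the
    statement appears in the source. A real system would use a learned
    factual consistency scorer here."""
    def words(text: str) -> set:
        return {w.strip(".,!").lower() for w in text.split()}
    return words(statement) <= words(source)


reviews = [
    "Battery life is great.",
    "Great battery life, poor screen.",
    "The screen is poor.",
]
summary = ["battery life great", "poor screen", "battery great"]
score = opinion_prevalence(summary, reviews, toy_supports)
```

In this toy run the third statement is treated as redundant with the first, so only the first two statements earn credit.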
Taxonomy
Topics: Topic Modeling · Advanced Text Analysis Techniques · Sentiment Analysis and Opinion Mining
