Confounds and Overestimations in Fake Review Detection: Experimentally Controlling for Product-Ownership and Data-Origin
Felix Soldner, Bennett Kleinberg, Shane Johnson

TL;DR
This study examines how confounds like data origin and product ownership influence fake review detection, revealing that these confounds lead to overestimations of classifier performance and highlighting the importance of controlling for them.
Contribution
It provides an experimental analysis of confounds in fake review detection, demonstrating their impact on classification accuracy and potential overestimations in previous studies.
Findings
Review veracity is somewhat detectable (60-70% classification accuracy).
Confounds with product ownership or data origin increase classification accuracy.
Combined confounds lead to overestimations of true performance.
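The overestimation effect in these findings can be illustrated with a toy simulation. The sketch below is not the authors' method or data: the two numeric "cues", their effect sizes, the nearest-centroid classifier, and the function names simulate_reviews and nearest_centroid_accuracy are all assumptions made for illustration. It compares a controlled design, where data origin is independent of veracity, with a confounded design, where every fake review comes from one source and every genuine review from another.

```python
import random
from statistics import mean


def simulate_reviews(confounded, n=2000, seed=0):
    """Return synthetic (veracity_cue, origin_cue, is_fake) rows.

    All values are made up for illustration; nothing here is real review data.
    """
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        is_fake = rng.random() < 0.5
        # Confounded design: source B holds only fake reviews, source A only genuine ones.
        # Controlled design: the source is assigned independently of veracity.
        from_source_b = is_fake if confounded else (rng.random() < 0.5)
        # Assumed weak linguistic signal of deception.
        veracity_cue = rng.gauss(0.5 if is_fake else 0.0, 1.0)
        # Assumed strong stylistic signal of where the text came from.
        origin_cue = rng.gauss(2.0 if from_source_b else 0.0, 1.0)
        rows.append((veracity_cue, origin_cue, is_fake))
    return rows


def nearest_centroid_accuracy(rows):
    """Fit class centroids on the first half, return accuracy on the second half."""
    half = len(rows) // 2
    train, test = rows[:half], rows[half:]
    centroids = {}
    for label in (True, False):
        members = [r for r in train if r[2] == label]
        centroids[label] = (
            mean(r[0] for r in members),
            mean(r[1] for r in members),
        )

    def predict(x, y):
        def squared_distance(label):
            cx, cy = centroids[label]
            return (x - cx) ** 2 + (y - cy) ** 2

        # Predict "fake" when the point is closer to the fake-review centroid.
        return squared_distance(True) < squared_distance(False)

    correct = sum(predict(x, y) == label for x, y, label in test)
    return correct / len(test)


if __name__ == "__main__":
    controlled = nearest_centroid_accuracy(simulate_reviews(confounded=False))
    confounded = nearest_centroid_accuracy(simulate_reviews(confounded=True))
    print(f"controlled design accuracy: {controlled:.3f}")
    print(f"confounded design accuracy: {confounded:.3f}")
```

In the confounded run the classifier can exploit the origin cue as a proxy for the label, so its accuracy reflects source detection as much as deception detection. That is the kind of inflation the findings warn about.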
Abstract
The popularity of online shopping is steadily increasing. At the same time, fake product reviews are published widely and have the potential to affect consumer purchasing behavior. In response, previous work has developed automated methods for the detection of deceptive product reviews. However, studies vary considerably in terms of classification performance, and many use data that contain potential confounds, which makes it difficult to determine their validity. Two possible confounds are data-origin (i.e., the dataset is composed of more than one source) and product ownership (i.e., reviews written by individuals who own or do not own the reviewed product). In the present study, we investigate the effect of both confounds for fake review detection. Using an experimental design, we manipulate data-origin, product ownership, review polarity, and veracity. Supervised learning analysis suggests…
Taxonomy
Topics: Spam and Phishing Detection · Misinformation and Its Impacts · Hate Speech and Cyberbullying Detection
