Confounds and Overestimations in Fake Review Detection: Experimentally Controlling for Product-Ownership and Data-Origin

Felix Soldner; Bennett Kleinberg; Shane Johnson

arXiv:2110.15130·cs.CL·January 11, 2023

TL;DR

This study examines how confounds like data origin and product ownership influence fake review detection, revealing that these confounds lead to overestimations of classifier performance and highlighting the importance of controlling for them.

Contribution

It provides an experimental analysis of confounds in fake review detection, demonstrating their impact on classification accuracy and potential overestimations in previous studies.

Findings

01

Review veracity is somewhat detectable (60-70% classification accuracy).

02

Confounds with product ownership or data origin increase classification accuracy.

03

Combined confounds lead to overestimations of true performance.
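
To make the overestimation argument concrete, here is a minimal toy sketch, not the authors' data or method. It assumes two hypothetical data sources (A and B) and a hypothetical weak linguistic cue that matches veracity in 13 of 20 reviews per class; that rate is an arbitrary illustrative choice, not a result from the paper. The names `make_reviews`, `accuracy`, `predict_by_origin`, and `predict_by_cue` are invented for this example.

```python
def make_reviews(confounded):
    """Build toy records of (veracity, origin, weak_cue).

    veracity: 1 = deceptive, 0 = truthful
    origin:   1 = source B, 0 = source A (hypothetical sources)
    weak_cue: hypothetical cue that matches veracity in 13 of 20 cases
    """
    records = []
    for veracity in (0, 1):
        for i in range(20):
            weak_cue = veracity if i < 13 else 1 - veracity
            if confounded:
                # Every deceptive review comes from source B.
                origin = veracity
            else:
                # Origin is balanced within each veracity class.
                origin = i % 2
            records.append((veracity, origin, weak_cue))
    return records


def accuracy(records, predict):
    """Share of records whose predicted veracity equals the true veracity."""
    correct = sum(1 for record in records if predict(record) == record[0])
    return correct / len(records)


def predict_by_origin(record):
    """Guess veracity from data origin alone (the confound)."""
    return record[1]


def predict_by_cue(record):
    """Guess veracity from the weak linguistic cue alone."""
    return record[2]


if __name__ == "__main__":
    for label, flag in (("confounded", True), ("controlled", False)):
        data = make_reviews(flag)
        print(label,
              "origin rule:", accuracy(data, predict_by_origin),
              "cue rule:", accuracy(data, predict_by_cue))
```

In this toy setup the origin rule looks perfect on the confounded data and falls to chance once origin is balanced, while the cue rule scores the same in both conditions. That gap is the kind of overestimation the findings describe, under the stated assumptions.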

Abstract

The popularity of online shopping is steadily increasing. At the same time, fake product reviews are published widely and have the potential to affect consumer purchasing behavior. In response, previous work has developed automated methods for the detection of deceptive product reviews. However, studies vary considerably in terms of classification performance, and many use data that contain potential confounds, which makes it difficult to determine their validity. Two possible confounds are data-origin (i.e., the dataset is composed of more than one source) and product ownership (i.e., reviews written by individuals who own or do not own the reviewed product). In the present study, we investigate the effect of both confounds for fake review detection. Using an experimental design, we manipulate data-origin, product ownership, review polarity, and veracity. Supervised learning analysis suggests…

Taxonomy

Topics: Spam and Phishing Detection · Misinformation and Its Impacts · Hate Speech and Cyberbullying Detection