An Optimized Machine Learning Classifier for Detecting Fake Reviews Using Extracted Features
Shabbir Anees, Anshuman, Ayush Chaurasia, Prathmesh Bogar

TL;DR
This paper presents a machine learning system that detects AI-generated fake reviews by combining text preprocessing, multi-modal feature extraction, feature selection with Harris Hawks Optimization (HHO), and a stacking ensemble classifier, reaching 95.40% accuracy on a public dataset of 40,432 reviews.
Contribution
It introduces a novel combination of feature extraction, bio-inspired optimization, and ensemble learning for improved detection of computer-generated reviews.
Findings
Achieved 95.40% accuracy in detecting AI-generated reviews.
Reduced the feature set by 89.9% (from 13,539 to 1,368 features) using Harris Hawks Optimization.
Demonstrated effectiveness of ensemble learning combined with bio-inspired optimization.
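The 89.9% reduction can be checked directly from the feature counts given in the abstract (13,539 initial features, 1,368 selected). The short sketch below does that arithmetic; the function name `reduction_percent` is illustrative and does not come from the paper.

```python
def reduction_percent(n_before: int, n_after: int) -> float:
    """Return the percentage of features removed by a selection step."""
    if n_before <= 0 or not 0 <= n_after <= n_before:
        raise ValueError("expected 0 <= n_after <= n_before and n_before > 0")
    return 100.0 * (n_before - n_after) / n_before


# Feature counts as reported in the abstract.
initial_features = 13_539
selected_features = 1_368

print(f"{reduction_percent(initial_features, selected_features):.1f}%")  # prints 89.9%
```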
Abstract
Fraudulent reviews are well known to undermine the legitimacy and dependability of online purchases. The most recent development misleading customers is the appearance of computer-generated (CG) reviews that pass as human-written ones. In this work, we present a machine-learning-based system that identifies these AI-produced reviews with high precision. Our method integrates advanced text preprocessing, multi-modal feature extraction, Harris Hawks Optimization (HHO) for feature selection, and a stacking ensemble classifier. We implemented this methodology on a public dataset of 40,432 Original (OR) and Computer-Generated (CG) reviews. From an initial set of 13,539 features, HHO selected the 1,368 most relevant features, achieving an 89.9% dimensionality reduction. Our final stacking model achieved 95.40% accuracy, 92.81% precision, 95.01% recall, and a 93.90%…
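To make the feature selection step more concrete, here is a minimal sketch of the kind of objective a wrapper-style metaheuristic such as HHO could minimize over binary feature masks. The summary does not state the paper's actual objective, so the weighted form below (classification error plus a small penalty on the fraction of features kept), the weight `alpha`, and the names `mask_fitness` and `toy_error` are all assumptions for illustration. The search procedure itself is not implemented here.

```python
from typing import Callable, Sequence


def mask_fitness(
    mask: Sequence[int],
    error_rate: Callable[[Sequence[int]], float],
    alpha: float = 0.99,
) -> float:
    """Score a binary feature mask; lower is better.

    Assumed objective: alpha * error + (1 - alpha) * fraction of features kept.
    """
    n_selected = sum(mask)
    if n_selected == 0:
        # An empty feature set cannot be evaluated; treat it as the worst case.
        return float("inf")
    return alpha * error_rate(mask) + (1.0 - alpha) * n_selected / len(mask)


# Hypothetical stand-in for a cross-validated classifier error:
# only features 0 and 2 carry signal, each lowering the error by 0.2.
INFORMATIVE = (0, 2)


def toy_error(mask: Sequence[int]) -> float:
    hits = sum(1 for i in INFORMATIVE if mask[i])
    return 0.5 - 0.2 * hits


full_mask = [1, 1, 1, 1]     # keep every feature
compact_mask = [1, 0, 1, 0]  # keep only the informative ones

# Both masks reach the same toy error, so the smaller one scores better.
print(mask_fitness(compact_mask, toy_error) < mask_fitness(full_mask, toy_error))  # prints True
```

In a real pipeline, `error_rate` would train and validate a classifier on the columns selected by the mask, and an optimizer would propose new masks based on the scores.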
Taxonomy
Topics: Spam and Phishing Detection · Misinformation and Its Impacts · Hate Speech and Cyberbullying Detection
