AiGen-FoodReview: A Multimodal Dataset of Machine-Generated Restaurant Reviews and Images on Social Media
Alessandro Gambetti, Qiwei Han

TL;DR
This paper introduces AiGen-FoodReview, a large multimodal dataset of real and machine-generated restaurant reviews and images, and evaluates detection models to distinguish authentic content from AI-generated fake reviews.
Contribution
It provides the first open-source multimodal dataset of authentic and AI-generated restaurant reviews and images, along with detection models and feature analysis for fake review identification.
Findings
Achieved 99.80% accuracy with multimodal detection using FLAVA.
Showed linguistic and visual features are effective in distinguishing fake reviews.
Demonstrated the utility of handcrafted features based on readability and photographic theories.
Abstract
Online reviews in the form of user-generated content (UGC) significantly impact consumer decision-making. However, the pervasive issue of not only human fake content but also machine-generated content challenges UGC's reliability. Recent advances in Large Language Models (LLMs) may pave the way to fabricate indistinguishable fake generated content at a much lower cost. Leveraging OpenAI's GPT-4-Turbo and DALL-E-2 models, we craft AiGen-FoodReview, a multi-modal dataset of 20,144 restaurant review-image pairs divided into authentic and machine-generated. We explore unimodal and multimodal detection models, achieving 99.80% multimodal accuracy with FLAVA. We use attributes from readability and photographic theories to score reviews and images, respectively, demonstrating their utility as hand-crafted features in scalable and interpretable detection models, with comparable performance. The…
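As a rough illustration of what a readability-based hand-crafted feature might look like, the sketch below computes the standard Flesch Reading Ease score and two simple ratios for a review text. The excerpt above does not say which readability measures the authors use, so the choice of Flesch Reading Ease, the vowel-group syllable heuristic, and the names `count_syllables` and `readability_features` are all assumptions made for illustration, not the paper's method.

```python
import re


def count_syllables(word: str) -> int:
    """Approximate syllables as the number of vowel groups (at least one).

    This is a crude heuristic chosen for illustration only.
    """
    groups = re.findall(r"[aeiouy]+", word.lower())
    return max(1, len(groups))


def readability_features(text: str) -> dict:
    """Return simple readability features for one review (hypothetical helper)."""
    # Split on sentence-ending punctuation and drop empty fragments.
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    if not sentences or not words:
        return {
            "n_words": 0,
            "words_per_sentence": 0.0,
            "syllables_per_word": 0.0,
            "flesch_reading_ease": 0.0,
        }
    syllables = sum(count_syllables(w) for w in words)
    words_per_sentence = len(words) / len(sentences)
    syllables_per_word = syllables / len(words)
    # Standard Flesch Reading Ease formula.
    flesch = 206.835 - 1.015 * words_per_sentence - 84.6 * syllables_per_word
    return {
        "n_words": len(words),
        "words_per_sentence": words_per_sentence,
        "syllables_per_word": syllables_per_word,
        "flesch_reading_ease": flesch,
    }


if __name__ == "__main__":
    print(readability_features("The food was great. We loved it!"))
```

Features of this kind could be fed to a small interpretable classifier, such as logistic regression, in place of a large multimodal model. Which features and classifier the authors actually use is not stated in this excerpt.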
Taxonomy
Topics: Misinformation and Its Impacts · Sentiment Analysis and Opinion Mining · Multimodal Machine Learning Applications
Methods: FLAVA
