ViMRHP: A Vietnamese Benchmark Dataset for Multimodal Review Helpfulness Prediction via Human-AI Collaborative Annotation

Truc Mai-Thanh Nguyen; Dat Minh Nguyen; Son T. Luu; Kiet Van Nguyen

arXiv:2505.07416·cs.CL·July 8, 2025

TL;DR

This paper introduces ViMRHP, a large-scale Vietnamese multimodal review helpfulness dataset, created with AI-assisted annotation to reduce costs and time while maintaining quality, and evaluates baseline models on this dataset.

Contribution

The paper presents the first Vietnamese multimodal review helpfulness dataset with AI-assisted annotation, improving efficiency and cost-effectiveness in dataset creation.

Findings

1. AI assistance reduces annotation time by over 70%.
2. AI-generated annotations have limitations in complex tasks.
3. Baseline models perform differently on human-verified versus AI-generated annotations.

Abstract

Multimodal Review Helpfulness Prediction (MRHP) is an essential task in recommender systems, particularly on e-commerce platforms. Determining the helpfulness of user-generated reviews enhances user experience and improves consumer decision-making. However, existing datasets focus predominantly on English and Indonesian, resulting in a lack of linguistic diversity, especially for low-resource languages such as Vietnamese. In this paper, we introduce ViMRHP (Vietnamese Multimodal Review Helpfulness Prediction), a large-scale benchmark dataset for the MRHP task in Vietnamese. The dataset covers four domains and includes 2K products with 46K reviews. Building a dataset at this scale, however, requires considerable time and cost. To optimize the annotation process, we leverage AI to assist annotators in constructing the ViMRHP dataset. With AI assistance, annotation time is reduced (90 to 120 seconds per…

Code & Models

Repositories

trng28/vimrhp
PyTorch · Official

Datasets

trucmtnguyen/ViMRHP
dataset · 974 downloads
