A Big Data-empowered System for Real-time Detection of Regional Discriminatory Comments on Vietnamese Social Media
An Nghiep Huynh, Thanh Dat Do, Trong Hop Do

TL;DR
This paper presents a real-time system that uses machine learning and transfer learning to detect regionally discriminatory comments on Vietnamese social media, supported by a new dataset and a scalable streaming architecture.
Contribution
It introduces the ViRDC dataset and a scalable, real-time system, built on Apache Spark, for detecting regionally discriminatory comments on Vietnamese social media.
Findings
Developed the ViRDC dataset for detecting regional discrimination on Vietnamese social media.
Built a real-time detection system with streaming capabilities.
Demonstrated the system's scalability and responsiveness when processing social media data.
Abstract
Regional discrimination is a persistent social issue in Vietnam. While existing research has explored hate speech in the Vietnamese language, the specific issue of regional discrimination remains under-addressed. Previous studies primarily focused on model development without considering practical system implementation. In this work, we propose a task called Detection of Regional Discriminatory Comments on Vietnamese Social Media, leveraging the power of machine learning and transfer learning models. We have built the ViRDC (Vietnamese Regional Discrimination Comments) dataset, which contains comments from social media platforms, providing a valuable resource for further research and development. Our approach integrates streaming capabilities to process real-time data from social media networks, ensuring the system's scalability and responsiveness. We developed the system on the Apache…
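The abstract describes streaming social media comments through a trained classifier in real time. As a rough, framework-free illustration of the micro-batch pattern that Spark's streaming engine is built around, the sketch below groups an incoming comment stream into small batches, classifies each batch, and emits only the flagged comments. It is not the authors' code and not the Spark API: `micro_batches`, `detect_stream`, and `toy_classify` are hypothetical names, the binary label scheme (0 = not discriminatory, nonzero = flagged) is an assumption, and `toy_classify` is a placeholder standing in for a trained model such as a fine-tuned transformer.

```python
from itertools import islice
from typing import Callable, Iterable, Iterator, List, Tuple


def micro_batches(stream: Iterable[str], batch_size: int) -> Iterator[List[str]]:
    """Group a (possibly unbounded) stream of comments into fixed-size micro-batches."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    it = iter(stream)
    while True:
        batch = list(islice(it, batch_size))
        if not batch:  # stream exhausted
            return
        yield batch


def detect_stream(
    stream: Iterable[str],
    classify: Callable[[List[str]], List[int]],
    batch_size: int = 32,
) -> Iterator[Tuple[str, int]]:
    """Classify each micro-batch and yield (comment, label) for flagged comments.

    Assumption: `classify` returns one integer label per comment, where 0 means
    "not discriminatory" and any other value means the comment is flagged.
    """
    for batch in micro_batches(stream, batch_size):
        labels = classify(batch)  # one call per batch, like a batched model forward pass
        for comment, label in zip(batch, labels):
            if label != 0:
                yield comment, label


def toy_classify(batch: List[str]) -> List[int]:
    # Hypothetical stand-in for a trained model: flags comments containing a placeholder marker.
    return [1 if "<regional-slur>" in comment else 0 for comment in batch]


comments = ["great video", "people from <region> are <regional-slur>", "see you there"]
flagged = list(detect_stream(comments, toy_classify, batch_size=2))
print(flagged)
```

Batching trades a little latency for throughput: the model is invoked once per micro-batch rather than once per comment, which is the same trade-off a streaming engine makes when it picks a trigger interval.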
Taxonomy
Topics: Hate Speech and Cyberbullying Detection · Sentiment Analysis and Opinion Mining
