Attention based Bidirectional GRU hybrid model for inappropriate content detection in Urdu language

Ezzah Shoukat; Rabia Irfan; Iqra Basharat; Muhammad Ali Tahir; Sameen Shaukat

arXiv:2501.09722·cs.CL·January 17, 2025

Open Access

TL;DR

This paper introduces an attention-based Bidirectional GRU hybrid model for detecting inappropriate content in the Urdu language. It reports improved accuracy over baseline models and analyzes the effects of the attention layer and of pre-trained word embeddings.

Contribution

The study proposes a novel attention-based Bidirectional GRU model tailored for Urdu, highlighting the impact of attention layers and pre-trained embeddings on detection performance.

Findings

01. The proposed BiGRU-A model achieved 84% accuracy without pre-trained embeddings.

02. Attention layers enhance model efficiency in Urdu inappropriate content detection.

03. Pre-trained Urdu Word2Vec embeddings did not improve model performance in this context.

Abstract

With the increased use of the internet and social networks for online discussions, the spread of toxic and inappropriate content on social networking sites has also increased. Several studies have addressed this problem in different languages. However, less work has been done on identifying inappropriate content in South Asian languages using deep learning techniques. In Urdu, spellings are not standardized: people commonly write the same word with different spellings, and mixing in other languages such as English makes the text even more challenging to process. Limited research is available on processing such language with the best available algorithms. Adding an attention layer to a deep learning model can help it handle long-term dependencies and increase its efficiency. To explore the effects of the attention layer, this study proposes an attention-based Bidirectional GRU hybrid model…
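As a rough illustration of what an attention layer on top of a bidirectional GRU does, the sketch below pools a sequence of per-token hidden states into a single context vector using tanh scoring followed by a softmax. It is not the authors' implementation: the exact attention formulation is not given in this excerpt, and the function names (`softmax`, `attention_pool`), the parameters `W`, `b`, `v`, and the toy hidden states are all assumptions made for illustration. In a real model the hidden states would be the concatenated forward and backward GRU outputs and the parameters would be learned.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_pool(hidden_states, W, b, v):
    """Additive (tanh) attention pooling, a common formulation.

    hidden_states: list of T vectors of length d (assumed here to stand in
                   for concatenated forward/backward GRU states).
    W: d_a x d weight matrix, b: bias of length d_a, v: scoring vector
       of length d_a. All three are hypothetical, hand-set parameters.
    Returns (weights, context): one weight per time step and the
    weighted sum of the hidden states.
    """
    scores = []
    for h in hidden_states:
        # u = tanh(W h + b), then score = v . u
        u = [
            math.tanh(sum(w_ij * h_j for w_ij, h_j in zip(row, h)) + b_i)
            for row, b_i in zip(W, b)
        ]
        scores.append(sum(v_i * u_i for v_i, u_i in zip(v, u)))

    weights = softmax(scores)
    dim = len(hidden_states[0])
    context = [
        sum(a * h[j] for a, h in zip(weights, hidden_states))
        for j in range(dim)
    ]
    return weights, context


# Toy example: three time steps, two-dimensional hidden states.
states = [[0.1, 0.0], [0.9, 0.8], [0.2, 0.1]]
W = [[1.0, 0.0], [0.0, 1.0]]
b = [0.0, 0.0]
v = [1.0, 1.0]

weights, context = attention_pool(states, W, b, v)
print("attention weights:", weights)
print("context vector:", context)
```

The context vector would then typically be passed to a classification layer. Because every time step contributes directly to it, tokens far from the end of the sequence can still influence the prediction, which is the long-term-dependency benefit the abstract refers to.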

Taxonomy

Topics: Text and Document Classification Technologies · Spam and Phishing Detection · Imbalanced Data Classification Techniques

Methods: Softmax · Attention Is All You Need · Tanh Activation · Sigmoid Activation · Long Short-Term Memory · Gated Recurrent Unit