Detection of Slang Words in e-Data using semi-Supervised Learning

Alok Ranjan Pal; Diganta Saha

arXiv:1702.04241·cs.CL·February 15, 2017

TL;DR

This paper presents a semi-supervised learning approach to detect slang and abbreviated forms of words in electronic data, leveraging synset and concept analysis to evaluate the likelihood of words being slang.

Contribution

It introduces a novel semi-supervised method for detecting slang and abbreviations, including sound-alike and taboo forms, in electronic communication data.

Findings

01. Effective detection of slang and abbreviations in real-world data.

02. Utilizes synset and concept analysis to improve accuracy.

03. Addresses the challenge of incomplete slang forms in communication.

Abstract

The proposed algorithmic approach deals with finding the sense of a word in electronic data. Nowadays, in communication media such as the internet and mobile services, people use a few words that are slang in nature. This approach detects those abusive words using a supervised learning procedure. In real-life scenarios, however, slang words are not always used in their complete forms. Most of the time, they appear in abbreviated forms such as sound-alike forms and taboo morphemes. The proposed approach can also detect those abbreviated forms, using a semi-supervised learning procedure. Using synset and concept analysis of the text, the probability that a suspicious word is a slang word is also evaluated.
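
The abstract does not spell out the algorithm, so the following is only a minimal sketch of the general idea, not the authors' method. It assumes a small seed lexicon of known words, a hand-picked table of character substitutions for disguised spellings, and plain string similarity as a stand-in for the paper's probability estimate. The names `SEED_LEXICON`, `SUBSTITUTIONS`, `normalize`, `slang_score`, and `detect`, and the 0.8 threshold, are all hypothetical. Synset and concept analysis are not covered here.

```python
import re
from difflib import SequenceMatcher

# Hypothetical seed lexicon (mild placeholder words, not taken from the paper).
SEED_LEXICON = {"damn", "crap", "idiot"}

# Assumed character substitutions sometimes used to disguise a word.
SUBSTITUTIONS = str.maketrans({"@": "a", "$": "s", "0": "o", "1": "i", "3": "e"})


def normalize(token):
    """Lower-case, undo symbol substitutions, and collapse repeated letters."""
    token = token.lower().translate(SUBSTITUTIONS)
    token = re.sub(r"[^a-z]", "", token)     # drop remaining non-letters
    return re.sub(r"(.)\1+", r"\1", token)   # e.g. "craaap" becomes "crap"


def slang_score(token, lexicon):
    """Best string similarity (0..1) between the token and any lexicon entry.

    This is a similarity score, used here as an assumed proxy for a
    probability; it is not the estimate described in the paper.
    """
    form = normalize(token)
    if not form:
        return 0.0
    return max(
        (SequenceMatcher(None, form, normalize(w)).ratio() for w in lexicon),
        default=0.0,
    )


def detect(tokens, lexicon, threshold=0.8):
    """Flag suspicious tokens and grow the lexicon with confident new forms."""
    learned = set(lexicon)
    flagged = []
    for tok in tokens:
        score = slang_score(tok, learned)
        if score >= threshold:
            flagged.append((tok, round(score, 2)))
            learned.add(normalize(tok))      # simple self-training step
    return flagged, learned


if __name__ == "__main__":
    hits, grown = detect(["d@mn", "craaap", "hello", "1d10t"], SEED_LEXICON)
    print(hits)
```

Adding confidently flagged forms back into the lexicon is what gives the sketch its semi-supervised flavour: labelled seed words bootstrap the detection of unlabelled variants. A real system would need a far larger lexicon and a guard against drift, since one false positive added to the lexicon can pull in further false positives.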
