# Detection of Slang Words in e-Data using semi-Supervised Learning

**Authors:** Alok Ranjan Pal, Diganta Saha

arXiv: 1702.04241 · 2017-02-15

## TL;DR

This paper presents an approach that detects slang words in electronic data with supervised learning and their abbreviated forms with semi-supervised learning, using synset and concept analysis to estimate the probability that a suspicious word is slang.

## Contribution

The paper introduces a semi-supervised method for detecting slang words and their abbreviated forms, such as sound-alike forms and taboo morphemes, in electronic communication data.

## Key findings

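- Detects slang words written in complete form using a supervised learning procedure.
- Uses synset and concept analysis of the text to estimate the probability that a suspicious word is slang.
- Handles slang that appears in abbreviated rather than complete word forms by using a semi-supervised learning procedure.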

## Abstract

The proposed algorithmic approach deals with finding the sense of a word in electronic data. Nowadays, in different communication media such as the internet and mobile services, people use some words that are slang in nature. This approach detects those abusive words using a supervised learning procedure. In real-life scenarios, however, slang words are not always used in their complete word forms. Most of the time, they are used in abbreviated forms such as sound-alike forms and taboo morphemes. The proposed approach can also detect those abbreviated forms using a semi-supervised learning procedure. Using synset and concept analysis of the text, the probability that a suspicious word is a slang word is also evaluated.
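
To make the idea of catching abbreviated and sound-alike forms concrete, here is a minimal Python sketch. It is not the authors' algorithm: the seed list `SEED_SLANG`, the substitution table `SUBSTITUTIONS`, and the functions `normalize` and `slang_score` are all hypothetical, and the score is a plain string similarity rather than the synset- and concept-based probability the abstract describes. It only illustrates how a disguised token might be mapped back toward a known word and given a rough suspicion score.

```python
import re
from difflib import SequenceMatcher

# Hypothetical seed lexicon of known slang words (mild placeholders, not from the paper).
SEED_SLANG = {"idiot", "stupid", "loser"}

# Hypothetical sound-alike / look-alike substitutions (an assumption for illustration).
SUBSTITUTIONS = str.maketrans(
    {"0": "o", "1": "i", "3": "e", "4": "a", "5": "s", "@": "a", "$": "s"}
)


def normalize(token: str) -> str:
    """Map a possibly disguised token to a canonical lowercase letter string."""
    token = token.lower().translate(SUBSTITUTIONS)
    token = re.sub(r"[^a-z]", "", token)      # drop masking characters such as * or -
    return re.sub(r"(.)\1+", r"\1", token)    # collapse runs of a repeated letter


# Seeds are normalized the same way so both sides are compared in one form.
NORMALIZED_SEEDS = {normalize(word) for word in SEED_SLANG}


def slang_score(token: str) -> float:
    """Return the best similarity (0.0 to 1.0) between the token and any seed word.

    This is a rough suspicion score, not a calibrated probability.
    """
    canonical = normalize(token)
    if not canonical:
        return 0.0
    return max(SequenceMatcher(None, canonical, seed).ratio()
               for seed in NORMALIZED_SEEDS)


if __name__ == "__main__":
    for word in ["1d10t", "st*pid", "stuuupid", "table"]:
        print(word, round(slang_score(word), 2))
```

A token whose score exceeds some threshold (for example 0.8, an arbitrary choice here) could be flagged as suspicious and then passed to a context-based check, which is where an analysis of synsets and concepts in the surrounding text would come in.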

---
Source: https://tomesphere.com/paper/1702.04241