One-step and Two-step Classification for Abusive Language Detection on Twitter

Ji Ho Park; Pascale Fung

arXiv:1706.01206·cs.CL·June 6, 2017

TL;DR

This paper compares one-step and two-step classification for detecting abusive language on Twitter. On a public English corpus of 20 thousand tweets, HybridCNN reaches 0.827 F-measure in the one-step setting and logistic regression reaches 0.824 F-measure in the two-step setting.

Contribution

It introduces and evaluates a two-step classification approach for abusive language detection and compares it with a one-step multi-class method.

Findings

01. HybridCNN achieves an F-measure of 0.827 in one-step classification.

02. Logistic regression achieves an F-measure of 0.824 in two-step classification.

03. Both approaches show promising, closely matched performance on a public English Twitter corpus of 20 thousand tweets.
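
For readers unfamiliar with the metric behind these numbers, the following is a minimal sketch of a per-class F-measure (the harmonic mean of precision and recall). The function name `f_measure` and the toy labels are hypothetical, and this page does not say how the paper averages scores across classes, so the sketch covers a single class only.

```python
from typing import Sequence


def f_measure(gold: Sequence[str], predicted: Sequence[str], positive: str) -> float:
    """F-measure (F1) for one class: harmonic mean of precision and recall."""
    pairs = list(zip(gold, predicted))
    tp = sum(1 for g, p in pairs if g == positive and p == positive)
    fp = sum(1 for g, p in pairs if g != positive and p == positive)
    fn = sum(1 for g, p in pairs if g == positive and p != positive)
    if tp == 0:
        # No true positives: precision or recall is zero (or undefined), so report 0.
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Toy labels, purely illustrative (not data from the paper).
gold = ["sexism", "sexism", "none", "none"]
pred = ["sexism", "none", "none", "none"]
score = f_measure(gold, pred, positive="sexism")
```

Here precision is 1.0 and recall is 0.5 for the `sexism` class, so `score` is 2/3.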

Abstract

Automatic abusive language detection is a difficult but important task for online social media. Our research explores a two-step approach that first classifies whether language is abusive and then classifies it into specific types, and compares it with a one-step approach that performs a single multi-class classification for detecting sexist and racist language. On a public English Twitter corpus of 20 thousand tweets covering sexism and racism, our approach shows a promising performance of 0.827 F-measure using HybridCNN in one step and 0.824 F-measure using logistic regression in two steps.
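
The difference between the two setups can be sketched roughly as follows. This is an illustration under stated assumptions, not the authors' implementation: the label names (`none`, `sexism`, `racism`, `abusive`), the helper names, and the classifier callables are all hypothetical stand-ins, and any trained model (for example logistic regression or a CNN) could fill the callable slots.

```python
from typing import Callable, List, Tuple

# Hypothetical label names; the abstract only names sexism and racism as types.
NONE, SEXISM, RACISM, ABUSIVE = "none", "sexism", "racism", "abusive"

Example = Tuple[str, str]  # (tweet text, label)


def split_for_two_step(data: List[Example]) -> Tuple[List[Example], List[Example]]:
    """Derive the two training sets a two-step pipeline would need.

    Step 1: every tweet, relabeled as abusive vs. none.
    Step 2: only the abusive tweets, keeping their specific type.
    """
    step1 = [(text, NONE if label == NONE else ABUSIVE) for text, label in data]
    step2 = [(text, label) for text, label in data if label != NONE]
    return step1, step2


def one_step_predict(text: str, multiclass: Callable[[str], str]) -> str:
    """One-step: a single multi-class classifier picks the final label."""
    return multiclass(text)


def two_step_predict(
    text: str,
    detect_abuse: Callable[[str], str],
    classify_type: Callable[[str], str],
) -> str:
    """Two-step: detect abuse first, then classify the type of abusive tweets."""
    if detect_abuse(text) == NONE:
        return NONE
    return classify_type(text)


# Placeholder data, purely illustrative.
toy = [("tweet a", NONE), ("tweet b", SEXISM), ("tweet c", RACISM)]
step1_data, step2_data = split_for_two_step(toy)
```

In the two-step setup, the second classifier only ever sees tweets that the first one flagged, so an error in the first step cannot be corrected by the second.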

Code & Models

Repositories

younggns/comparative-abusive-lang
tf

Taxonomy

Methods: Logistic Regression