Challenges for Toxic Comment Classification: An In-Depth Error Analysis
Betty van Aken, Julian Risch, Ralf Krestel, Alexander Löser

TL;DR
This paper compares various models for toxic comment classification on large datasets, introduces an ensemble approach that outperforms individual models, and analyzes errors to identify key challenges like context understanding and label inconsistencies.
Contribution
It presents a comprehensive comparison of deep learning and shallow models, proposes an effective ensemble, and provides an in-depth error analysis highlighting future research directions.
Findings
Ensemble outperforms individual models
Identifies missing context as a key challenge
Highlights issues with dataset label inconsistencies
Abstract
Toxic comment classification has become an active research field with many recently proposed approaches. However, while these approaches address some of the task's challenges, others still remain unsolved and directions for further research are needed. To this end, we compare different deep learning and shallow approaches on a new, large comment dataset and propose an ensemble that outperforms all individual models. Further, we validate our findings on a second dataset. The results of the ensemble enable us to perform an extensive error analysis, which reveals open challenges for state-of-the-art methods and directions towards pending future research. These challenges include missing paradigmatic context and inconsistent dataset labels.
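To illustrate the general idea of combining classifiers, here is a minimal sketch that averages per-label probabilities from several models. It is not the ensemble proposed in the paper, which this summary does not specify; the function name `average_ensemble`, the label names, the scores, and the 0.5 threshold are all hypothetical.

```python
from statistics import fmean


def average_ensemble(model_probs):
    """Average per-label probabilities across models for one comment.

    model_probs: list of dicts mapping label -> probability, one dict per model.
    Assumes every model scores the same set of labels.
    """
    labels = model_probs[0].keys()
    return {label: fmean(p[label] for p in model_probs) for label in labels}


# Hypothetical scores from three classifiers for a single comment.
predictions = [
    {"toxic": 0.9, "insult": 0.2},
    {"toxic": 0.6, "insult": 0.4},
    {"toxic": 0.3, "insult": 0.6},
]

ensemble = average_ensemble(predictions)

# Flag every label whose averaged probability reaches the (assumed) threshold.
flagged = sorted(label for label, p in ensemble.items() if p >= 0.5)
```

Plain averaging is the simplest combination rule. A learned combiner that weights models differently per comment is another option, at the cost of needing held-out data to train it.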
