Data Science Kitchen at GermEval 2021: A Fine Selection of Hand-Picked Features, Delivered Fresh from the Oven

Niclas Hildebrandt; Benedikt Boenninghoff; Dennis Orth; Christopher Schymura

arXiv:2109.02383·cs.CL·August 20, 2024

TL;DR

This paper describes a feature-engineering approach using semantic and style embeddings combined with numerical features, applied to classify toxic, engaging, and fact-claiming comments in a shared task, achieving competitive F1-scores.

Contribution

It introduces a novel combination of semantic, style, and numerical features with ensemble classifiers for comment classification tasks.

Findings

01

Achieved macro-averaged F1-scores of 66.8%, 69.9%, and 72.5% on the toxic, engaging, and fact-claiming subtasks, respectively.

02

Demonstrated effectiveness of combining deep neural embeddings with handcrafted features.

03

Showcased the utility of ensemble voting in multi-label comment classification.
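
The macro-averaged F1-scores reported in the findings above can be illustrated with a minimal sketch. It uses the common definition (the unweighted mean of per-class F1-scores); this is an assumption, since the listing does not say how the shared task's scorer computes the metric, and the official scorer may differ. The function names and the toy labels are hypothetical.

```python
def f1_for_class(y_true, y_pred, cls):
    """F1-score for one class, treating `cls` as the positive label."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == cls and p == cls)
    fp = sum(1 for t, p in pairs if t != cls and p == cls)
    fn = sum(1 for t, p in pairs if t == cls and p != cls)
    denom = 2 * tp + fp + fn
    # Convention chosen here: F1 is 0.0 when the class never occurs.
    return 2 * tp / denom if denom else 0.0


def macro_f1(y_true, y_pred):
    """Unweighted mean of the per-class F1-scores."""
    classes = sorted(set(y_true) | set(y_pred))
    return sum(f1_for_class(y_true, y_pred, c) for c in classes) / len(classes)


# Toy binary example (1 = comment has the property, 0 = it does not).
y_true = [1, 1, 0, 0]
y_pred = [1, 0, 0, 0]
score = macro_f1(y_true, y_pred)
```

Because every class contributes equally to the mean, a classifier cannot reach a high macro-averaged score by predicting only the majority class.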

Abstract

This paper presents the contribution of the Data Science Kitchen to the GermEval 2021 shared task on the identification of toxic, engaging, and fact-claiming comments. The task extends the identification of offensive language by including additional subtasks that identify comments which should be prioritized for fact-checking by moderators and community managers. Our contribution focuses on a feature-engineering approach with a conventional classification backend. We combine semantic and writing-style embeddings derived from pre-trained deep neural networks with additional numerical features specifically designed for this task. Classifier ensembles are used to derive predictions for each subtask via a majority voting scheme. Our best submission achieved macro-averaged F1-scores of 66.8%, 69.9%, and 72.5% for the identification of toxic, engaging, and fact-claiming comments, respectively.
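
The pipeline outlined in the abstract (concatenating several feature groups, then letting an ensemble vote) can be sketched roughly as follows. This is a toy illustration under stated assumptions, not the authors' implementation: the feature values, the threshold "classifiers", the tie-breaking rule, and all function names are hypothetical stand-ins.

```python
from collections import Counter


def combine_features(semantic, style, numerical):
    """Concatenate the three feature groups into one flat feature vector."""
    return list(semantic) + list(style) + list(numerical)


def majority_vote(predictions):
    """Return the most frequent label; ties go to the smallest label (assumption)."""
    counts = Counter(predictions)
    top = max(counts.values())
    return min(label for label, count in counts.items() if count == top)


def ensemble_predict(classifiers, features):
    """Ask every ensemble member for a label and take the majority."""
    return majority_vote([clf(features) for clf in classifiers])


# Hypothetical inputs: two semantic dimensions, one style dimension,
# and two handcrafted numerical features.
features = combine_features([0.2, -0.1], [0.7], [3.0, 0.0])

# Stand-in classifiers: simple threshold rules on single feature dimensions.
classifiers = [
    lambda x: int(x[0] > 0.0),  # votes 1
    lambda x: int(x[2] > 0.5),  # votes 1
    lambda x: int(x[4] > 0.0),  # votes 0
]

label = ensemble_predict(classifiers, features)  # two of three members vote 1
```

In a real system each ensemble member would be a trained model, and one such ensemble would be built per subtask.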

Code & Models

Repositories

data-science-kitchen/germ-eval-2021
tf · Official

Taxonomy

Topics: Hate Speech and Cyberbullying Detection · Adversarial Robustness in Machine Learning · Software Engineering Research

Methods: Logistic Regression