Reliable Decision from Multiple Subtasks through Threshold Optimization: Content Moderation in the Wild

Donghyun Son; Byounggyu Lew; Kwanghee Choi; Yongsu Baek; Seungwoo Choi; Beomjun Shin; Sungjoo Ha; Buru Chang

arXiv:2208.07522·cs.LG·January 27, 2023·1 cites

TL;DR

This paper introduces a threshold optimization method to improve the reliability of automated content moderation decisions based on multiple subtask prediction scores, reducing costs and adapting to policy changes.

Contribution

It proposes a novel threshold optimization approach for combining subtask scores to make reliable moderation decisions, addressing inefficiencies in current policy-specific models.
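
To make the idea of combining subtask scores with thresholds concrete, here is a minimal sketch. It is not the paper's algorithm: the OR-style decision rule, the exhaustive grid search, the precision constraint, and all names (`decide`, `grid_search_thresholds`, the toy data) are assumptions chosen for illustration only.

```python
from itertools import product

def decide(scores, thresholds):
    """Flag an item if ANY subtask score reaches its threshold (assumed rule)."""
    return any(s >= t for s, t in zip(scores, thresholds))

def precision_recall(samples, labels, thresholds):
    """Precision and recall of the thresholded decision on labeled samples."""
    preds = [decide(s, thresholds) for s in samples]
    tp = sum(p and y for p, y in zip(preds, labels))
    fp = sum(p and not y for p, y in zip(preds, labels))
    fn = sum((not p) and y for p, y in zip(preds, labels))
    precision = tp / (tp + fp) if tp + fp else 1.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

def grid_search_thresholds(samples, labels, grid, min_precision):
    """Brute-force search: maximize recall subject to precision >= min_precision."""
    n_subtasks = len(samples[0])
    best, best_recall = None, -1.0
    for thresholds in product(grid, repeat=n_subtasks):
        precision, recall = precision_recall(samples, labels, thresholds)
        if precision >= min_precision and recall > best_recall:
            best, best_recall = thresholds, recall
    return best, best_recall

# Hypothetical toy data: two subtask scores per item, True = should be moderated.
samples = [(0.9, 0.1), (0.2, 0.8), (0.6, 0.1), (0.1, 0.2), (0.3, 0.4), (0.7, 0.2)]
labels = [True, True, False, False, False, True]
grid = [0.3, 0.5, 0.7, 0.9]

best, best_recall = grid_search_thresholds(samples, labels, grid, min_precision=1.0)
print(best, best_recall)
```

Under this sketch, a policy change would mean re-running the search against labels for the new policy while the subtask models stay untouched. The grid search grows exponentially with the number of subtasks, so it only suits small illustrations.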

Findings

01. Outperforms existing threshold optimization methods in moderation accuracy

02. Reduces costs associated with dataset re-labeling and model retraining

03. Effective across various content moderation scenarios

Abstract

Social media platforms struggle to protect users from harmful content through content moderation. These platforms have recently leveraged machine learning models to cope with the vast amount of user-generated content daily. Since moderation policies vary depending on countries and types of products, it is common to train and deploy the models per policy. However, this approach is highly inefficient, especially when the policies change, requiring dataset re-labeling and model re-training on the shifted data distribution. To alleviate this cost inefficiency, social media platforms often employ third-party content moderation services that provide prediction scores of multiple subtasks, such as predicting the existence of underage personnel, rude gestures, or weapons, instead of directly providing final moderation decisions. However, making a reliable automated moderation decision from the…

Code & Models

Repositories

hyperconnect/trusthresh (PyTorch, official)

Taxonomy

Topics: Hate Speech and Cyberbullying Detection