Reliable Decision from Multiple Subtasks through Threshold Optimization: Content Moderation in the Wild
Donghyun Son, Byounggyu Lew, Kwanghee Choi, Yongsu Baek, Seungwoo Choi, Beomjun Shin, Sungjoo Ha, Buru Chang

TL;DR
This paper introduces a threshold optimization method to improve the reliability of automated content moderation decisions based on multiple subtask prediction scores, reducing costs and adapting to policy changes.
Contribution
It proposes a novel threshold optimization approach for combining subtask scores to make reliable moderation decisions, addressing inefficiencies in current policy-specific models.
Findings
Outperforms existing threshold optimization methods in moderation accuracy
Reduces costs associated with dataset re-labeling and model retraining
Effective across various content moderation scenarios
Abstract
Social media platforms struggle to protect users from harmful content through content moderation. These platforms have recently leveraged machine learning models to cope with the vast amount of user-generated content daily. Since moderation policies vary depending on countries and types of products, it is common to train and deploy the models per policy. However, this approach is highly inefficient, especially when the policies change, requiring dataset re-labeling and model re-training on the shifted data distribution. To alleviate this cost inefficiency, social media platforms often employ third-party content moderation services that provide prediction scores of multiple subtasks, such as predicting the existence of underage personnel, rude gestures, or weapons, instead of directly providing final moderation decisions. However, making a reliable automated moderation decision from the…
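The abstract centers on turning several subtask scores into one moderation decision. What follows is a naive illustration of that setting, not the authors' threshold optimization method. It assumes a hypothetical policy that flags content when any subtask score reaches its threshold, and it picks per-subtask thresholds by exhaustive grid search to maximize accuracy on a small labeled validation set. The function names, subtask names, scores, and labels are all hypothetical.

```python
from itertools import product
from typing import Sequence

Scores = Sequence[float]  # one prediction score per subtask


def moderate(scores: Scores, thresholds: Scores) -> bool:
    """Hypothetical policy: flag if ANY subtask score meets or exceeds its threshold."""
    return any(s >= t for s, t in zip(scores, thresholds))


def accuracy(samples: Sequence[Scores], labels: Sequence[bool], thresholds: Scores) -> float:
    """Fraction of validation samples where the thresholded decision matches the label."""
    correct = sum(moderate(s, thresholds) == y for s, y in zip(samples, labels))
    return correct / len(labels)


def grid_search_thresholds(samples, labels, grid):
    """Try every combination of grid values (one per subtask); keep the most accurate."""
    n_subtasks = len(samples[0])
    best, best_acc = None, -1.0
    for candidate in product(grid, repeat=n_subtasks):
        acc = accuracy(samples, labels, candidate)
        if acc > best_acc:  # strict '>' keeps the first best combination found
            best, best_acc = candidate, acc
    return best, best_acc


# Hypothetical validation data: scores for (underage, rude_gesture, weapon).
samples = [
    (0.9, 0.1, 0.0),   # should be flagged
    (0.2, 0.8, 0.1),   # should be flagged
    (0.1, 0.2, 0.3),   # should pass
    (0.4, 0.3, 0.2),   # should pass
    (0.1, 0.1, 0.95),  # should be flagged
]
labels = [True, True, False, False, True]

best, acc = grid_search_thresholds(samples, labels, grid=[0.25, 0.5, 0.75])
print(best, acc)  # (0.5, 0.5, 0.5) 1.0
```

Exhaustive search evaluates |grid|^k combinations for k subtasks, so it only stays tractable for a handful of subtasks and coarse grids; that cost is one reason more efficient threshold optimization methods are of interest when a policy changes and thresholds must be re-fitted.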
Taxonomy
Topics: Hate Speech and Cyberbullying Detection
