Bandits for Online Calibration: An Application to Content Moderation on Social Media Platforms

Vashist Avadhanula; Omar Abdul Baki; Hamsa Bastani; Osbert Bastani; Caner Gocmen; Daniel Haimovich; Darren Hwang; Dima Karamshuk; Thomas Leeper; Jiayuan Ma; Gregory Macnamara; Jake Mullett; Christopher Palow; Sung Park; Varun S Rajagopal; Kevin Schaeffer; Parikshit Shah; Deeksha Sinha; Nicolas Stier-Moses; Peng Xu

arXiv:2211.06516·cs.LG·November 15, 2022

Open Access

TL;DR

This paper presents a bandit-based system for calibrating risk models in Meta's content moderation, dynamically adapting to changing violation trends to improve moderation effectiveness.

Contribution

It introduces a contextual bandit approach to calibrate multiple risk models for content moderation, addressing temporal changes and model reliability in a production environment.

Findings

01 Increased Meta's top-line content moderation effectiveness metric by 13%

02 Handles changing and newly added risk models

03 Demonstrates real-world applicability on Meta's platforms

Abstract

We describe the current content moderation strategy employed by Meta to remove policy-violating content from its platforms. Meta relies on both handcrafted and learned risk models to flag potentially violating content for human review. Our approach aggregates these risk models into a single ranking score, calibrating them to prioritize more reliable risk models. A key challenge is that violation trends change over time, affecting which risk models are most reliable. Our system additionally handles production challenges such as changing risk models and novel risk models. We use a contextual bandit to update the calibration in response to such trends. Our approach increases Meta's top-line metric for measuring the effectiveness of its content moderation strategy by 13%.
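
The abstract does not give the algorithm's details, so the following is only a minimal illustrative sketch of the general idea: learn online how reliable each risk model is from human review outcomes, then use those estimates to turn model flags into a single ranking score. It uses a simplified, non-contextual Beta-Bernoulli Thompson sampling scheme as a stand-in, not the contextual bandit from the paper. The class names (`BetaArm`, `ThompsonCalibrator`), the `discount` parameter, and the max-over-models aggregation rule are all assumptions made for illustration.

```python
import random
from dataclasses import dataclass


@dataclass
class BetaArm:
    """Beta posterior over one risk model's precision (hypothetical structure)."""
    alpha: float = 1.0
    beta: float = 1.0

    def sample(self, rng: random.Random) -> float:
        return rng.betavariate(self.alpha, self.beta)

    def update(self, violating: bool, discount: float) -> None:
        # Shrink old evidence toward the uniform prior so recent labels dominate.
        # This is an assumed way to track drifting violation trends.
        self.alpha = 1.0 + discount * (self.alpha - 1.0) + (1.0 if violating else 0.0)
        self.beta = 1.0 + discount * (self.beta - 1.0) + (0.0 if violating else 1.0)


class ThompsonCalibrator:
    """Illustrative calibrator: one Beta arm per risk model."""

    def __init__(self, discount: float = 0.99, seed: int = 0):
        self.arms: dict[str, BetaArm] = {}
        self.discount = discount
        self.rng = random.Random(seed)

    def _arm(self, model_id: str) -> BetaArm:
        # Unseen (novel) risk models start from the uniform prior.
        return self.arms.setdefault(model_id, BetaArm())

    def score(self, flagged_by: list[str]) -> float:
        """Ranking score: highest sampled precision among models that flagged the item."""
        if not flagged_by:
            return 0.0
        return max(self._arm(m).sample(self.rng) for m in flagged_by)

    def update(self, flagged_by: list[str], violating: bool) -> None:
        """Feed back a human review decision for an item these models flagged."""
        for m in flagged_by:
            self._arm(m).update(violating, self.discount)

    def mean_precision(self, model_id: str) -> float:
        arm = self._arm(model_id)
        return arm.alpha / (arm.alpha + arm.beta)
```

In use, each item in the review queue would be scored with `score(...)`, the highest-scoring items sent to reviewers, and each reviewer decision passed back through `update(...)`. Sampling from the posterior, rather than using its mean, gives newly added models a chance to be explored before much evidence about them exists.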

Taxonomy

Topics: Hate Speech and Cyberbullying Detection · Topic Modeling · Misinformation and Its Impacts