Identifying Hate Speech Peddlers in Online Platforms. A Bayesian Social Learning Approach for Large Language Model Driven Decision-Makers
Adit Jain, Vikram Krishnamurthy

TL;DR
This paper presents a Bayesian social learning framework using large language models to detect hate speech peddlers online, analyzing the trade-offs between privacy, herding, and misclassification in sequential decision-making.
Contribution
It introduces a novel Bayesian social learning approach that uses LLMs to handle high-dimensional textual data, including a stopping-time formulation that optimally balances privacy and herding.
Findings
Agents herd in finite time, disregarding private observations.
A strong prior leads to misclassification of hate speech peddlers.
Threshold policies can delay herding, improving detection accuracy.
Abstract
This paper studies the problem of autonomous agents performing Bayesian social learning for sequential detection when the observations of the state belong to a high-dimensional space and are expensive to analyze. Specifically, when the observations are textual, the Bayesian agent can use a large language model (LLM) as a map to get a low-dimensional private observation. The agent performs Bayesian learning and takes an action that minimizes the expected cost and is visible to subsequent agents. We prove that a sequence of such Bayesian agents herd in finite time to the public belief and take the same action disregarding the private observations. We propose a stopping time formulation for quickest time herding in social learning and optimally balance privacy and herding. Structural results are shown on the threshold nature of the optimal policy to the stopping time problem. We illustrate…
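The sequential procedure described in the abstract can be sketched with the textbook two-state social learning filter. This is a minimal illustration under stated assumptions, not the authors' implementation: the state is binary, the LLM is assumed to have already mapped each text to a discrete observation index `y`, and the likelihood matrix `B`, cost matrix `C`, and all function names are hypothetical placeholders.

```python
import numpy as np

def private_belief(pi, B, y):
    """Bayes update of the public belief pi with private observation y.
    B[x, y] is the (assumed) probability of observing y in state x."""
    post = B[:, y] * pi
    return post / post.sum()

def best_action(belief, C):
    """Action minimizing expected cost; C[x, a] is the cost of action a in state x."""
    return int(np.argmin(belief @ C))

def social_learning_step(pi, B, C, y):
    """One agent acts on its private belief; later agents see only the action.
    Returns (action, updated public belief, herding flag at the current belief)."""
    a = best_action(private_belief(pi, B, y), C)
    # Action that each possible observation would have produced at this public belief.
    acts = np.array([best_action(private_belief(pi, B, k), C)
                     for k in range(B.shape[1])])
    # Action likelihood P(a | x): total probability of the observations that lead to a.
    R = B @ (acts == a).astype(float)
    new_pi = R * pi
    new_pi = new_pi / new_pi.sum()
    # Herding: every observation yields the same action, so the action is uninformative.
    herding = bool(np.all(acts == acts[0]))
    return a, new_pi, herding

# Hypothetical numbers for illustration only.
B = np.array([[0.7, 0.3],
              [0.3, 0.7]])   # observation likelihoods
C = np.array([[0.0, 1.0],
              [1.0, 0.0]])   # 0-1 misclassification cost
pi0 = np.array([0.6, 0.4])   # initial public belief

a1, pi1, herd1 = social_learning_step(pi0, B, C, y=0)
a2, pi2, herd2 = social_learning_step(pi1, B, C, y=1)
```

In this toy setting the second agent receives a contrary private observation (`y=1`) yet still takes the first agent's action, and the public belief stops changing. That is the herding behavior the abstract refers to. The stopping-time formulation for delaying herding is not sketched here.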
Taxonomy
Topics: Hate Speech and Cyberbullying Detection
