Advancing Harmful Content Detection in Organizational Research: Integrating Large Language Models with Elo Rating System
Mustafa Akben, Aaron Satko

TL;DR
This paper presents an Elo rating-based method that enhances large language models' ability to detect harmful content in organizational research, outperforming traditional techniques in accuracy and scalability.
Contribution
The paper introduces a novel method that integrates the Elo rating system with LLMs to improve harmful content detection in organizational datasets, addressing the limitations that built-in LLM moderation imposes on such analyses.
Findings
Outperforms traditional prompting techniques in accuracy and F1 scores
Reduces false positives in harmful content detection
Enhances scalability for large datasets
Abstract
Large language models (LLMs) offer promising opportunities for organizational research. However, their built-in moderation systems can create problems when researchers try to analyze harmful content, often refusing to follow certain instructions or producing overly cautious responses that undermine the validity of the results. This is particularly problematic when analyzing organizational conflicts such as microaggressions or hate speech. This paper introduces an Elo rating-based method that significantly improves LLM performance for harmful content analysis. In two datasets, one focused on microaggression detection and the other on hate speech, we find that our method outperforms traditional LLM prompting techniques and conventional machine learning models on key measures such as accuracy, precision, and F1 scores. Advantages include better reliability when analyzing harmful content, fewer…
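For readers unfamiliar with Elo ratings, the following is a minimal sketch of the standard Elo update rule applied to pairwise comparisons of texts. It is not the authors' implementation: the abstract does not describe how comparisons are generated or how ratings are turned into labels. The function names, the K-factor of 32, the initial rating of 1000, and the idea that some judge (for example, an LLM) says which of two texts is more harmful are all assumptions made for illustration.

```python
from typing import Dict, Iterable, Tuple


def expected_score(rating_a: float, rating_b: float) -> float:
    """Standard Elo expectation: probability that A 'wins' against B."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


def update_ratings(
    rating_a: float, rating_b: float, score_a: float, k: float = 32.0
) -> Tuple[float, float]:
    """Return updated (rating_a, rating_b) after one pairwise comparison.

    score_a is 1.0 if A was judged more harmful, 0.0 if B was,
    and 0.5 for a tie. The K-factor (assumed 32 here) sets the step size.
    """
    exp_a = expected_score(rating_a, rating_b)
    new_a = rating_a + k * (score_a - exp_a)
    new_b = rating_b + k * ((1.0 - score_a) - (1.0 - exp_a))
    return new_a, new_b


def rate_texts(
    text_ids: Iterable[str],
    comparisons: Iterable[Tuple[str, str, float]],
    initial: float = 1000.0,
    k: float = 32.0,
) -> Dict[str, float]:
    """Apply Elo updates over (id_a, id_b, score_a) comparison triples.

    How the comparisons are produced is outside this sketch; here they
    are simply given as input (hypothetical judge outcomes).
    """
    ratings = {text_id: initial for text_id in text_ids}
    for id_a, id_b, score_a in comparisons:
        ratings[id_a], ratings[id_b] = update_ratings(
            ratings[id_a], ratings[id_b], score_a, k
        )
    return ratings


if __name__ == "__main__":
    # Hypothetical outcomes: t1 judged more harmful than t2 and t3.
    outcomes = [("t1", "t2", 1.0), ("t1", "t3", 1.0), ("t2", "t3", 0.5)]
    for text_id, rating in rate_texts(["t1", "t2", "t3"], outcomes).items():
        print(text_id, round(rating, 1))
```

Under this scheme, texts that repeatedly "win" comparisons accumulate higher ratings, so a threshold on the final rating could in principle serve as a classifier; whether the paper uses such a threshold is not stated in the abstract.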
Taxonomy
Topics: Hate Speech and Cyberbullying Detection · Computational and Text Analysis Methods · Topic Modeling
