Sociocultural Considerations in Monitoring Anti-LGBTQ+ Content on Social Media
Sidney G.-J. Wong

TL;DR
This paper explores how sociocultural factors affect hate speech detection systems for anti-LGBTQ+ content on social media, highlighting biases in open-source data and suggesting combined qualitative and empirical approaches.
Contribution
It shows how the sociocultural alignment of training data affects hate speech detection accuracy, and it critiques keyword-based data collection for causing detection models to overfit on slurs.
Findings
Open-source data sets' sociocultural alignment affects detection outcomes.
Keyword search methods can cause models to overfit on slurs.
Combining empirical outputs with qualitative insights is recommended to ensure detection systems are fit for purpose.
Abstract
The purpose of this paper is to ascertain the influence of sociocultural factors (i.e., social, cultural, and political) on the development of hate speech detection systems. We set out to investigate the suitability of using open-source training data to monitor levels of anti-LGBTQ+ content on social media across different national varieties of English. Our findings suggest that the social and cultural alignment of open-source hate speech data sets influences the predicted outputs. Furthermore, the keyword-search approach, in which anti-LGBTQ+ slurs are used as search terms when building open-source training data, encourages detection models to overfit on slurs; as a result, anti-LGBTQ+ content that does not contain slurs may go undetected. We recommend combining empirical outputs with qualitative insights to ensure these systems are fit for purpose.
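The overfitting mechanism described in the abstract can be illustrated with a toy example. This is a minimal sketch, not the paper's method or data: the placeholder tokens `slur_a` and `slur_b`, the helper names (`keyword_collect`, `train_token_weights`, `predict`), the example posts, and the count-based token scorer are all hypothetical. They are chosen only to show how training data gathered by keyword search can lead a model to treat slur tokens as the main signal of hate.

```python
from collections import Counter

# Hypothetical placeholder tokens standing in for slurs; no real terms are used.
SEED_KEYWORDS = {"slur_a", "slur_b"}


def tokens(text):
    """Lower-cased, whitespace-split token set."""
    return set(text.lower().split())


def keyword_collect(posts, seeds=SEED_KEYWORDS):
    """Keyword search: keep only posts containing at least one seed term."""
    return [p for p in posts if tokens(p) & seeds]


def train_token_weights(labelled):
    """Toy scorer: weight = (# hateful posts with token) - (# other posts with token)."""
    hate, other = Counter(), Counter()
    for text, is_hate in labelled:
        (hate if is_hate else other).update(tokens(text))
    return {tok: hate[tok] - other[tok] for tok in hate.keys() | other.keys()}


def predict(weights, text, threshold=1):
    """Flag a post when its summed token weights reach the threshold."""
    return sum(weights.get(tok, 0) for tok in tokens(text)) >= threshold


# Raw pool: keyword search only surfaces posts that contain a seed term,
# so implicitly hateful posts never reach the annotated training data.
raw_pool = [
    "slur_a go away",
    "they are slur_b",
    "slur_a everywhere now",
    "people like that should disappear",  # hateful, but contains no seed term
]
# Assumption for the toy: annotators label every collected post as hateful.
hateful = keyword_collect(raw_pool)
benign = ["lovely weather today", "go team go", "they are great"]
labelled = [(p, True) for p in hateful] + [(p, False) for p in benign]

weights = train_token_weights(labelled)
print(predict(weights, "slur_b again"))                       # True
print(predict(weights, "people like that should disappear"))  # False: missed
```

Because every hateful training example was retrieved through a seed term, positive weight only accrues to tokens seen in slur-containing posts; the implicitly hateful post shares none of them, scores zero, and goes undetected.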
