Independent Ethical Assessment of Text Classification Models: A Hate Speech Detection Case Study
Amitoj Singh, Jingshu Chen, Lihao Zhang, Amin Rasekh, Ilana Golbin, Anand Rao

TL;DR
This paper develops a comprehensive independent ethical assessment process for hate speech detection models, integrating qualitative and quantitative measures to evaluate bias, performance, and interpretability.
Contribution
It bridges the gap between high-level ethical frameworks and quantitative metrics, enabling practical independent ethical assessments of text classification models.
Findings
The process effectively identifies biases in hate speech detection models.
It combines protected attribute mining with counterfactual analysis for bias evaluation.
The process is demonstrated on a deep hate speech detection model, showing its practical applicability.
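Counterfactual analysis for bias evaluation can be illustrated with a minimal sketch. The idea is to swap one identity term for another from the same group and measure how much the model's score moves. This is not the authors' implementation. `IDENTITY_TERMS`, `toy_score`, and the "largest absolute score change" gap are hypothetical choices made for illustration, and `toy_score` is a deliberately biased stand-in for a real hate speech classifier.

```python
import re
from typing import Callable, Dict, List

# Hypothetical identity-term group; a real assessment would use a curated lexicon.
IDENTITY_TERMS: Dict[str, List[str]] = {
    "religion": ["christian", "muslim", "jewish"],
}

def counterfactuals(text: str, terms: List[str]) -> List[str]:
    """For each term present in `text`, emit copies with it swapped for every other term."""
    variants: List[str] = []
    for src in terms:
        pattern = re.compile(rf"\b{re.escape(src)}\b", flags=re.IGNORECASE)
        if pattern.search(text):
            for dst in terms:
                if dst != src:
                    # A lambda avoids interpreting `dst` as a regex replacement template.
                    variants.append(pattern.sub(lambda _m, d=dst: d, text))
    return variants

def counterfactual_gap(text: str, score: Callable[[str], float], terms: List[str]) -> float:
    """Largest absolute score change across counterfactual variants (0.0 if none exist)."""
    base = score(text)
    return max((abs(score(v) - base) for v in counterfactuals(text, terms)), default=0.0)

def toy_score(text: str) -> float:
    # Deliberately biased stand-in for a model's P(hate | text).
    return 0.9 if "muslim" in text.lower().split() else 0.1

gap = counterfactual_gap("my neighbor is christian", toy_score, IDENTITY_TERMS["religion"])
print(round(gap, 2))
```

A gap near zero means the score is insensitive to which identity term appears. A large gap on otherwise benign text, as with the biased toy scorer here, flags a potential bias for closer review.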
Abstract
An independent ethical assessment of an artificial intelligence system is an impartial examination of the system's development, deployment, and use in alignment with ethical values. System-level qualitative frameworks that describe high-level requirements and component-level quantitative metrics that measure individual ethical dimensions have been developed over the past few years. However, there exists a gap between the two, which hinders the execution of independent ethical assessments in practice. This study bridges this gap and designs a holistic independent ethical assessment process for a text classification model with a special focus on the task of hate speech detection. The assessment is further augmented with protected attribute mining and counterfactual-based analysis to enhance bias assessment. It covers assessments of technical performance, data bias, embedding bias,…
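Protected attribute mining combined with a data bias check might look roughly like the sketch below. This is a simplified illustration under stated assumptions, not the paper's procedure. It tags each training example with the protected attribute values its text mentions, using a hypothetical keyword `LEXICON`, and compares the rate of positive (hateful) labels per attribute value. Tokenization is plain whitespace splitting with no punctuation handling.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple

# Hypothetical lexicon mapping protected-attribute values to trigger terms.
LEXICON: Dict[str, List[str]] = {
    "gender:female": ["woman", "women", "she"],
    "gender:male": ["man", "men", "he"],
}

def mine_attributes(text: str, lexicon: Dict[str, List[str]]) -> Set[str]:
    """Return the attribute values whose trigger terms appear as whitespace tokens in `text`."""
    tokens = set(text.lower().split())
    return {attr for attr, terms in lexicon.items() if tokens & set(terms)}

def positive_rate_by_attribute(
    data: List[Tuple[str, int]], lexicon: Dict[str, List[str]]
) -> Dict[str, float]:
    """Fraction of label-1 examples among those mentioning each attribute value."""
    counts: Dict[str, List[int]] = defaultdict(lambda: [0, 0])  # attr -> [positives, total]
    for text, label in data:
        for attr in mine_attributes(text, lexicon):
            counts[attr][0] += label
            counts[attr][1] += 1
    return {attr: pos / total for attr, (pos, total) in counts.items()}

toy_data = [("women are great", 0), ("she is awful", 1), ("men are fine", 0), ("he said hi", 0)]
print(positive_rate_by_attribute(toy_data, LEXICON))
```

A large difference in positive-label rates between attribute values suggests that the training data may teach the model to associate a group with the hateful class. This is one of the signals a data bias assessment would examine.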
Taxonomy
Topics: Hate Speech and Cyberbullying Detection · Adversarial Robustness in Machine Learning · Ethics and Social Impacts of AI
