The Art of Embedding Fusion: Optimizing Hate Speech Detection
Mohammad Aflah Khan, Neemesh Yadav, Mohit Jain, Sanyam Goyal

TL;DR
This paper investigates various methods for combining embeddings from multiple pre-trained language models to improve hate speech detection, analyzing their effectiveness and computational costs.
Contribution
It provides a comprehensive analysis of embedding fusion techniques for PLMs in hate speech detection, highlighting their marginal benefits and high computational costs.
Findings
Combining embeddings yields slight performance improvements.
The choice of combination method has minimal impact on results.
Embedding fusion incurs high computational costs.
Abstract
Hate speech detection is a challenging natural language processing task that requires capturing linguistic and contextual nuances. Pre-trained language models (PLMs) offer rich semantic representations of text that can improve this task. However, there is still limited knowledge about how to effectively combine representations across PLMs and leverage their complementary strengths. In this work, we shed light on various combination techniques for several PLMs and comprehensively analyze their effectiveness. Our findings show that combining embeddings leads to slight improvements, but at a high computational cost, and that the choice of combination technique has only a marginal effect on the final outcome. We also make our codebase public at https://github.com/aflah02/The-Art-of-Embedding-Fusion-Optimizing-Hate-Speech-Detection.
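To make the idea of "combining embeddings" concrete, here is a minimal sketch of two common fusion strategies: concatenation and element-wise averaging. The abstract does not say which techniques the paper evaluates, so both are assumptions chosen for illustration. The function names (`concat_fusion`, `mean_fusion`) and the toy vectors are hypothetical and are not taken from the authors' codebase.

```python
from typing import List, Sequence

Vector = List[float]


def concat_fusion(embeddings: Sequence[Vector]) -> Vector:
    """Concatenate per-model embeddings into one longer vector."""
    fused: Vector = []
    for emb in embeddings:
        fused.extend(emb)
    return fused


def mean_fusion(embeddings: Sequence[Vector]) -> Vector:
    """Element-wise average; all embeddings must share one dimension."""
    dims = {len(emb) for emb in embeddings}
    if len(dims) != 1:
        raise ValueError("mean fusion needs embeddings of equal dimension")
    n = len(embeddings)
    return [sum(values) / n for values in zip(*embeddings)]


# Toy stand-ins for sentence embeddings from two different PLMs.
# The values are made up; real embeddings would come from the models.
emb_model_a = [1.0, 2.0, 3.0]
emb_model_b = [3.0, 2.0, 1.0]

fused_concat = concat_fusion([emb_model_a, emb_model_b])  # dimension 6
fused_mean = mean_fusion([emb_model_a, emb_model_b])      # dimension 3
```

In this sketch, concatenation grows the input dimension of the downstream classifier with every added model, while averaging keeps it fixed. Either way, each PLM has to be run once per input, which fits the abstract's point that fusion is computationally expensive.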
Taxonomy
Topics: Hate Speech and Cyberbullying Detection
