Character-level HyperNetworks for Hate Speech Detection
Tomer Wullach, Amir Adler, Einat Minkov

TL;DR
This paper introduces character-level HyperNetworks for hate speech detection. They are much smaller than large pretrained models yet competitive with them, especially when trained with additional automatically generated data.
Contribution
The paper proposes a novel character-level HyperNetwork architecture for hate speech detection that is far smaller than, yet competitive with, large pretrained models, and demonstrates the benefits of training on additional automatically generated data.
Findings
HyperNetworks perform comparably to, or better than, large pretrained models.
Training with automatically generated data improves model performance.
HyperNetworks are significantly smaller in size than traditional deep learning classifiers.
Abstract
The massive spread of hate speech, hateful content targeted at specific subpopulations, is a problem of critical social importance. Automated methods of hate speech detection typically employ state-of-the-art deep learning (DL)-based text classifiers, namely large pretrained neural language models of over 100 million parameters, adapting these models to the task of hate speech detection using relevant labeled datasets. Unfortunately, only a few public labeled datasets, all of limited size, are available for this purpose. We make several contributions with high potential for advancing this state of affairs. We present HyperNetworks for hate speech detection, a special class of DL networks whose weights are regulated by a small-scale auxiliary network. These architectures operate at character-level, as opposed to word or subword-level, and are several orders of magnitude smaller compared…
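The abstract describes HyperNetworks only at a high level: a main network whose weights are produced ("regulated") by a small auxiliary network, applied to character-level input. The sketch below illustrates that general idea in pure Python; it is not the authors' architecture. The character vocabulary, the bag-of-characters input encoding, the single linear hypernetwork, and the names `char_ids`, `HyperLinear`, and `classify` are all hypothetical choices made for illustration. The model is untrained, so its probabilities are arbitrary.

```python
import math
import random

# Hypothetical character vocabulary (the paper's actual alphabet is not given here).
VOCAB = "abcdefghijklmnopqrstuvwxyz0123456789 .,!?'"
UNK = len(VOCAB)  # index reserved for out-of-vocabulary characters


def char_ids(text):
    """Map text to character indices: character-level, no word/subword tokenizer."""
    return [VOCAB.index(c) if c in VOCAB else UNK for c in text.lower()]


def matvec(m, v):
    """Multiply matrix m (list of rows) by vector v."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


class HyperLinear:
    """Linear layer whose weight matrix is generated by a small hypernetwork.

    Illustrative assumption: the hypernetwork is a single linear map from a
    learned layer embedding z (size z_dim) to the flattened main-layer
    weights (size out_dim * in_dim). Only z and proj would be trained.
    """

    def __init__(self, in_dim, out_dim, z_dim, seed=0):
        rng = random.Random(seed)
        self.in_dim, self.out_dim = in_dim, out_dim
        self.z = [rng.gauss(0, 1) for _ in range(z_dim)]
        self.proj = [[rng.gauss(0, 0.1) for _ in range(z_dim)]
                     for _ in range(out_dim * in_dim)]

    def weights(self):
        # The auxiliary network emits the main layer's weights on demand.
        flat = matvec(self.proj, self.z)
        return [flat[r * self.in_dim:(r + 1) * self.in_dim]
                for r in range(self.out_dim)]

    def __call__(self, x):
        return matvec(self.weights(), x)


def bag_of_chars(text):
    """Hypothetical input encoding: counts of each character index."""
    counts = [0.0] * (UNK + 1)
    for i in char_ids(text):
        counts[i] += 1.0
    return counts


def classify(text, layer):
    """Return softmax probabilities over two classes (e.g. hateful / not hateful)."""
    logits = layer(bag_of_chars(text))
    m = max(logits)
    exps = [math.exp(l - m) for l in logits]
    total = sum(exps)
    return [e / total for e in exps]


layer = HyperLinear(in_dim=UNK + 1, out_dim=2, z_dim=4)
print(classify("an example sentence", layer))
```

A real character-level model would replace the bag-of-characters encoding with a sequence model over `char_ids`, so that character order matters; the bag encoding is used here only to keep the sketch short.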
Taxonomy
Topics: Hate Speech and Cyberbullying Detection
Methods: Multi-Head Attention · Linear Layer · Weight Decay · Linear Warmup With Linear Decay · Dropout · LAMB · Adam · Attention Dropout · Residual Connection · MobileBERT
