TL;DR
This paper introduces QUARC, a quaternion neural network-based multi-modal fusion architecture for hate speech classification over text and image modalities. It cuts the parameter count by almost 75% while keeping performance comparable to traditional models.
Contribution
The paper proposes a novel quaternion neural network model with fusion components for multi-modal hate speech detection, reducing parameters and computational costs.
Findings
Almost 75% reduction in parameters
Comparable performance to traditional models
Efficient storage and training benefits
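A reduction of about 75% is what weight sharing in a quaternion layer gives in general: one quaternion weight (four real numbers) replaces the sixteen real weights a dense layer would use to connect four input channels to four output channels. The sketch below illustrates that general property with NumPy. It is not the QUARC implementation, and the function names `quaternion_weight_matrix` and `param_counts` are hypothetical, chosen only for this example.

```python
import numpy as np


def quaternion_weight_matrix(w_r, w_i, w_j, w_k):
    """Expand four (n_in, n_out) weight blocks into the real matrix that
    applies the Hamilton product W * x to row-vector inputs.

    Inputs are laid out as [r | i | j | k] blocks of width n_in, and
    outputs as [r | i | j | k] blocks of width n_out.
    """
    return np.block([
        [w_r,  w_i,  w_j,  w_k],
        [-w_i, w_r,  w_k, -w_j],
        [-w_j, -w_k, w_r,  w_i],
        [-w_k, w_j, -w_i,  w_r],
    ])


def param_counts(n_in, n_out):
    """Return (quaternion, real) trainable weight counts for a layer that
    maps 4 * n_in real features to 4 * n_out real features (bias ignored)."""
    quaternion = 4 * n_in * n_out          # four shared blocks
    real = (4 * n_in) * (4 * n_out)        # one independent dense matrix
    return quaternion, real


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n_in, n_out = 64, 32
    # Four independent weight blocks; everything else in the matrix is reused.
    blocks = [rng.standard_normal((n_in, n_out)) for _ in range(4)]
    weight = quaternion_weight_matrix(*blocks)

    x = rng.standard_normal((8, 4 * n_in))  # batch of 8 quaternion-valued inputs
    y = x @ weight                          # shape (8, 4 * n_out)

    q, r = param_counts(n_in, n_out)
    print(y.shape, q, r, 1 - q / r)         # the last value is 0.75
```

The 16-block matrix is built from only four distinct blocks, so the layer stores a quarter of the weights of an equally sized real-valued layer. Any additional components in a full model (embeddings, fusion modules, classifier heads) would shift the overall ratio, which may be why the reported figure is "almost" 75%.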
Abstract
Hate speech is quite common in the age of social media. It is at times harmless, but it can also cause mental trauma to someone or even riots in communities. An image of a religious symbol with a derogatory comment, or a video of a man abusing a particular community, becomes hate speech with every modality (such as text, image, and audio) contributing towards it. Models based on a single modality of a hate speech post on social media are not useful; rather, we need multi-modal fusion models that consider both image and text while classifying hate speech. Text-image fusion models are heavily parameterized, hence we propose a quaternion neural network-based model with additional fusion components for each pair of modalities. The model is tested on the MMHS150K Twitter dataset for hate speech classification. The model shows an almost 75% reduction in parameters and also benefits us in…
