Multimodal Guidance Network for Missing-Modality Inference in Content Moderation
Zhuokai Zhao, Harish Palani, Tianyi Liu, Lena Evans, Ruth Toner

TL;DR
This paper introduces a guidance network that enhances single-modality models for content moderation, enabling effective inference without additional computational costs, by leveraging multimodal training for improved violence detection.
Contribution
The proposed guidance network improves single-modality inference models by sharing knowledge during training, avoiding extra inference costs common in existing missing-modality methods.
Findings
Single-modality models outperform traditional models in violence detection.
The framework maintains low inference computational costs.
Knowledge sharing during training enhances model performance.
Abstract
Multimodal deep learning, especially vision-language models, have gained significant traction in recent years, greatly improving performance on many downstream tasks, including content moderation and violence detection. However, standard multimodal approaches often assume consistent modalities between training and inference, limiting applications in many real-world use cases, as some modalities may not be available during inference. While existing research mitigates this problem through reconstructing the missing modalities, they unavoidably increase unnecessary computational cost, which could be just as critical, especially for large, deployed infrastructures in industry. To this end, we propose a novel guidance network that promotes knowledge sharing during training, taking advantage of the multimodal representations to train better single-modality models to be used for inference.…
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Code & Models
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
Taxonomy
TopicsAnomaly Detection Techniques and Applications · Human Pose and Action Recognition · Adversarial Robustness in Machine Learning
Methodsfail
