Multimodal Guidance Network for Missing-Modality Inference in Content Moderation

Zhuokai Zhao; Harish Palani; Tianyi Liu; Lena Evans; Ruth Toner

arXiv:2309.03452·cs.CV·August 5, 2024·1 cites

TL;DR

This paper introduces a guidance network that uses multimodal training to improve single-modality models for content moderation. The resulting models improve violence detection without additional computational cost at inference.

Contribution

The proposed guidance network improves single-modality inference models by sharing knowledge across modalities during training, which avoids the extra inference cost common in existing missing-modality methods.
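The general pattern behind this contribution can be illustrated with a toy sketch. It is not the authors' architecture or loss, which this summary does not describe. It assumes a hypothetical guidance term that pulls the single-modality embedding toward a fused multimodal embedding during training, and every name in it (`CountingEncoder`, `fuse`, `training_loss`, `infer`, `lam`) is invented for illustration.

```python
from dataclasses import dataclass


@dataclass
class CountingEncoder:
    """Stand-in for a neural encoder (hypothetical): scales input, counts calls."""
    scale: float
    calls: int = 0

    def __call__(self, x):
        self.calls += 1
        return [self.scale * v for v in x]


def mse(a, b):
    """Mean squared error between two equal-length vectors."""
    return sum((p - q) ** 2 for p, q in zip(a, b)) / len(a)


def fuse(video_emb, text_emb):
    """Assumed fusion rule: element-wise average of the two embeddings."""
    return [(v + t) / 2 for v, t in zip(video_emb, text_emb)]


def training_loss(video_enc, text_enc, video, text, target, lam=0.5):
    """Task loss on the single-modality branch plus an assumed guidance term."""
    v = video_enc(video)
    t = text_enc(text)          # second modality is used only during training
    fused = fuse(v, t)
    task = mse(v, target)       # supervises the single-modality branch
    guidance = mse(v, fused)    # pulls it toward the multimodal representation
    return task + lam * guidance


def infer(video_enc, video):
    """Inference touches only the single-modality encoder."""
    return video_enc(video)


video_enc = CountingEncoder(scale=2.0)
text_enc = CountingEncoder(scale=4.0)

loss = training_loss(video_enc, text_enc, [1.0, 2.0], [1.0, 2.0], [3.0, 6.0])
pred = infer(video_enc, [1.0, 2.0])

print(loss)            # 3.75
print(text_enc.calls)  # 1: the text encoder ran in training, not at inference
```

The call counter makes the cost argument concrete: the second encoder runs once, in the training step, and is never invoked by `infer`, so the inference path is the same as that of a plain single-modality model.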

Findings

01 Single-modality models trained with the guidance network outperform traditionally trained single-modality models in violence detection.

02 The framework adds no computational cost at inference.

03 Knowledge sharing during training improves single-modality model performance.

Abstract

Multimodal deep learning, especially vision-language models, has gained significant traction in recent years, greatly improving performance on many downstream tasks, including content moderation and violence detection. However, standard multimodal approaches often assume consistent modalities between training and inference, which limits their use in many real-world cases where some modalities are unavailable during inference. While existing research mitigates this problem by reconstructing the missing modalities, doing so unavoidably adds computational cost, which can be just as critical, especially for large, deployed infrastructures in industry. To this end, we propose a novel guidance network that promotes knowledge sharing during training, taking advantage of the multimodal representations to train better single-modality models to be used for inference.…

Code & Models

Repositories

zhuokaizhao/multimodal-guidance-network
PyTorch · Official

Taxonomy

Topics: Anomaly Detection Techniques and Applications · Human Pose and Action Recognition · Adversarial Robustness in Machine Learning