Differentiable Outlier Detection Enable Robust Deep Multimodal Analysis

Zhu Wang; Sourav Medya; Sathya N. Ravi

arXiv:2302.05608·cs.CV·October 9, 2023

Open Access · 1 Repo

TL;DR

This paper introduces a differentiable outlier detection layer within an end-to-end vision-language model, leveraging explicit knowledge graphs and an OOD filtering mechanism to improve robustness and efficiency in multimodal tasks.

Contribution

It proposes a novel interactive OOD layer and integrates explicit knowledge graphs into deep models, enabling robust multimodal analysis with fewer samples and less training time.

Findings

01. Achieves competitive results with fewer training samples.

02. Effectively filters noise from external knowledge bases.

03. Enhances robustness in vision-language tasks.

Abstract

Deep network models are often purely inductive, both during training and when performing inference on unseen data. As a result, when such models are used for prediction, it is well known that they often fail to capture the semantic information and implicit dependencies that exist among objects (or concepts) at the population level. Moreover, it remains unclear how domain or prior modal knowledge can be specified in a backpropagation-friendly manner, especially in large-scale and noisy settings. In this work, we propose an end-to-end vision and language model that incorporates explicit knowledge graphs. We also introduce an interactive out-of-distribution (OOD) layer based on an implicit network operator. The layer filters noise introduced by the external knowledge base. In practice, we apply our model to several vision and language downstream tasks, including visual question answering, visual…

Code & Models

Repositories

ellenzhuwang/VK_OOD
PyTorch · Official

Taxonomy

Topics: Multimodal Machine Learning Applications · Domain Adaptation and Few-Shot Learning · Topic Modeling

Methods: Linear Layer · Contrastive Language-Image Pre-training · Absolute Position Encodings · Position-Wise Feed-Forward Layer · Residual Connection · Label Smoothing · Dropout · Byte Pair Encoding · Adam