Regularizing Attention Networks for Anomaly Detection in Visual Question Answering
Doyup Lee, Yeongjae Cheon, Wook-Shin Han

TL;DR
This paper introduces an attention-based regularization method to enhance the robustness of VQA models against anomalies, demonstrating improved detection of abnormal inputs and stability in real-world scenarios.
Contribution
It proposes a novel attention-based anomaly detection approach combined with maximum entropy regularization, improving robustness of VQA models across various anomalies.
Findings
Attention-based anomaly detection outperforms previous methods developed for unimodal tasks.
Maximum entropy regularization significantly improves detection accuracy.
The approach is model-agnostic and applies to various VQA architectures.
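As a rough illustration of what a maximum entropy regularizer on attention could look like, the sketch below computes a penalty that is zero when attention is uniform and grows as attention becomes peaked. The function names, the `log(K) - H` form, and the idea of applying the penalty to anomalous training inputs are assumptions made for illustration; this summary does not give the paper's exact formulation.

```python
import math

def entropy(probs, eps=1e-12):
    """Shannon entropy (in nats) of a discrete distribution."""
    return -sum(p * math.log(p + eps) for p in probs)

def max_entropy_penalty(attention_weights):
    """Hypothetical regularizer: log(K) - H(attention).

    The value is about 0 for uniform attention over K items and
    approaches log(K) when all attention sits on a single item.
    Minimizing it therefore pushes attention toward maximum entropy.
    """
    k = len(attention_weights)
    return math.log(k) - entropy(attention_weights)

# Assumed usage: add the penalty to the training loss for inputs
# that are known to be anomalous, weighted by a hyperparameter.
uniform_penalty = max_entropy_penalty([0.25, 0.25, 0.25, 0.25])
peaked_penalty = max_entropy_penalty([1.0, 0.0, 0.0, 0.0])
```

Under this assumed form, a model trained with the penalty on anomalous inputs would learn flat attention for them, which makes a peak-attention score more informative at test time.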
Abstract
For stability and reliability of real-world applications, the robustness of DNNs in unimodal tasks has been evaluated. However, few studies consider abnormal situations that a visual question answering (VQA) model might encounter at test time after deployment in the real world. In this study, we evaluate the robustness of state-of-the-art VQA models to five different anomalies, including worst-case scenarios, the most frequent scenarios, and the current limitation of VQA models. Unlike the results in unimodal tasks, the maximum confidence of answers in VQA models cannot detect anomalous inputs, and post-training of the outputs, such as outlier exposure, is ineffective for VQA models. Thus, we propose an attention-based method, which uses confidence of reasoning between input images and questions and shows much more promising results than the previous methods in unimodal tasks.…
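To make the two kinds of confidence score in the abstract concrete, here is a minimal sketch. The answer-side score is the usual maximum softmax probability over candidate answers. The attention-side score, taken here as the peak attention weight over image regions, is an assumption chosen for illustration, as are the function names and the threshold rule; the paper's actual score may differ.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def max_answer_confidence(answer_logits):
    """Baseline score: maximum softmax probability over answers."""
    return max(softmax(answer_logits))

def max_attention_confidence(attention_logits):
    """Assumed attention-based score: the largest attention weight
    assigned to any image region for the given question."""
    return max(softmax(attention_logits))

def is_anomalous(score, threshold):
    """Flag an input when its confidence score falls below a threshold."""
    return score < threshold

# Hypothetical example: flat attention over four regions gives a low score.
flat_score = max_attention_confidence([0.0, 0.0, 0.0, 0.0])
flagged = is_anomalous(flat_score, threshold=0.5)
```

In practice the threshold would be chosen on held-out data, for example to fix a target false-positive rate on normal inputs.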
Taxonomy
Topics: Multimodal Machine Learning Applications · Domain Adaptation and Few-Shot Learning · Anomaly Detection Techniques and Applications
Methods: Entropy Regularization
