Out-of-Distribution Detection with Attention Head Masking for Multimodal Document Classification
Christos Constantinou, Georgios Ioannides, Aman Chadha, Aaron Elkins, Edwin Simpson

TL;DR
This paper introduces attention head masking (AHM), a novel method for out-of-distribution detection in multi-modal document classification, which outperforms existing approaches and significantly reduces false positives.
Contribution
The paper proposes AHM for multi-modal OOD detection in documents and introduces the FinanceDocs dataset to facilitate further research.
Findings
AHM outperforms state-of-the-art OOD detection methods
AHM reduces the false positive rate (FPR) by up to 7.5% compared with existing methods
FinanceDocs dataset supports future research in document OOD detection
Abstract
Detecting out-of-distribution (OOD) data is crucial in machine learning applications to mitigate the risk of model overconfidence, thereby enhancing the reliability and safety of deployed systems. Most existing OOD detection methods address uni-modal inputs, such as images or texts. For multi-modal documents, there is little extensive research on the performance of these methods, which were developed primarily for computer vision tasks. We propose a novel methodology, termed attention head masking (AHM), for multi-modal OOD tasks in document classification systems. Our empirical results demonstrate that AHM outperforms all state-of-the-art approaches and significantly decreases the false positive rate (FPR), by up to 7.5% compared with existing solutions. This methodology generalizes well to multi-modal…
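This summary does not say at which operating point the FPR is measured. OOD papers commonly report FPR at 95% true positive rate (FPR95), treating in-distribution (ID) samples as positives; whether the 7.5% figure uses exactly this operating point is an assumption here. The sketch below shows how such a metric is typically computed from per-sample OOD scores, assuming higher scores mean "more in-distribution". The function name `fpr_at_tpr` and the example scores are hypothetical and are not taken from the paper.

```python
import math
from typing import Sequence


def fpr_at_tpr(id_scores: Sequence[float],
               ood_scores: Sequence[float],
               tpr: float = 0.95) -> float:
    """Fraction of OOD samples wrongly accepted as in-distribution at the
    threshold that keeps at least `tpr` of the in-distribution samples.

    Assumed convention: a higher score means "more in-distribution".
    """
    if not id_scores or not ood_scores:
        raise ValueError("need at least one ID score and one OOD score")
    if not 0.0 < tpr <= 1.0:
        raise ValueError("tpr must be in (0, 1]")

    # Sort ID scores from most to least confident.
    ranked = sorted(id_scores, reverse=True)
    # Smallest number of ID samples that must be accepted to reach the target TPR.
    k = math.ceil(tpr * len(ranked))
    # Accept every sample scoring at least the k-th highest ID score (ties included).
    threshold = ranked[k - 1]

    # OOD samples above the threshold are false positives.
    false_positives = sum(1 for s in ood_scores if s >= threshold)
    return false_positives / len(ood_scores)


# Hypothetical scores, for illustration only.
id_scores = [0.9, 0.8, 0.85, 0.95, 0.7, 0.75, 0.92, 0.88, 0.6, 0.99]
ood_scores = [0.1, 0.65, 0.3, 0.55]

print(fpr_at_tpr(id_scores, ood_scores))            # 0.25
print(fpr_at_tpr(id_scores, ood_scores, tpr=0.5))   # 0.0
```

With ten ID samples, a 95% TPR target means all ten must be accepted, so the threshold drops to the lowest ID score (0.6) and one of the four OOD samples (0.65) slips through. A lower FPR at the same TPR means fewer OOD documents are confidently misclassified as in-distribution, which is the sense in which the abstract's FPR reduction should be read.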
Taxonomy
Topics: Handwritten Text Recognition Techniques · Digital Media Forensic Detection · Speech Recognition and Synthesis
Methods: Attention Is All You Need · Linear Layer · Residual Connection · Multi-Head Attention · Adam · Layer Normalization · Position-Wise Feed-Forward Layer · Dense Connections · Byte Pair Encoding · Absolute Position Encodings
