# Out of distribution detection with attention head masking for multimodal document classification

**Authors:** Christos Constantinou, Georgios Ioannides, Aman Chadha, Aaron Elkins, Edwin Simpson

PMC · DOI: 10.1038/s41598-025-32328-9 · Scientific Reports · 2026-01-03

## TL;DR

This paper introduces a new method for detecting out-of-distribution data in multi-modal documents using attention head masking in Transformer models.

## Contribution

Attention Head Masking (AHM) improves out-of-distribution (OOD) detection by enhancing embedding quality and reducing false positives in both uni-modal and multi-modal settings.

## Key findings

- AHM reduces the false positive rate by up to 10% compared to existing methods.
- The method generalizes well to multi-modal document data with text and visual information.
- A new dataset called FinanceDocs is introduced for OOD detection research.
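
The false positive rate cited above is commonly reported at a fixed true positive rate. The sketch below shows one way such a figure can be computed from detector scores. It assumes that higher scores mean "more in-distribution", that in-distribution samples are the positive class, and that the operating point is 95% TPR. These conventions are typical in OOD work but are not stated in this summary, and `fpr_at_tpr` is a hypothetical helper name rather than a function from the authors' code.

```python
import math


def fpr_at_tpr(id_scores, ood_scores, tpr=0.95):
    """Fraction of OOD samples accepted as in-distribution (ID) at a given TPR.

    Assumptions (not taken from the paper): higher score = more ID-like,
    and ID samples are treated as the positive class.
    """
    if not id_scores or not ood_scores:
        raise ValueError("both score lists must be non-empty")
    # Rank ID scores from most to least confident.
    ranked = sorted(id_scores, reverse=True)
    # Smallest number of ID samples that must be accepted to reach the target TPR.
    keep = max(1, math.ceil(tpr * len(ranked)))
    threshold = ranked[keep - 1]
    # OOD samples scoring at or above the threshold are false positives.
    false_positives = sum(1 for s in ood_scores if s >= threshold)
    return false_positives / len(ood_scores)


if __name__ == "__main__":
    id_scores = [0.9, 0.8, 0.7, 0.6]
    ood_scores = [0.75, 0.2, 0.1, 0.05]
    print(fpr_at_tpr(id_scores, ood_scores, tpr=0.75))
```

A lower value means fewer OOD inputs slip through at the same level of in-distribution recall, which is the direction of improvement the findings describe.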

## Abstract

Detecting out-of-distribution (OOD) data is critical for ensuring the reliability and safety of deployed machine learning systems by mitigating model overconfidence and misclassification. While existing OOD detection methods primarily focus on uni-modal inputs, such as images or text, their effectiveness in multi-modal settings, particularly documents, remains underexplored. Moreover, most approaches prioritize decision mechanisms over optimizing the underlying dense embedding representations for optimal separation. In this work, we introduce Attention Head Masking (AHM), a novel technique applied to Transformer-based models for both uni-modal and multi-modal OOD detection. Our empirical results demonstrate that AHM enhances embedding quality, significantly improving the separation between in-distribution and OOD data. Notably, our method reduces the false positive rate (FPR) by up to 10%, outperforming state-of-the-art approaches. Furthermore, AHM generalizes effectively to multi-modal document data, where textual and visual information are jointly modeled within a Transformer architecture. To encourage further research in this area, we introduce FinanceDocs, a high-quality, publicly available document AI dataset tailored for OOD detection. Our code and dataset are available at https://github.com/constantinouchristos/OOD-AHM.
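
To make the general idea of masking attention heads concrete, here is a minimal, dependency-free sketch of multi-head self-attention in which each head's output is scaled by a mask value before the heads are concatenated. It illustrates head masking in general, not the authors' AHM procedure: the abstract does not say which heads are masked, in which layers, or how they are selected. The function name `masked_multi_head_attention`, the omission of learned projection matrices, and the toy inputs are all assumptions made for illustration.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def masked_multi_head_attention(q, k, v, num_heads, head_mask):
    """Scaled dot-product attention with a per-head mask.

    q, k, v: lists of token vectors, each of length d_model.
    head_mask: one value per head (1.0 keeps the head, 0.0 silences it).
    Learned projections are omitted to keep the sketch short.
    """
    d_model = len(q[0])
    if d_model % num_heads != 0 or len(head_mask) != num_heads:
        raise ValueError("d_model must divide by num_heads; one mask value per head")
    d_head = d_model // num_heads
    out = [[] for _ in q]
    for h in range(num_heads):
        lo, hi = h * d_head, (h + 1) * d_head  # slice of features owned by head h
        for i, qi in enumerate(q):
            scores = [
                sum(a * b for a, b in zip(qi[lo:hi], kj[lo:hi])) / math.sqrt(d_head)
                for kj in k
            ]
            weights = softmax(scores)
            context = [
                sum(w * vj[lo + c] for w, vj in zip(weights, v))
                for c in range(d_head)
            ]
            # Masking: scale this head's contribution before concatenation.
            out[i].extend(head_mask[h] * x for x in context)
    return out


if __name__ == "__main__":
    tokens = [[1.0, 0.0, 0.0, 1.0], [0.0, 1.0, 1.0, 0.0]]
    # Keep head 0, silence head 1.
    for row in masked_multi_head_attention(tokens, tokens, tokens, 2, [1.0, 0.0]):
        print(row)
```

In a real Transformer the masked output would then pass through the output projection and later layers, so the resulting embedding reflects only the retained heads. Any downstream OOD score would be computed on that embedding.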

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/PMC12820214/full.md

## Figures

5 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12820214/full.md

## References

32 references — full list in the complete paper: https://tomesphere.com/paper/PMC12820214/full.md

---
Source: https://tomesphere.com/paper/PMC12820214