# HiddenObject: Modality-Agnostic Fusion for Multimodal Hidden Object Detection

**Authors:** Harris Song, Tuan-Anh Vu, Sanjith Menon, Sriram Narasimhan, M. Khalid Jawed

arXiv: 2508.21135 · 2025-09-15

## TL;DR

HiddenObject introduces a modality-agnostic fusion framework that combines RGB, thermal, and depth data to improve detection of hidden or camouflaged objects in challenging environments, reporting state-of-the-art or competitive performance against existing methods.

## Contribution

The paper presents a novel Mamba-based fusion approach that effectively integrates multiple modalities for robust hidden object detection, addressing limitations of unimodal and naive fusion strategies.

## Key findings

- Achieves state-of-the-art or competitive performance across multiple benchmark datasets.
- Effectively detects camouflaged and occluded objects.
- Highlights the advantages of modality-agnostic, Mamba-based fusion over unimodal and naive fusion strategies.

## Abstract

Detecting hidden or partially concealed objects remains a fundamental challenge in multimodal environments, where factors like occlusion, camouflage, and lighting variations significantly hinder performance. Traditional RGB-based detection methods often fail under such adverse conditions, motivating the need for more robust, modality-agnostic approaches. In this work, we present HiddenObject, a fusion framework that integrates RGB, thermal, and depth data using a Mamba-based fusion mechanism. Our method captures complementary signals across modalities, enabling enhanced detection of obscured or camouflaged targets. Specifically, the proposed approach identifies modality-specific features and fuses them in a unified representation that generalizes well across challenging scenarios. We validate HiddenObject across multiple benchmark datasets, demonstrating state-of-the-art or competitive performance compared to existing methods. These results highlight the efficacy of our fusion design and expose key limitations in current unimodal and naïve fusion strategies. More broadly, our findings suggest that Mamba-based fusion architectures can significantly advance the field of multimodal object detection, especially under visually degraded or complex conditions.
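
To make the idea of fusing modality-specific features into a unified representation more concrete, here is a minimal NumPy sketch. It is not the authors' implementation: the summary does not describe the architecture, so everything below is an assumption made for illustration. That includes the function names `selective_scan` and `fuse_modalities`, the linear projections, the feature sizes, the random weights, and the mean-pooling step. The scan is a heavily simplified, Mamba-style selective state-space recurrence (input-dependent step size and input/output maps), not the full Mamba block.

```python
import numpy as np

def selective_scan(x, w_delta, A, w_B, w_C):
    """Simplified selective state-space scan (illustrative, not the paper's code).

    x: (T, D) token sequence; A: (D, N) negative decay rates;
    w_delta: (D, D), w_B: (D, N), w_C: (D, N) are hypothetical weights.
    """
    T, D = x.shape
    h = np.zeros_like(A)                 # hidden state, shape (D, N)
    y = np.empty_like(x)
    for t in range(T):
        # Input-dependent step size via a numerically stable softplus.
        delta = np.logaddexp(0.0, x[t] @ w_delta)        # (D,)
        B = x[t] @ w_B                                   # (N,) input map
        C = x[t] @ w_C                                   # (N,) output map
        A_bar = np.exp(delta[:, None] * A)               # decay in (0, 1)
        h = A_bar * h + (delta[:, None] * B[None, :]) * x[t][:, None]
        y[t] = h @ C                                     # (D,)
    return y

def fuse_modalities(features, projections, params):
    """Project each available modality to a shared width, scan, and pool.

    features: dict of modality name -> (T_m, D_m) array. Any subset of
    modalities may be present, which is the sense of "modality-agnostic"
    assumed in this sketch.
    """
    tokens = [features[m] @ projections[m] for m in sorted(features)]
    seq = np.concatenate(tokens, axis=0)                 # (sum of T_m, D)
    return selective_scan(seq, **params).mean(axis=0)    # (D,)

# Toy usage with random stand-in features (all sizes are assumptions).
rng = np.random.default_rng(0)
D, N = 8, 4
dims = {"rgb": 16, "thermal": 12, "depth": 10}
projections = {m: rng.normal(scale=0.1, size=(d, D)) for m, d in dims.items()}
params = {
    "w_delta": rng.normal(scale=0.1, size=(D, D)),
    "A": -np.exp(rng.normal(size=(D, N))),
    "w_B": rng.normal(scale=0.1, size=(D, N)),
    "w_C": rng.normal(scale=0.1, size=(D, N)),
}
features = {m: rng.normal(size=(5, d)) for m, d in dims.items()}
fused = fuse_modalities(features, projections, params)
print(fused.shape)  # (8,)
```

Because every modality is mapped to the same token width before the scan, the same call also works when only some sensors are available, for example `fuse_modalities({"rgb": features["rgb"]}, projections, params)`. Whether HiddenObject handles missing modalities this way is not stated in the summary.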

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/2508.21135/full.md

## Figures

12 figures with captions in the complete paper: https://tomesphere.com/paper/2508.21135/full.md

## References

87 references — full list in the complete paper: https://tomesphere.com/paper/2508.21135/full.md

---
Source: https://tomesphere.com/paper/2508.21135