VLC Fusion: Vision-Language Conditioned Sensor Fusion for Robust Object Detection

Aditya Taparia; Noel Ngu; Mario Leiva; Joshua Shay Kricheli; John Corcoran; Nathaniel D. Bastian; Gerardo Simari; Paulo Shakarian; Ransalu Senanayake

arXiv:2505.12715·cs.CV·May 20, 2025

TL;DR

VLC Fusion introduces a novel sensor fusion framework that uses a vision-language model to adaptively weight sensor modalities based on environmental context, significantly improving object detection robustness across diverse conditions.

Contribution

The paper proposes a new fusion method leveraging a vision-language model to dynamically adjust sensor weights according to environmental cues, enhancing detection performance.

Findings

01. Outperforms traditional fusion methods on real-world datasets
02. Improves detection accuracy in varied environmental conditions
03. Is effective across multiple sensor modalities

Abstract

Although fusing multiple sensor modalities can enhance object detection performance, existing fusion approaches often overlook subtle variations in environmental conditions and sensor inputs. As a result, they struggle to adaptively weight each modality under such variations. To address this challenge, we introduce Vision-Language Conditioned Fusion (VLC Fusion), a novel fusion framework that leverages a Vision-Language Model (VLM) to condition the fusion process on nuanced environmental cues. By capturing high-level environmental context such as darkness, rain, and camera blurring, the VLM guides the model to dynamically adjust modality weights based on the current scene. We evaluate VLC Fusion on real-world autonomous driving and military target detection datasets that include image, LIDAR, and mid-wave infrared modalities. Our experiments show that VLC Fusion consistently…
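To make the idea of context-conditioned modality weighting concrete, here is a minimal sketch. It is not the authors' architecture: the abstract does not say how VLM outputs are turned into weights. The sketch assumes the VLM has already been reduced to per-cue scores in [0, 1] (for example, answers to "is it dark?", "is it raining?", "is the camera blurred?"), maps those scores through a hypothetical linear layer (`CUE_AFFINITY`, `BIAS`) and a softmax to get one weight per modality, and then fuses feature vectors with a weighted sum. All names and numbers here are illustrative assumptions; in a learned model the affinities would be trained, not hand-set.

```python
import math
from typing import Dict, List

# Hypothetical environmental cues; in VLC Fusion these are derived from a VLM,
# here they are simply supplied as scores in [0, 1].
CUES = ["darkness", "rain", "blur"]

# Hypothetical, hand-set parameters of a linear "cue -> modality logit" layer.
# Negative affinity: the cue should down-weight that modality.
CUE_AFFINITY: Dict[str, Dict[str, float]] = {
    "camera": {"darkness": -2.0, "rain": -0.5, "blur": -1.5},
    "lidar": {"darkness": 1.0, "rain": -0.5, "blur": 0.5},
}
BIAS: Dict[str, float] = {"camera": 0.0, "lidar": 0.0}


def softmax(logits: List[float]) -> List[float]:
    """Numerically stable softmax (subtract the max before exponentiating)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def modality_weights(context: Dict[str, float],
                     cue_affinity: Dict[str, Dict[str, float]],
                     bias: Dict[str, float]) -> Dict[str, float]:
    """Map a context score vector to one weight per modality (weights sum to 1)."""
    modalities = list(bias)
    logits = [
        bias[m] + sum(cue_affinity[m].get(c, 0.0) * context.get(c, 0.0) for c in CUES)
        for m in modalities
    ]
    return dict(zip(modalities, softmax(logits)))


def fuse(features: Dict[str, List[float]], weights: Dict[str, float]) -> List[float]:
    """Weighted sum of equally sized per-modality feature vectors."""
    dim = len(next(iter(features.values())))
    fused = [0.0] * dim
    for m, vec in features.items():
        for i, v in enumerate(vec):
            fused[i] += weights[m] * v
    return fused


if __name__ == "__main__":
    day = {"darkness": 0.0, "rain": 0.0, "blur": 0.0}
    night = {"darkness": 1.0, "rain": 0.0, "blur": 0.0}
    print("day:  ", modality_weights(day, CUE_AFFINITY, BIAS))
    print("night:", modality_weights(night, CUE_AFFINITY, BIAS))
```

With these illustrative parameters, a clear-day context gives both modalities equal weight, while a dark scene shifts weight from the camera toward LiDAR. The softmax keeps the weights positive and summing to one, so the fused feature stays on the same scale as the inputs.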

Taxonomy

Topics: Advanced Neural Network Applications · Advanced Optical Sensing Technologies · Image Enhancement Techniques