Adding Thermal Awareness to Visual Systems in Real-Time via Distilled Diffusion Models
Yuchen Guo, Junli Gong, Wenjun Dong, Yiuming Cheung, Weifeng Su

TL;DR
FusionProxy is a real-time, plug-and-play image fusion module that combines thermal (infrared) and visible-light imagery, improving the robustness and speed of visual perception in safety-critical applications such as autonomous driving.
Contribution
It introduces a fusion method with diffusion-level quality that runs as an independent module, enabling real-time thermal-visual fusion without joint optimization.
Findings
Achieves superior static recognition performance.
Enhances robustness in dynamic tasks such as autonomous driving.
Operates in real-time across diverse hardware platforms.
Abstract
Purely RGB-based vision models often fail to provide reliable cues in challenging scenarios such as nighttime and fog, leading to degraded performance and safety risks. Infrared imaging captures heat-emitting sources and provides critical complementary information, but existing high-fidelity fusion methods suffer from prohibitive latency, rendering them impractical for real-time edge deployment. To address this, we propose FusionProxy, a real-time image fusion module designed as a fully independent, plug-and-play component with diffusion-level quality. FusionProxy exploits two complementary statistics of a teacher sample ensemble: per-pixel variance in raw image space, used to weight pixel-level supervision, and per-pixel variance inside frozen foundation backbones, used to route feature-level alignment spatially. Once trained, FusionProxy can be directly integrated into any visual…
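
The first statistic named in the abstract, per-pixel variance across a teacher sample ensemble used to weight pixel-level supervision, can be illustrated with a minimal NumPy sketch. This is not the authors' implementation. The function names, the choice of the ensemble mean as the target, the L1 loss, and above all the weighting rule (weights that shrink as variance grows) are assumptions made for illustration; the abstract does not specify how variance maps to weights.

```python
import numpy as np


def pixel_variance_weights(teacher_samples, eps=1e-6):
    """Per-pixel statistics of a teacher ensemble.

    teacher_samples: array of shape (K, H, W), K fused images sampled
    from a (hypothetical) diffusion teacher for the same input pair.

    Returns the ensemble mean, the per-pixel variance, and weights in (0, 1].
    ASSUMPTION: weights decrease with variance, w = 1 / (1 + var / mean_var).
    The paper may use a different (even opposite) mapping.
    """
    mean = teacher_samples.mean(axis=0)
    var = teacher_samples.var(axis=0)
    weights = 1.0 / (1.0 + var / (var.mean() + eps))
    return mean, var, weights


def weighted_l1(student, target, weights):
    """Weighted mean absolute error between a student output and a target."""
    return float((weights * np.abs(student - target)).sum() / weights.sum())


# Toy ensemble: 8 teacher samples of a 4x4 image.
rng = np.random.default_rng(0)
samples = rng.random((8, 4, 4))
samples[:, 0, 0] = 0.5  # a pixel on which every teacher sample agrees

mean, var, w = pixel_variance_weights(samples)
student = mean + 0.1    # a student output that is off by 0.1 everywhere
loss = weighted_l1(student, mean, w)
```

In this sketch the pixel at `(0, 0)` has zero ensemble variance and therefore receives the maximum weight of 1, while pixels on which the teacher samples disagree contribute less to the loss.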
