Multi-Sensor Matching with HyperNetworks
Eli Passov, Nathan S. Netanyahu, Yosi Keller

TL;DR
This paper introduces a hypernetwork-based approach for multimodal patch matching that enhances robustness to appearance shifts while maintaining efficiency, achieving state-of-the-art results on VIS-IR benchmarks.
Contribution
It proposes a lightweight descriptor-learning architecture using hypernetworks and conditional normalization for improved multimodal matching.
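The two ingredients named here can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the function names, the linear form of the hypernetwork head, the choice of context vector, and all shapes are assumptions made for illustration. It shows conditional instance normalization with one scale/shift pair per modality, followed by a per-channel scale and shift generated from a context vector.

```python
import numpy as np

def instance_norm(x, eps=1e-5):
    """Normalize each channel of each sample over its spatial dimensions.
    x has shape (batch, channels, height, width)."""
    mean = x.mean(axis=(2, 3), keepdims=True)
    var = x.var(axis=(2, 3), keepdims=True)
    return (x - mean) / np.sqrt(var + eps)

def conditional_instance_norm(x, modality, gamma_table, beta_table):
    """Instance norm with a separate (gamma, beta) pair per modality.
    modality: int array of shape (batch,), e.g. 0 = visible, 1 = infrared.
    gamma_table, beta_table: arrays of shape (num_modalities, channels)."""
    gamma = gamma_table[modality][:, :, None, None]
    beta = beta_table[modality][:, :, None, None]
    return gamma * instance_norm(x) + beta

def hyper_modulation(x, context, w_scale, w_shift):
    """Hypothetical hypernetwork head (an assumption, not the paper's design):
    a linear map from a context vector to per-channel scale and shift.
    context: (batch, context_dim); w_scale, w_shift: (context_dim, channels)."""
    scale = 1.0 + context @ w_scale  # centred on the identity mapping
    shift = context @ w_shift
    return scale[:, :, None, None] * x + shift[:, :, None, None]

# Toy usage with random feature maps for two visible and two infrared patches.
rng = np.random.default_rng(0)
x = rng.normal(size=(4, 8, 16, 16))
modality = np.array([0, 1, 0, 1])
gamma_table = np.ones((2, 8))
beta_table = np.zeros((2, 8))
beta_table[1] = 0.5  # infrared branch gets its own shift
context = rng.normal(size=(4, 3))
w_scale = 0.1 * rng.normal(size=(3, 8))
w_shift = 0.1 * rng.normal(size=(3, 8))

y = conditional_instance_norm(x, modality, gamma_table, beta_table)
z = hyper_modulation(y, context, w_scale, w_shift)
```

In this sketch the modality index only selects a row of two small tables, so the conditioning adds few parameters compared with the convolutional backbone.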
Findings
Achieves state-of-the-art results on VIS-NIR benchmarks.
Maintains inference efficiency comparable to descriptor-based methods.
Provides a new large-scale cross-platform VIS-IR dataset for evaluation.
Abstract
Hypernetworks are models that generate or modulate the weights of another network. They provide a flexible mechanism for injecting context and task conditioning and have proven broadly useful across diverse applications without significant increases in model size. We leverage hypernetworks to improve multimodal patch matching by introducing a lightweight descriptor-learning architecture that augments a Siamese CNN with (i) hypernetwork modules that compute adaptive, per-channel scaling and shifting and (ii) conditional instance normalization that provides modality-specific adaptation (e.g., visible vs. infrared, VIS-IR) in shallow layers. This combination preserves the efficiency of descriptor-based methods during inference while increasing robustness to appearance shifts. Trained with a triplet loss and hard-negative mining, our approach achieves state-of-the-art results on VIS-NIR and…
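The training objective mentioned in the abstract, a triplet loss with hard-negative mining, can be sketched as follows. This is one common in-batch formulation, assumed here for illustration; the paper's exact mining strategy, margin, and distance are not given in this summary, and the function names are hypothetical.

```python
import numpy as np

def pairwise_distances(a, b):
    """Euclidean distance between every row of a and every row of b."""
    diff = a[:, None, :] - b[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))

def hard_negative_triplet_loss(vis, ir, margin=1.0):
    """Triplet loss using the hardest in-batch negative for each pair.
    vis[i] and ir[i] are assumed to be descriptors of the same patch
    seen in two modalities; every other row is treated as a negative."""
    dist = pairwise_distances(vis, ir)
    pos = np.diag(dist)                   # matching pairs lie on the diagonal
    masked = dist.copy()
    np.fill_diagonal(masked, np.inf)      # exclude positives from the search
    # Hardest negative: the closest non-matching descriptor in either direction.
    hardest_neg = np.minimum(masked.min(axis=1), masked.min(axis=0))
    return np.maximum(0.0, margin + pos - hardest_neg).mean()

# Well-separated matching descriptors incur no loss.
separated = np.eye(3) * 10.0
loss_separated = hard_negative_triplet_loss(separated, separated)

# Collapsed descriptors (all identical) are penalized by the full margin.
collapsed = np.zeros((3, 2))
loss_collapsed = hard_negative_triplet_loss(collapsed, collapsed, margin=1.0)
```

Mining the closest non-matching descriptor focuses each update on the negatives most likely to be confused with the true match. The batch must contain at least two pairs for a negative to exist.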
Taxonomy
Topics: Domain Adaptation and Few-Shot Learning · Advanced Neural Network Applications · Multimodal Machine Learning Applications
