Pitfalls of Conditional Batch Normalization for Contextual Multi-Modal Learning
Ivaxi Sheth, Aamer Abdul Rahman, Mohammad Havaei, Samira Ebrahimi Kahou

TL;DR
This paper critically examines Conditional Batch Normalization (CBN) in multi-modal learning, revealing that CBN can hinder visual feature learning and may lead to shortcut learning, thus questioning its effectiveness for generalization.
Contribution
The study provides a comprehensive evaluation of CBN's limitations across multiple datasets, highlighting its potential to impair visual feature learning and suggesting alternative approaches for better generalization.
Findings
CBN deteriorates visual feature learning in multi-modal tasks.
CBN networks show limited visual feature extraction on bird and histology datasets.
CBN may promote shortcut learning between auxiliary data and labels.
Abstract
Humans have perfected the art of learning from multiple modalities through sensory organs. Despite their impressive predictive performance on a single modality, neural networks cannot reach human-level accuracy with respect to multiple modalities. This is a particularly challenging task due to variations in the structure of respective modalities. Conditional Batch Normalization (CBN) is a popular method that was proposed to learn contextual features to aid deep learning tasks. This technique uses auxiliary data to improve representational power by learning affine transformations for convolutional neural networks. Despite the boost in performance observed by using CBN layers, our work reveals that the visual features learned by introducing auxiliary data via CBN deteriorate. We perform comprehensive experiments to evaluate the brittleness of CBN networks to various datasets, suggesting…
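The abstract's description of CBN, auxiliary data driving a learned affine transformation of convolutional features, can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation: the function name, the linear maps `w_gamma` and `w_beta`, and the choice to predict offsets around an identity scale are all hypothetical choices made for the example.

```python
import numpy as np

def conditional_batch_norm(x, aux, w_gamma, w_beta, eps=1e-5):
    """Illustrative CBN forward pass (training-mode batch statistics only).

    x       : (N, C, H, W) convolutional feature maps
    aux     : (N, D) auxiliary (context) vectors, one per sample
    w_gamma : (D, C) hypothetical linear map from aux to a per-channel scale offset
    w_beta  : (D, C) hypothetical linear map from aux to a per-channel shift
    """
    # Standard batch normalization: per-channel statistics over batch and space.
    mean = x.mean(axis=(0, 2, 3), keepdims=True)
    var = x.var(axis=(0, 2, 3), keepdims=True)
    x_hat = (x - mean) / np.sqrt(var + eps)

    # Assumption: the scale is predicted as an offset around 1, so zero weights
    # reduce the layer to plain (non-affine) batch normalization.
    gamma = 1.0 + aux @ w_gamma  # (N, C), differs per sample
    beta = aux @ w_beta          # (N, C), differs per sample

    # Broadcast the per-sample, per-channel affine parameters over H and W.
    return gamma[:, :, None, None] * x_hat + beta[:, :, None, None]
```

In this sketch the affine parameters depend only on `aux`, never on the image. A model can therefore change its output for the same visual input purely through the context vector, which is the kind of pathway the Findings associate with shortcut learning between auxiliary data and labels.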
Taxonomy
Topics: Remote-Sensing Image Classification · Advanced Image and Video Retrieval Techniques
Methods: Dense Connections · Feedforward Network · Conditional Batch Normalization · Batch Normalization
