How Modular Should Neural Module Networks Be for Systematic Generalization?
Vanessa D'Amario, Tomotake Sasaki, Xavier Boix

TL;DR
This paper investigates how the degree of modularity in Neural Module Networks (NMNs) affects systematic generalization in visual question answering, and finds that tuning it, especially at the image encoder stage, yields substantially higher systematic generalization.
Contribution
The paper demonstrates that the degree of modularity of an NMN has a large influence on systematic generalization, and uses this result to derive new NMN architectures that generalize better than previous ones.
Findings
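Tuning the degree of modularity, especially at the image encoder stage, yields substantially higher systematic generalization.
The resulting NMN architectures outperform previous ones in systematic generalization, in experiments on three VQA datasets (VQA-MNIST, SQOOP, and CLEVR-CoGenT).
The degree of modularity has a large influence on systematic generalization.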
Abstract
Neural Module Networks (NMNs) aim at Visual Question Answering (VQA) via composition of modules that each tackle a sub-task. NMNs are a promising strategy to achieve systematic generalization, i.e., overcoming biasing factors in the training distribution. However, the aspects of NMNs that facilitate systematic generalization are not fully understood. In this paper, we demonstrate that the degree of modularity of the NMN has a large influence on systematic generalization. In a series of experiments on three VQA datasets (VQA-MNIST, SQOOP, and CLEVR-CoGenT), our results reveal that tuning the degree of modularity, especially at the image encoder stage, yields substantially higher systematic generalization. These findings lead to new NMN architectures that outperform previous ones in terms of systematic generalization.
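To make "degree of modularity" concrete, the sketch below shows one way it could be expressed as a mapping from sub-tasks to image encoders. It is an illustration only, not the authors' implementation: the sub-task names, the three levels ("all", "group", "sub_task"), and the helper functions are all hypothetical and do not come from the text above.

```python
# Hypothetical sub-tasks, each tagged with the group it belongs to.
# Names are illustrative only; they are not from the paper's datasets.
SUB_TASKS = {
    "is_red": "color",
    "is_blue": "color",
    "is_large": "size",
    "is_small": "size",
}


def assign_encoders(sub_tasks, modularity):
    """Map each sub-task to the id of the image encoder it would use.

    modularity (assumed levels):
        "all"      -> a single encoder shared by every sub-task
        "group"    -> one encoder per group of related sub-tasks
        "sub_task" -> one dedicated encoder per sub-task
    """
    if modularity == "all":
        return {task: "encoder_shared" for task in sub_tasks}
    if modularity == "group":
        return {task: f"encoder_{group}" for task, group in sub_tasks.items()}
    if modularity == "sub_task":
        return {task: f"encoder_{task}" for task in sub_tasks}
    raise ValueError(f"unknown modularity level: {modularity!r}")


def count_encoders(assignment):
    """Number of distinct encoders an assignment requires."""
    return len(set(assignment.values()))


for level in ("all", "group", "sub_task"):
    assignment = assign_encoders(SUB_TASKS, level)
    print(level, count_encoders(assignment))
```

Under these assumptions, moving from "all" to "sub_task" increases the number of separate encoders from 1 to 2 to 4, which is the kind of knob the abstract refers to when it speaks of tuning modularity at the image encoder stage.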
Taxonomy
Topics: Multimodal Machine Learning Applications · Domain Adaptation and Few-Shot Learning · Advanced Image and Video Retrieval Techniques
