BM-NAS: Bilevel Multimodal Neural Architecture Search
Yihang Yin, Siyu Huang, Xiang Zhang

TL;DR
BM-NAS is a bilevel neural architecture search framework that automatically designs multimodal fusion models by searching both which features to pair and how to fuse each pair, efficiently adapting to various multimodal learning tasks.
Contribution
It presents a novel bilevel search scheme for fully automating multimodal fusion architecture design, reducing manual effort and search time.
Findings
Achieves competitive performance on three multimodal tasks.
Reduces search time and model parameters compared to existing methods.
Learns effective fusion strategies, including multi-head attention and attention on attention (AoA).
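The finding that a searched fusion strategy can express attention on attention (AoA) can be illustrated by composing two simple operations. The following is a minimal pure-Python sketch that assumes two hypothetical primitives: single-query scaled dot-product attention, followed by a gated linear unit (GLU) applied to the concatenation [q; v_hat]. The function names and weight layout are illustrative assumptions. They are not BM-NAS's code or its exact primitive set.

```python
import math
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def matvec(w: Matrix, x: Vector) -> Vector:
    """Multiply matrix w (rows x cols) by vector x (length cols)."""
    return [sum(wij * xj for wij, xj in zip(row, x)) for row in w]


def softmax(scores: Vector) -> Vector:
    """Numerically stable softmax."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def sigmoid(z: float) -> float:
    """Numerically stable logistic sigmoid."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def scaled_dot_attention(q: Vector, keys: List[Vector], values: List[Vector]) -> Vector:
    """Hypothetical primitive 1: softmax(q.k / sqrt(d))-weighted sum of the values."""
    d = len(q)
    scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in keys]
    weights = softmax(scores)
    return [sum(w * v[j] for w, v in zip(weights, values)) for j in range(len(values[0]))]


def linear_glu(x: Vector, w_info: Matrix, w_gate: Matrix) -> Vector:
    """Hypothetical primitive 2: (W_info x) * sigmoid(W_gate x), elementwise."""
    info = matvec(w_info, x)
    gate = [sigmoid(g) for g in matvec(w_gate, x)]
    return [i * g for i, g in zip(info, gate)]


def attention_on_attention(q, keys, values, w_info, w_gate) -> Vector:
    """AoA as a composition: attend, then gate [q; v_hat] with a GLU."""
    v_hat = scaled_dot_attention(q, keys, values)
    return linear_glu(q + v_hat, w_info, w_gate)  # list '+' concatenates [q; v_hat]


# Usage: w_info copies v_hat out of [q; v_hat]; an all-zero w_gate gives a constant gate of 0.5.
q = [1.0, 0.0]
keys = [[1.0, 0.0], [0.0, 1.0]]
values = [[2.0, 0.0], [0.0, 2.0]]
w_info = [[0.0, 0.0, 1.0, 0.0], [0.0, 0.0, 0.0, 1.0]]
w_gate = [[0.0] * 4 for _ in range(2)]
out = attention_on_attention(q, keys, values, w_info, w_gate)
```

With learned (non-zero) gate weights, the gate lets the query suppress or pass through each component of the attended vector. This gating is what distinguishes AoA from plain attention.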
Abstract
Deep neural networks (DNNs) have shown superior performance on various multimodal learning problems. However, adapting DNNs to individual multimodal tasks often requires substantial effort to manually engineer unimodal features and design multimodal feature fusion strategies. This paper proposes the Bilevel Multimodal Neural Architecture Search (BM-NAS) framework, which makes the architecture of multimodal fusion models fully searchable via a bilevel search scheme. At the upper level, BM-NAS selects inter- and intra-modal feature pairs from the pretrained unimodal backbones. At the lower level, BM-NAS learns the fusion strategy for each feature pair as a combination of predefined primitive operations. The primitive operations are elaborately designed and can be flexibly combined to accommodate various effective feature fusion modules such as multi-head attention…
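The bilevel scheme described in the abstract can be illustrated with a DARTS-style continuous relaxation. At the upper level, a softmax over logits `beta` mixes all candidate feature pairs. At the lower level, a softmax over logits `alpha` mixes primitive operations applied to each pair. After the search, each level is discretized with an argmax. This is a minimal forward-pass sketch under stated assumptions. The relaxation style, the primitive set in `PRIMITIVES`, and all function names are hypothetical and do not reflect BM-NAS's actual formulation. The gradient-based updates of `alpha`, `beta`, and the network weights are omitted.

```python
import math
from itertools import combinations
from typing import Callable, Dict, List, Tuple

Vector = List[float]

# Hypothetical primitive set (NOT BM-NAS's exact list); each op fuses a feature pair.
PRIMITIVES: Dict[str, Callable[[Vector, Vector], Vector]] = {
    "zero": lambda x, y: [0.0] * len(x),
    "sum": lambda x, y: [a + b for a, b in zip(x, y)],
    "product": lambda x, y: [a * b for a, b in zip(x, y)],
    "max": lambda x, y: [max(a, b) for a, b in zip(x, y)],
}


def softmax(logits: List[float]) -> List[float]:
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def candidate_pairs(n_features: int) -> List[Tuple[int, int]]:
    """All unordered feature pairs; covers both inter- and intra-modal pairs."""
    return list(combinations(range(n_features), 2))


def mixed_fusion(x: Vector, y: Vector, alpha: List[float]) -> Vector:
    """Lower level: softmax(alpha)-weighted mixture of every primitive on one pair."""
    assert len(alpha) == len(PRIMITIVES), "one logit per primitive"
    out = [0.0] * len(x)
    for w, op in zip(softmax(alpha), PRIMITIVES.values()):
        out = [o + w * v for o, v in zip(out, op(x, y))]
    return out


def mixed_pair_selection(features: List[Vector], beta: List[float], alpha: List[float]) -> Vector:
    """Upper level: softmax(beta)-weighted mixture over all candidate feature pairs."""
    pairs = candidate_pairs(len(features))
    assert len(beta) == len(pairs), "one logit per candidate pair"
    out = [0.0] * len(features[0])
    for w, (i, j) in zip(softmax(beta), pairs):
        fused = mixed_fusion(features[i], features[j], alpha)
        out = [o + w * v for o, v in zip(out, fused)]
    return out


def discretize(beta: List[float], alpha: List[float], n_features: int) -> Tuple[Tuple[int, int], str]:
    """After search, keep the highest-weighted pair and primitive."""
    pairs = candidate_pairs(n_features)
    best_pair = pairs[max(range(len(beta)), key=lambda k: beta[k])]
    best_op = list(PRIMITIVES)[max(range(len(alpha)), key=lambda k: alpha[k])]
    return best_pair, best_op


# Usage with three hypothetical backbone features; pairs are (0,1), (0,2), (1,2).
features = [[1.0, 2.0], [3.0, 4.0], [0.5, 0.5]]
beta = [0.0, 2.0, 0.0]
alpha = [0.0, 1.0, 0.0, 0.0]
mixed = mixed_pair_selection(features, beta, alpha)
print(discretize(beta, alpha, len(features)))  # → ((0, 2), 'sum')
print(mixed_fusion([1.0, 2.0], [3.0, 4.0], [0.0] * 4))  # → [2.5, 4.5]
```

Relaxing both levels with softmax makes the pair choice and the op choice differentiable, so both can in principle be optimized by gradient descent rather than by a discrete search.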
Taxonomy
Topics: Advanced Image and Video Retrieval Techniques · Domain Adaptation and Few-Shot Learning · Multimodal Machine Learning Applications
Methods: Softmax · Linear Layer
