NODE-Adapter: Neural Ordinary Differential Equations for Better Vision-Language Reasoning
Yi Zhang, Chun-Wun Cheng, Ke Yu, Zhihai He, Carola-Bibiane Schönlieb, and Angelica I. Aviles-Rivero

TL;DR
This paper introduces NODE-Adapter, a novel approach using Neural Ordinary Differential Equations to enhance vision-language reasoning, addressing resource demands, parameter efficiency, and modality integration for improved downstream task adaptation.
Contribution
The paper proposes NODE-Adapter, which models prototype optimization as a continuous process with Neural ODEs, improving vision-language reasoning and task adaptation.
Findings
Outperforms state-of-the-art methods in few-shot classification
Effective in domain generalization tasks
Enhances visual reasoning in human-object interaction scenarios
Abstract
In this paper, we consider the problem of prototype-based vision-language reasoning. We observe that existing methods encounter three major challenges: 1) escalating resource demands and prolonged training times, 2) an excessive number of learnable parameters, and 3) fine-tuning based on only a single modality. These challenges hinder their capability to adapt Vision-Language Models (VLMs) to downstream tasks. Motivated by this critical observation, we propose a novel method called NODE-Adapter, which utilizes Neural Ordinary Differential Equations for better vision-language reasoning. To fully leverage both visual and textual modalities and estimate class prototypes more effectively and accurately, we divide our method into two stages: cross-modal prototype construction and cross-modal prototype optimization using neural ordinary differential equations. Specifically,…
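The two stages named in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: every function name, the blending weight `alpha`, the logit `scale`, and the fixed-step Euler integrator are assumptions made for illustration. In particular, the vector field below is the analytic negative gradient of a cross-entropy loss, standing in for whatever learned dynamics the paper uses, so the code shows only the general idea of treating prototype refinement as an ODE integrated over time.

```python
import numpy as np

def build_prototypes(text_feats, support_feats, support_labels, alpha=0.5):
    """Stage 1 (assumed form): blend each class's text feature with the
    mean of its few-shot image features, then L2-normalize."""
    num_classes = text_feats.shape[0]
    visual_means = np.stack([
        support_feats[support_labels == c].mean(axis=0)
        for c in range(num_classes)
    ])
    protos = alpha * text_feats + (1.0 - alpha) * visual_means
    return protos / np.linalg.norm(protos, axis=1, keepdims=True)

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)  # subtract row max for stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def cross_entropy(protos, feats, labels, scale=5.0):
    """Mean cross-entropy of scaled dot-product logits against the labels."""
    probs = softmax(scale * feats @ protos.T)
    return -np.log(probs[np.arange(len(labels)), labels]).mean()

def vector_field(protos, feats, labels, scale=5.0):
    """dP/dt = -dL/dP for the loss above. An analytic stand-in
    (assumption) for a learned network that would predict the dynamics."""
    probs = softmax(scale * feats @ protos.T)        # shape (N, C)
    onehot = np.eye(protos.shape[0])[labels]         # shape (N, C)
    grad = scale * (probs - onehot).T @ feats / len(labels)  # shape (C, D)
    return -grad

def integrate_euler(protos, feats, labels, t_end=1.0, steps=50):
    """Stage 2 (assumed form): integrate the prototype ODE from t=0 to
    t=t_end with fixed-step explicit Euler."""
    dt = t_end / steps
    for _ in range(steps):
        protos = protos + dt * vector_field(protos, feats, labels)
    return protos
```

Calling `build_prototypes` on class text features and labeled support features gives the initial state, and `integrate_euler` then moves the prototypes along the flow. An adaptive solver could replace the Euler loop without changing the rest of the sketch.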
Taxonomy
Topics: Fuzzy Logic and Control Systems · Multimodal Machine Learning Applications · Multi-Criteria Decision Making
