Cross-Modal Fine-Tuning: Align then Refine

Junhong Shen; Liam Li; Lucio M. Dery; Corey Staten; Mikhail Khodak; Graham Neubig; Ameet Talwalkar

arXiv:2302.05738 · cs.LG · March 21, 2023 · 1 citation

TL;DR

ORCA is a general cross-modal fine-tuning framework that first aligns the embedded target data with the pretraining modality and then fine-tunes the pretrained model, achieving state-of-the-art results across diverse modalities and datasets.

Contribution

It introduces a novel align-then-refine workflow enabling a single pretrained model to adapt effectively to multiple modalities.

Findings

01. State-of-the-art results on 3 benchmarks with 60+ datasets from 12 modalities

02. Effective data alignment improves performance, especially in data-limited scenarios

03. Outperforms various hand-designed, AutoML, and task-specific methods

Abstract

Fine-tuning large-scale pretrained models has led to tremendous progress in well-studied modalities such as vision and NLP. However, similar gains have not been observed in many other modalities due to a lack of relevant pretrained models. In this work, we propose ORCA, a general cross-modal fine-tuning framework that extends the applicability of a single large-scale pretrained model to diverse modalities. ORCA adapts to a target task via an align-then-refine workflow: given the target input, ORCA first learns an embedding network that aligns the embedded feature distribution with the pretraining modality. The pretrained model is then fine-tuned on the embedded data to exploit the knowledge shared across modalities. Through extensive experiments, we show that ORCA obtains state-of-the-art results on 3 benchmarks containing over 60 datasets from 12 modalities, outperforming a wide range…
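The align-then-refine workflow described in the abstract can be illustrated with a toy sketch. This is not the ORCA implementation: the affine embedder, the mean/standard-deviation gap used as the distribution distance, the linear "pretrained" predictor, and every name below (`moment_gap`, `align`, `refine`, the sample data) are assumptions chosen so the two stages fit in a few lines of standard-library Python.

```python
import statistics

def moment_gap(z, ref):
    """Stand-in distribution distance (an assumption, not the paper's metric):
    squared gap between the means plus squared gap between the std devs."""
    return ((statistics.fmean(z) - statistics.fmean(ref)) ** 2
            + (statistics.pstdev(z) - statistics.pstdev(ref)) ** 2)

def align(target_x, source_feats, steps=300, lr=0.1):
    """Stage 1 (align): learn a toy affine embedder z = a*x + b by gradient
    descent so the embedded target features match the source features."""
    mu_x, sd_x = statistics.fmean(target_x), statistics.pstdev(target_x)
    mu_s, sd_s = statistics.fmean(source_feats), statistics.pstdev(source_feats)
    a, b = 1.0, 0.0
    for _ in range(steps):
        mean_err = a * mu_x + b - mu_s      # mean(z) - mean(source)
        std_err = abs(a) * sd_x - sd_s      # std(z) - std(source)
        sign = 1.0 if a >= 0 else -1.0
        a -= lr * (2 * mean_err * mu_x + 2 * std_err * sd_x * sign)
        b -= lr * (2 * mean_err)
    return a, b

def refine(z, y, w=1.0, c=0.0, steps=1000, lr=0.1):
    """Stage 2 (refine): fine-tune a toy 'pretrained' linear predictor
    y_hat = w*z + c on the embedded target data with a mean-squared-error loss."""
    n = len(z)
    for _ in range(steps):
        residuals = [w * zi + c - yi for zi, yi in zip(z, y)]
        w -= lr * 2 * sum(r * zi for r, zi in zip(residuals, z)) / n
        c -= lr * 2 * sum(residuals) / n
    return w, c

def mse(z, y, w, c):
    return statistics.fmean((w * zi + c - yi) ** 2 for zi, yi in zip(z, y))

# Hypothetical data: features from the pretraining modality, plus raw inputs
# and labels for a target task in a different modality.
source_feats = [0.25 * i for i in range(1, 8)]
target_x = [0.5 * i - 2.0 for i in range(10)]
target_y = [2.0 * x + 1.0 for x in target_x]

gap_before = moment_gap(target_x, source_feats)
a, b = align(target_x, source_feats)                 # stage 1
embedded = [a * x + b for x in target_x]
gap_after = moment_gap(embedded, source_feats)

loss_before = mse(embedded, target_y, 1.0, 0.0)      # pretrained weights as-is
w, c = refine(embedded, target_y)                    # stage 2
loss_after = mse(embedded, target_y, w, c)
```

In this sketch the alignment stage never looks at the target labels; it only moves the embedded inputs toward the distribution the pretrained predictor was built for, and the refinement stage then adapts the predictor to the target task. A real system would replace the affine map with a learned embedding network and the toy predictor with a large pretrained model.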

Code & Models

Repositories

sjunhongshen/orca (PyTorch, official)

Taxonomy

Topics: Multimodal Machine Learning Applications · Topic Modeling · Domain Adaptation and Few-Shot Learning