Language Models are General-Purpose Interfaces
Yaru Hao, Haoyu Song, Li Dong, Shaohan Huang, Zewen Chi, Wenhui Wang, Shuming Ma, Furu Wei

TL;DR
This paper introduces a versatile framework using language models as universal interfaces to various foundation models, enabling multi-modal processing and combining benefits of causal and non-causal modeling for improved generalization.
Contribution
It proposes a semi-causal language modeling approach that unifies diverse modality encoders with language models, enhancing flexibility and performance across tasks.
Findings
Outperforms or is competitive with specialized models on language-only and vision-language benchmarks.
Enables in-context learning and instruction following with finetuned encoders.
Shows strong zero-shot and few-shot generalization capabilities.
Abstract
Foundation models have received much attention due to their effectiveness across a broad range of downstream applications. Though there is a big convergence in terms of architecture, most pretrained models are typically still developed for specific tasks or modalities. In this work, we propose to use language models as a general-purpose interface to various foundation models. A collection of pretrained encoders perceive diverse modalities (such as vision and language), and they dock with a language model that plays the role of a universal task layer. We propose a semi-causal language modeling objective to jointly pretrain the interface and the modular encoders. We subsume the advantages and capabilities from both causal and non-causal modeling, thereby combining the best of two worlds. Specifically, the proposed method not only inherits the capabilities of in-context learning and…
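The semi-causal idea in the abstract can be pictured as an attention pattern: most positions are processed left to right, while selected spans are encoded bidirectionally. The following is a minimal illustrative sketch, not the authors' implementation. The function name `semi_causal_mask`, the `bidirectional_spans` parameter, and the choice to express everything as a single mask are assumptions made here for illustration; the paper describes separate encoders docking with a language model.

```python
def semi_causal_mask(seq_len, bidirectional_spans):
    """Build a seq_len x seq_len boolean mask (hypothetical helper).

    mask[i][j] is True when position i may attend to position j.
    bidirectional_spans is a list of half-open (start, end) index ranges
    whose positions attend to each other in both directions.
    """
    # Start from a standard causal mask: each position sees itself and the past.
    mask = [[j <= i for j in range(seq_len)] for i in range(seq_len)]
    # Inside each bidirectional span, open up attention in both directions.
    for start, end in bidirectional_spans:
        for i in range(start, end):
            for j in range(start, end):
                mask[i][j] = True
    return mask


# Example: 6 positions, with positions 1 and 2 forming one bidirectional span.
mask = semi_causal_mask(6, [(1, 3)])
for row in mask:
    print("".join("x" if allowed else "." for allowed in row))
```

In this sketch, positions outside a span remain strictly causal, so later tokens can still condition on everything before them, including the bidirectionally encoded span.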
Taxonomy
Topics: Topic Modeling · Multimodal Machine Learning Applications · Natural Language Processing Techniques
