Unleashing the Potentials of Likelihood Composition for Multi-modal Language Models

Shitian Zhao; Renrui Zhang; Xu Luo; Yan Wang; Shanghang Zhang; Peng Gao

arXiv:2410.00363·cs.CL·October 2, 2024

TL;DR

This paper introduces a likelihood composition framework for fusing heterogeneous multi-modal models post-hoc, demonstrating its effectiveness in visual-question-answering tasks across multiple datasets and model architectures.

Contribution

The paper proposes a novel likelihood composition framework for model fusion, enabling off-the-shelf combination of diverse models in multi-modal tasks.

Findings

01 Likelihood composition improves VQA performance over simple ensemble methods.

02 The framework is effective across 9 VQA datasets and 10 different MLMs.

03 New composition methods can be easily developed within this framework.

Abstract

Model fusion has always been an important topic, especially in an era where large language models (LLMs) and multi-modal language models (MLMs) with different architectures, parameter sizes, and training pipelines are being created all the time. In this work, we propose a post-hoc framework for fusing heterogeneous models off the shelf, which we call likelihood composition. The basic idea is to compose the likelihood distributions of multiple models when performing a multi-choice visual-question-answering task. Here the core concept, likelihood, is the log-probability of a candidate answer. Within likelihood composition, we introduce four basic operations: debias, highlight, majority-vote, and ensemble. By combining (composing) these basic elements, we obtain mixed composition methods, which we call mix-composition. Through…
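To make the idea of composing per-candidate log-probabilities concrete, here is a minimal sketch of two of the named operations, ensemble and majority-vote, on a single multi-choice question. It is an illustration under stated assumptions, not the authors' implementation: the function names, the choice of a plain mean for ensemble, the tie-breaking rule, and the example scores are all hypothetical. The debias and highlight operations are not sketched because the abstract does not define them.

```python
from collections import Counter

import numpy as np


def ensemble(loglikes):
    """Average each candidate's log-probability across models, then pick the best.

    Assumption: "ensemble" is read here as an unweighted mean of log-probabilities.
    """
    scores = np.asarray(loglikes, dtype=float)  # shape: (n_models, n_candidates)
    return int(np.argmax(scores.mean(axis=0)))


def majority_vote(loglikes):
    """Let each model vote for its top candidate and return the most-voted index.

    Assumption: ties go to the candidate that received its first vote earliest.
    """
    scores = np.asarray(loglikes, dtype=float)
    votes = scores.argmax(axis=1).tolist()  # one vote per model
    return Counter(votes).most_common(1)[0][0]


# Hypothetical log-probabilities: 3 models (rows) x 4 candidate answers (columns).
loglikes = [
    [-0.1, -3.0, -4.0, -5.0],  # model A strongly prefers candidate 0
    [-1.5, -1.4, -3.0, -3.5],  # model B narrowly prefers candidate 1
    [-1.6, -1.5, -3.0, -3.2],  # model C narrowly prefers candidate 1
]

ensemble_choice = ensemble(loglikes)
vote_choice = majority_vote(loglikes)
print(ensemble_choice, vote_choice)
```

In this made-up example the two operations disagree: averaging lets one confident model outweigh two marginal ones, while voting ignores confidence entirely. That difference is one reason combining such basic operations could be useful.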

Code & Models

Repositories

zhaoshitian/Likelihood-Composition-Toolkit
PyTorch · Official

Taxonomy

Topics: Natural Language Processing Techniques · Topic Modeling · Speech and dialogue systems