MoAI: Mixture of All Intelligence for Large Language and Vision Models

Byung-Kwan Lee; Beomchan Park; Chae Won Kim; Yong Man Ro

arXiv:2403.07508 · cs.CV · July 18, 2024 · 1 citation

Open Access · 1 Repo · 1 Model

TL;DR

MoAI introduces a novel approach that integrates external computer vision model outputs with large language models, enhancing real-world scene understanding without increasing model size or requiring extensive new datasets.

Contribution

The paper presents MoAI, a new large language and vision model (LLVM) that leverages auxiliary visual information from external computer vision (CV) models through two modules, improving scene understanding in vision language (VL) tasks.

Findings

01. MoAI outperforms existing LLVMs in zero-shot VL tasks.

02. MoAI enhances real-world scene understanding without enlarging models.

03. MoAI does not require additional visual instruction tuning datasets.

Abstract

The rise of large language models (LLMs) and instruction tuning has led to the current trend of instruction-tuned large language and vision models (LLVMs). This trend involves either meticulously curating numerous instruction tuning datasets tailored to specific objectives or enlarging LLVMs to manage vast amounts of vision language (VL) data. However, current LLVMs have disregarded the detailed and comprehensive real-world scene understanding available from specialized computer vision (CV) models in visual perception tasks such as segmentation, detection, scene graph generation (SGG), and optical character recognition (OCR). Instead, the existing LLVMs rely mainly on the large capacity and emergent capabilities of their LLM backbones. Therefore, we present a new LLVM, Mixture of All Intelligence (MoAI), which leverages auxiliary visual information obtained from the outputs of external…

Code & Models

Repositories

ByungKwanLee/MoAI
PyTorch · Official

Models

🤗 BK-Lee/MoAI-7B
model · 17 downloads · ♡ 45

Taxonomy

Topics: Multimodal Machine Learning Applications · Natural Language Processing Techniques