TinyLLaVA Factory: A Modularized Codebase for Small-scale Large Multimodal Models

Junlong Jia; Ying Hu; Xi Weng; Yiming Shi; Miao Li; Xingjian Zhang; Baichuan Zhou; Ziyu Liu; Jie Luo; Lei Huang; Ji Wu

arXiv:2405.11788 · cs.LG · May 21, 2024 · 1 citation

TL;DR

TinyLLaVA Factory is an open-source, modular codebase designed to simplify the development, extension, and training of small-scale large multimodal models, making advanced multimodal research more accessible.

Contribution

It introduces a modular, extensible framework for small-scale LMMs with prebuilt training recipes, enhancing usability and reproducibility.

Findings

01. Empirical validation shows effective training and customization.

02. Facilitates research with affordable computational resources.

03. Supports easy extension with new models and features.

Abstract

We present TinyLLaVA Factory, an open-source modular codebase for small-scale large multimodal models (LMMs) with a focus on simplicity of code implementations, extensibility to new features, and reproducibility of training results. Following the design philosophy of the factory pattern in software engineering, TinyLLaVA Factory modularizes the entire system into interchangeable components, with each component integrating a suite of cutting-edge models and methods while leaving room for extension to further features. In addition to allowing users to customize their own LMMs, TinyLLaVA Factory provides popular training recipes to let users pretrain and finetune their models with less coding effort. Empirical experiments validate the effectiveness of our codebase. The goal of TinyLLaVA Factory is to assist researchers and practitioners in exploring the wide landscape of designing and…
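
The factory pattern mentioned in the abstract can be illustrated with a small registry-based sketch. This is not the TinyLLaVA Factory API: the names `CONNECTOR_REGISTRY`, `register_connector`, `connector_factory`, and the two toy connector classes are all hypothetical, and plain Python lists stand in for feature tensors. It only shows how interchangeable components might be selected by name so that adding a new one requires no change to the calling code.

```python
from typing import Callable, Dict, List

# Hypothetical registry mapping a component name to the class that builds it.
# None of these names are taken from the TinyLLaVA Factory codebase.
CONNECTOR_REGISTRY: Dict[str, type] = {}


def register_connector(name: str) -> Callable[[type], type]:
    """Class decorator that records a connector class under `name`."""
    def decorator(cls: type) -> type:
        if name in CONNECTOR_REGISTRY:
            raise ValueError(f"connector {name!r} is already registered")
        CONNECTOR_REGISTRY[name] = cls
        return cls
    return decorator


@register_connector("identity")
class IdentityConnector:
    """Toy connector that passes features through unchanged."""

    def __call__(self, features: List[float]) -> List[float]:
        return list(features)


@register_connector("scale")
class ScaleConnector:
    """Toy connector that multiplies every feature by a constant factor."""

    def __init__(self, factor: float = 2.0) -> None:
        self.factor = factor

    def __call__(self, features: List[float]) -> List[float]:
        return [self.factor * x for x in features]


def connector_factory(name: str, **kwargs):
    """Build the connector registered under `name`, forwarding kwargs to it."""
    if name not in CONNECTOR_REGISTRY:
        available = sorted(CONNECTOR_REGISTRY)
        raise ValueError(f"unknown connector {name!r}; available: {available}")
    return CONNECTOR_REGISTRY[name](**kwargs)


if __name__ == "__main__":
    # The caller picks a component by name, e.g. from a config file.
    connector = connector_factory("scale", factor=3.0)
    print(connector([1.0, 2.0]))
```

Under this design, supporting a new component means writing one decorated class; the factory function and any code that calls it stay untouched, which is the kind of extensibility the abstract describes.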

Taxonomy

Topics: Speech and dialogue systems

Methods: Focus