TinyLLaVA Factory: A Modularized Codebase for Small-scale Large Multimodal Models
Junlong Jia, Ying Hu, Xi Weng, Yiming Shi, Miao Li, Xingjian Zhang, Baichuan Zhou, Ziyu Liu, Jie Luo, Lei Huang, Ji Wu

TL;DR
TinyLLaVA Factory is an open-source, modular codebase designed to simplify the development, extension, and training of small-scale large multimodal models, making advanced multimodal research more accessible.
Contribution
It introduces a modular, extensible framework for small-scale LMMs with prebuilt training recipes, enhancing usability and reproducibility.
Findings
Empirical validation shows effective training and customization.
Facilitates research with affordable computational resources.
Supports easy extension with new models and features.
Abstract
We present TinyLLaVA Factory, an open-source modular codebase for small-scale large multimodal models (LMMs) with a focus on simplicity of code implementations, extensibility of new features, and reproducibility of training results. Following the design philosophy of the factory pattern in software engineering, TinyLLaVA Factory modularizes the entire system into interchangeable components, with each component integrating a suite of cutting-edge models and methods, while leaving room for extension with more features. In addition to allowing users to customize their own LMMs, TinyLLaVA Factory provides popular training recipes to let users pretrain and finetune their models with less coding effort. Empirical experiments validate the effectiveness of our codebase. The goal of TinyLLaVA Factory is to assist researchers and practitioners in exploring the wide landscape of designing and…
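The factory-pattern design described in the abstract can be illustrated with a small registry sketch. This is not TinyLLaVA Factory's actual API: the `register`, `create`, and `build_lmm` helpers, the component kinds `llm`, `vision_tower`, and `connector`, and the placeholder classes are all hypothetical, chosen only to show how swapping a component becomes a change of name rather than a change of code.

```python
from dataclasses import dataclass
from typing import Callable, Dict

# Hypothetical registry: component kind -> {name -> constructor}.
_REGISTRY: Dict[str, Dict[str, Callable[..., object]]] = {
    "llm": {},
    "vision_tower": {},
    "connector": {},
}


def register(kind: str, name: str):
    """Decorator that records a component class under (kind, name)."""
    def wrap(cls):
        if name in _REGISTRY[kind]:
            raise KeyError(f"{kind}/{name} is already registered")
        _REGISTRY[kind][name] = cls
        return cls
    return wrap


def create(kind: str, name: str, **kwargs):
    """Factory function: look up a registered component by name and build it."""
    registry = _REGISTRY[kind]
    if name not in registry:
        raise KeyError(f"unknown {kind}: {name!r}")
    return registry[name](**kwargs)


# Placeholder components (hypothetical; real ones would wrap neural networks).
@register("llm", "tiny-llm")
@dataclass
class TinyLLM:
    hidden_size: int = 2048


@register("vision_tower", "tiny-vit")
@dataclass
class TinyViT:
    feature_dim: int = 768


@register("connector", "mlp")
@dataclass
class MLPConnector:
    in_dim: int
    out_dim: int


@dataclass
class TinyLMM:
    llm: object
    vision_tower: object
    connector: object


def build_lmm(llm: str, vision_tower: str, connector: str) -> TinyLMM:
    """Assemble an LMM purely from component names."""
    lm = create("llm", llm)
    vt = create("vision_tower", vision_tower)
    # The connector maps vision features into the LLM's hidden size.
    conn = create("connector", connector,
                  in_dim=vt.feature_dim, out_dim=lm.hidden_size)
    return TinyLMM(lm, vt, conn)


model = build_lmm("tiny-llm", "tiny-vit", "mlp")
print(model.connector)  # MLPConnector(in_dim=768, out_dim=2048)
```

In this sketch, adding a new vision encoder means defining one class and decorating it with `register("vision_tower", ...)`; `build_lmm` itself does not change.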
