ILLUME: Illuminating Your LLMs to See, Draw, and Self-Enhance
Chunwei Wang, Guansong Lu, Junwei Yang, Runhui Huang, Jianhua Han, Lu Hou, Wei Zhang, Hang Xu

TL;DR
ILLUME is a multimodal large language model that efficiently integrates understanding and generation, using a novel training scheme and self-assessment to improve performance with less data.
Contribution
The paper introduces ILLUME, a unified multimodal LLM with a new vision tokenizer, progressive training, and self-enhancement, reducing data needs and improving alignment.
Findings
Achieves competitive performance with a pretraining set of just 15M samples, over four times fewer than is typically needed.
Matches or exceeds existing unified MLLMs such as Janus on multiple benchmarks.
Demonstrates effective self-assessment for better image-text alignment.
Abstract
In this paper, we introduce ILLUME, a unified multimodal large language model (MLLM) that seamlessly integrates multimodal understanding and generation capabilities within a single large language model through a unified next-token prediction formulation. To address the large dataset size typically required for image-text alignment, we propose to enhance data efficiency through the design of a vision tokenizer that incorporates semantic information and a progressive multi-stage training procedure. This approach reduces the dataset size to just 15M for pretraining -- over four times fewer than what is typically needed -- while achieving competitive or even superior performance with existing unified MLLMs, such as Janus. Additionally, to promote synergistic enhancement between understanding and generation capabilities, which is under-explored in previous works, we introduce a novel…
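The "unified next-token prediction formulation" mentioned in the abstract can be illustrated with a toy sketch. It assumes text tokens and discrete image codes share one vocabulary, with the image codes shifted past the text range, and that a single cross-entropy loss is applied over the mixed sequence. The vocabulary sizes, the function names, and the offset scheme are hypothetical choices for illustration, not details taken from the paper.

```python
import math

# Hypothetical toy sizes, chosen only for illustration.
TEXT_VOCAB_SIZE = 8
IMAGE_CODEBOOK_SIZE = 4
VOCAB_SIZE = TEXT_VOCAB_SIZE + IMAGE_CODEBOOK_SIZE


def to_unified_ids(text_ids, image_codes):
    """Join text ids and image codes into one sequence over a shared vocabulary.

    Image codes are shifted past the text ids so the two ranges never collide
    (an assumed scheme, not confirmed by the source).
    """
    return list(text_ids) + [TEXT_VOCAB_SIZE + c for c in image_codes]


def next_token_loss(logits, token_ids):
    """Mean cross-entropy of predicting token t+1 from the logits at position t."""
    total = 0.0
    for t in range(len(token_ids) - 1):
        row = logits[t]
        m = max(row)  # subtract the max for a numerically stable log-sum-exp
        log_z = m + math.log(sum(math.exp(x - m) for x in row))
        total += log_z - row[token_ids[t + 1]]
    return total / (len(token_ids) - 1)


# A caption followed by the codes of the image it describes.
sequence = to_unified_ids(text_ids=[1, 5, 2], image_codes=[0, 3, 3])

# Stand-in for model outputs: uniform logits, so every token is equally likely.
uniform_logits = [[0.0] * VOCAB_SIZE for _ in sequence]
loss = next_token_loss(uniform_logits, sequence)
```

With uniform logits the loss equals the log of the unified vocabulary size, which is a convenient sanity check: the same objective scores text and image positions alike, with no modality-specific head.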
Taxonomy
Topics: Legal Education and Practice Innovations · Artificial Intelligence in Law · Artificial Intelligence Applications
