TL;DR
Lance is a lightweight, unified multimodal model that excels in understanding, generating, and editing images and videos through collaborative multi-task training and a dual-stream architecture.
Contribution
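Lance introduces a practical paradigm for unified multimodal modeling: a dual-stream mixture-of-experts architecture that shares context across tasks while decoupling the understanding and generation pathways, plus modality-aware rotary positional encoding that mitigates interference among heterogeneous visual tokens and boosts cross-task alignment.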
Findings
Lance outperforms existing open-source models in image and video generation.
It maintains strong multimodal understanding capabilities.
Lance's staged multi-task training enhances semantic and visual performance.
Abstract
We present Lance, a lightweight native unified model supporting multimodal understanding, generation, and editing for both images and videos. Rather than relying on model capacity scaling or text-image-dominant designs, Lance explores a practical paradigm for unified multimodal modeling via collaborative multi-task training. It is grounded in two core principles: unified context modeling and decoupled capability pathways. Specifically, Lance is trained from scratch and employs a dual-stream mixture-of-experts architecture on shared interleaved multimodal sequences, enabling joint context learning while decoupling the pathways for understanding and generation. We further introduce modality-aware rotary positional encoding to mitigate interference among heterogeneous visual tokens and boost cross-task alignment. During training, Lance adopts a staged multi-task training paradigm with…
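To make the "unified context, decoupled pathways" idea concrete, here is a minimal NumPy sketch. It is not the authors' implementation. It assumes a single attention head shared by every token in the interleaved sequence, followed by one of two feed-forward "experts" chosen by a hard per-token role mask. The names `shared_attention`, `dual_stream_ffn`, `dual_stream_block`, and `is_gen` are hypothetical, and the actual routing, layer sizes, and positional encoding used by Lance are not specified in the abstract.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def shared_attention(x, wq, wk, wv):
    # Single-head self-attention over the whole interleaved sequence,
    # so understanding and generation tokens see the same context.
    q, k, v = x @ wq, x @ wk, x @ wv
    scores = q @ k.T / np.sqrt(q.shape[-1])
    return softmax(scores) @ v

def dual_stream_ffn(h, is_gen, w_und, w_gen):
    # Assumption: hard routing by token role. Understanding tokens use
    # w_und, generation tokens use w_gen; the two never share weights.
    out = np.empty_like(h)
    out[~is_gen] = np.tanh(h[~is_gen] @ w_und)
    out[is_gen] = np.tanh(h[is_gen] @ w_gen)
    return out

def dual_stream_block(x, is_gen, params):
    # Residual attention (shared) followed by residual FFN (decoupled).
    h = x + shared_attention(x, params["wq"], params["wk"], params["wv"])
    return h + dual_stream_ffn(h, is_gen, params["w_und"], params["w_gen"])

rng = np.random.default_rng(0)
d = 8  # toy hidden size, chosen only for illustration
params = {
    name: rng.normal(scale=0.1, size=(d, d))
    for name in ("wq", "wk", "wv", "w_und", "w_gen")
}
x = rng.normal(size=(6, d))  # 6 interleaved tokens
is_gen = np.array([False, False, False, True, True, True])
y = dual_stream_block(x, is_gen, params)
```

In this toy block, changing `w_gen` leaves the outputs at understanding positions untouched, which is the sense in which the pathways are decoupled even though attention is computed jointly.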
Code & Models
- 🤗 bytedance-research/Lance (model) · 2.7k dl · ♡ 974
- 🤗 Abiray/Lance_3B_Video-GGUF (model) · 3.0k dl · ♡ 16
- 🤗 mlx-community/Lance-3B-Video-bf16 (model) · 1.4k dl · ♡ 6
- 🤗 mlx-community/Lance-3B-bf16 (model) · 368 dl · ♡ 3
- 🤗 mlx-community/Lance-3B-8bit (model) · 450 dl · ♡ 2
- 🤗 harishforaiandml/Lance (model) · 10 dl
- 🤗 Evilcarbon/Lance (model) · 10 dl
- 🤗 JHammerZOfficial/Arturo (model) · 15 dl
- 🤗 russc821/Lance (model) · 14 dl
- 🤗 mlx-community/Lance-3B-AWQ-INT4 (model) · 45 dl
