FullFlow: Upgrading Text-to-Image Flow Matching Models for Bidirectional Vision-Language Generation

Eric Tillmann Bill; Enis Simsar; Alessio Tonioni; Thomas Hofmann

arXiv:2605.20316·cs.CV·May 21, 2026

TL;DR

FullFlow is a parameter-efficient method that upgrades a pretrained rectified-flow text-to-image model into a bidirectional vision-language generator, supporting text-to-image, image-to-text, and joint generation without large-scale retraining.

Contribution

The paper introduces a lightweight adaptation recipe that trains only LoRA adapters and lightweight text heads, adding bidirectional capability to an existing text-to-image model without full retraining.
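
To make the parameter-efficiency claim concrete, here is a minimal sketch of a generic LoRA-adapted linear layer in plain Python. It illustrates the standard LoRA idea (a frozen weight plus a trainable low-rank update), not the paper's implementation. The class name `LoRALinear`, the `alpha / rank` scaling, and the initialization values are assumptions chosen for illustration.

```python
import random


def matvec(matrix, vector):
    """Multiply a row-major matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


class LoRALinear:
    """Hypothetical LoRA layer: y = W x + (alpha / rank) * B (A x).

    W stays frozen; only A and B would be trained. B starts at zero, so the
    adapted layer initially reproduces the pretrained layer exactly.
    """

    def __init__(self, weight, rank, alpha=1.0, seed=0):
        rng = random.Random(seed)
        d_out, d_in = len(weight), len(weight[0])
        self.weight = weight  # frozen pretrained weight, shape d_out x d_in
        # Down-projection A (rank x d_in): small random values (assumed init).
        self.A = [[rng.gauss(0.0, 0.01) for _ in range(d_in)] for _ in range(rank)]
        # Up-projection B (d_out x rank): zeros, so the initial update is zero.
        self.B = [[0.0] * rank for _ in range(d_out)]
        self.scale = alpha / rank

    def forward(self, x):
        base = matvec(self.weight, x)
        update = matvec(self.B, matvec(self.A, x))
        return [b + self.scale * u for b, u in zip(base, update)]

    def num_trainable(self):
        """Count adapter parameters (A and B only; W is frozen)."""
        return sum(len(row) for row in self.A) + sum(len(row) for row in self.B)


# Usage: an 8x8 identity "pretrained" weight with a rank-2 adapter.
identity = [[1.0 if i == j else 0.0 for j in range(8)] for i in range(8)]
layer = LoRALinear(identity, rank=2)
output = layer.forward([1.0] * 8)
```

With these sizes the adapter holds 32 trainable values against 64 frozen ones. The gap widens quickly at realistic layer widths, because the adapter grows linearly in the width while the frozen weight grows quadratically.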

Findings

01

Significantly improves bidirectional generation metrics over the previous state of the art.

02

Reduces VRAM usage and increases training throughput substantially.

03

Supports downstream tasks like VQA with partial-text generation.

Abstract

Modern text-to-image diffusion models encode rich visual priors, but expose them only through one-way text-conditioned generation. Existing unified vision-language models derived from them recover bidirectional capability through large-scale joint pretraining or substantial retraining of the text pathway, discarding the strong image prior the text-to-image backbone already encodes. We introduce FullFlow, a parameter-efficient recipe that upgrades a pretrained rectified-flow text-to-image model into a bidirectional vision-language generator by training only LoRA adapters and lightweight text heads. FullFlow keeps images in their native continuous flow and adds a discrete insertion process for text. Separate image and text timesteps turn inference into trajectory selection in a two-dimensional generative space, enabling text → image, image → text, joint…
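
One way to picture trajectory selection with separate image and text timesteps is as a choice of path through the unit square of `(t_img, t_txt)` pairs. The sketch below rests on assumptions that are not from the paper: the convention that 0.0 means fully noised or empty and 1.0 means clean or complete, the straight-line paths, and the hypothetical function name `trajectory`. It only shows how one model with two timesteps could serve several generation modes.

```python
def ramp(steps):
    """Evenly spaced values from 0.0 to 1.0 inclusive (requires steps >= 2)."""
    if steps < 2:
        raise ValueError("steps must be at least 2")
    return [i / (steps - 1) for i in range(steps)]


def trajectory(mode, steps):
    """Return a list of (t_img, t_txt) pairs for one generation mode.

    Assumed convention: 0.0 = fully noised image / empty text,
    1.0 = clean image / complete text.
    """
    ts = ramp(steps)
    if mode == "text_to_image":
        # Text is given (held at 1.0); only the image timestep advances.
        return [(t, 1.0) for t in ts]
    if mode == "image_to_text":
        # Image is given (held at 1.0); only the text timestep advances.
        return [(1.0, t) for t in ts]
    if mode == "joint":
        # Both modalities are generated together along the diagonal.
        return [(t, t) for t in ts]
    raise ValueError(f"unknown mode: {mode}")


# Usage: a three-step text-to-image path along the top edge of the square.
path = trajectory("text_to_image", 3)
```

Under these assumptions, each mode differs only in which edge or diagonal of the square it traverses; the model itself is unchanged.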
