Language-Pretraining-Induced Bias: A Strong Foundation for General Vision Tasks

Yaxin Luo; Zhiqiang Shen

arXiv:2604.01833·cs.CV·April 6, 2026

TL;DR

This paper shows that a simple bridge training stage can adapt large language model parameters to vision tasks, challenging the assumption that language pre-trained models are unsuitable for downstream visual tasks because of disparate parameter spaces.

Contribution

Introduces random label bridge training, a modality adaptation method that aligns language models with vision tasks without manual labeling.

Findings

01

Bridge training effectively aligns LLMs with vision tasks.

02

Partial bridge training retains beneficial properties of certain LLM layers.

03

No manual labeling required for the proposed adaptation method.

Abstract

The ratio of outlier parameters in language pre-training models and vision pre-training models differs significantly, making cross-modality (language and vision) adaptation inherently more challenging than cross-domain adaptation. As a result, many prior studies have focused on cross-domain transfer rather than attempting to bridge language and vision modalities, assuming that language pre-trained models are unsuitable for downstream visual tasks due to disparate parameter spaces. Contrary to this assumption, we show that adding a bridge training stage as a modality adaptation learner can effectively align Large Language Model (LLM) parameters with vision tasks. Specifically, we propose a simple yet powerful solution, random label bridge training, that requires no manual labeling and helps LLM parameters adapt to vision foundation tasks. Moreover, our findings reveal that partial bridge training is…
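The idea of training on random labels, as named in the abstract, can be illustrated with a toy sketch. This is not the authors' implementation: the abstract does not specify the architecture, loss, or schedule, so everything below is an assumption made for illustration. In particular, a small linear softmax classifier stands in for the LLM parameters, random vectors stand in for image features, and the names `assign_random_labels`, `mean_loss`, and `train_step` are hypothetical. The only point it shows is that the training signal needs no human annotation.

```python
import math
import random


def assign_random_labels(num_samples, num_classes, seed=0):
    """Draw one label per sample uniformly at random; no annotation is consulted."""
    rng = random.Random(seed)
    return [rng.randrange(num_classes) for _ in range(num_samples)]


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def logits_for(weights, x):
    """Linear scores, one per class (toy stand-in for a real model)."""
    return [sum(w_j * x_j for w_j, x_j in zip(row, x)) for row in weights]


def mean_loss(weights, features, labels):
    """Average cross-entropy of the toy classifier on the random labels."""
    total = 0.0
    for x, y in zip(features, labels):
        total -= math.log(softmax(logits_for(weights, x))[y])
    return total / len(features)


def train_step(weights, features, labels, lr=0.1):
    """One full-batch gradient descent step on the cross-entropy loss."""
    n = len(features)
    grads = [[0.0] * len(weights[0]) for _ in weights]
    for x, y in zip(features, labels):
        probs = softmax(logits_for(weights, x))
        for c in range(len(weights)):
            err = probs[c] - (1.0 if c == y else 0.0)
            for j, x_j in enumerate(x):
                grads[c][j] += err * x_j / n
    return [
        [w - lr * g for w, g in zip(row, grad_row)]
        for row, grad_row in zip(weights, grads)
    ]


# Toy data: 8 unlabeled "images" represented as 4-dimensional feature vectors.
data_rng = random.Random(1)
features = [[data_rng.gauss(0.0, 1.0) for _ in range(4)] for _ in range(8)]
labels = assign_random_labels(num_samples=8, num_classes=3, seed=0)

# Assumption: zero-initialized linear weights stand in for pretrained parameters.
weights = [[0.0] * 4 for _ in range(3)]
initial_loss = mean_loss(weights, features, labels)
for _ in range(50):
    weights = train_step(weights, features, labels)
final_loss = mean_loss(weights, features, labels)
```

In this toy setting the loss falls as the classifier fits the arbitrary labels. How the paper applies the stage to LLM parameters, and which layers it updates under partial bridge training, is not recoverable from the abstract and is not modeled here.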
