GenRecal: Generation after Recalibration from Large to Small Vision-Language Models

Byung-Kwan Lee; Ryo Hachiuma; Yong Man Ro; Yu-Chiang Frank Wang; Yueh-Hua Wu

arXiv:2506.15681·cs.CL·March 3, 2026

TL;DR

GenRecal is a versatile distillation framework that aligns diverse vision-language models, enabling smaller models to effectively learn from larger ones and outperform existing systems on various benchmarks.

Contribution

We introduce GenRecal, a novel framework with a Recalibrator that facilitates knowledge transfer across heterogeneous VLM architectures, addressing a key challenge in model distillation.

Findings

01 GenRecal significantly improves baseline VLM performance.

02 It outperforms large-scale open- and closed-source VLMs.

03 Extensive experiments validate its effectiveness across benchmarks.

Abstract

Recent advancements in vision-language models (VLMs) have leveraged large language models (LLMs) to achieve performance on par with closed-source systems like GPT-4V. However, deploying these models in real-world scenarios, particularly on resource-constrained devices, remains challenging due to their substantial computational demands. This has spurred interest in distilling knowledge from large VLMs into smaller, more efficient counterparts. A key challenge here is the diversity of VLM architectures: they are built on different LLMs and use different token types, which vary in vocabulary size, token splits, and token index ordering. To remove the restriction of distillation to a specific VLM type, we present Generation after Recalibration (GenRecal), a general-purpose distillation framework for VLMs. GenRecal incorporates a Recalibrator that aligns and adapts feature…
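The token mismatch named in the abstract (vocabulary size, token splits, token index ordering) can be made concrete with a toy sketch. This is only an illustration of the problem, not the GenRecal method or its Recalibrator: both vocabularies and the `greedy_tokenize` helper are hypothetical and do not come from any real tokenizer.

```python
# Toy illustration only: both vocabularies are made up for this example.
TEACHER_VOCAB = ["<unk>", "rec", "alibr", "ation", "gen", "er"]
STUDENT_VOCAB = ["<unk>", "gen", "recal", "ibration"]


def greedy_tokenize(text, vocab):
    """Longest-match-first tokenization; returns (token, index) pairs."""
    pieces = sorted((t for t in vocab if t != "<unk>"), key=len, reverse=True)
    out, i = [], 0
    while i < len(text):
        for piece in pieces:
            if text.startswith(piece, i):
                out.append((piece, vocab.index(piece)))
                i += len(piece)
                break
        else:
            # No vocabulary piece matches here: emit <unk> for one character.
            out.append(("<unk>", vocab.index("<unk>")))
            i += 1
    return out


teacher_tokens = greedy_tokenize("recalibration", TEACHER_VOCAB)
student_tokens = greedy_tokenize("recalibration", STUDENT_VOCAB)

mismatch = {
    # Different vocabulary sizes: the output distributions have different lengths.
    "vocab_size": (len(TEACHER_VOCAB), len(STUDENT_VOCAB)),
    # Different token splits: the same text yields sequences of different lengths.
    "num_tokens": (len(teacher_tokens), len(student_tokens)),
    # Different index ordering: a shared token sits at a different index.
    "index_of_gen": (TEACHER_VOCAB.index("gen"), STUDENT_VOCAB.index("gen")),
}

print(teacher_tokens)
print(student_tokens)
print(mismatch)
```

Under these assumptions, a position-by-position comparison of teacher and student output distributions is not well defined: the sequences have different lengths, the distributions have different dimensions, and even a shared token does not occupy the same index.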

Taxonomy

Topics: Multimodal Machine Learning Applications · Natural Language Processing Techniques