LangVision-LoRA-NAS: Neural Architecture Search for Variable LoRA Rank in Vision Language Models

Krishna Teja Chitty-Venkata; Murali Emani; Venkatram Vishwanath

arXiv:2508.12512·cs.CV·August 19, 2025

TL;DR

This paper presents LangVision-LoRA-NAS, a framework that uses neural architecture search to choose the LoRA rank configuration when fine-tuning vision-language models, improving performance and efficiency across tasks.

Contribution

It introduces a NAS-based method to dynamically determine the optimal LoRA rank for vision-language models, enhancing flexibility and task-specific adaptation.

Findings

01. Improved model performance on multiple datasets.

02. Reduced fine-tuning costs compared to fixed-rank LoRA.

03. Demonstrated effectiveness of dynamic rank optimization.

Abstract

Vision Language Models (VLMs) integrate visual and text modalities to enable multimodal understanding and generation. These models typically combine a Vision Transformer (ViT) as an image encoder and a Large Language Model (LLM) for text generation. LoRA (Low-Rank Adaptation) is an efficient fine-tuning method that adapts pre-trained models to new tasks by introducing low-rank updates to their weights. Although LoRA has emerged as a powerful technique for fine-tuning large models, current implementations assume a fixed rank, potentially limiting flexibility and efficiency across diverse tasks. This paper introduces LangVision-LoRA-NAS, a novel framework that integrates Neural Architecture Search (NAS) with LoRA to optimize VLMs for variable-rank adaptation. Our approach leverages NAS to dynamically search for the optimal LoRA rank configuration…
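
To make the fixed-rank versus variable-rank distinction in the abstract concrete, the following is a minimal sketch of the standard LoRA update, W' = W + (alpha / r) · B A, and of how the trainable parameter count depends on the rank chosen for each layer. It is not the paper's search procedure. The layer names, layer shapes, and rank values are hypothetical and chosen only for illustration; they are not taken from the paper.

```python
def lora_param_count(d_out, d_in, rank):
    """Trainable parameters added by one LoRA adapter.

    B has shape (d_out, rank) and A has shape (rank, d_in).
    """
    return rank * (d_out + d_in)


def matmul(X, Y):
    """Plain-Python matrix product of two lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)] for row in X]


def lora_delta(B, A, alpha):
    """Low-rank weight update (alpha / r) * B @ A, where r is the number of rows of A."""
    rank = len(A)
    scale = alpha / rank
    return [[scale * v for v in row] for row in matmul(B, A)]


# Hypothetical layers as (d_out, d_in); not from the paper.
layers = {
    "vision.attn.q": (1024, 1024),
    "text.attn.q": (4096, 4096),
    "text.mlp.up": (11008, 4096),
}

# Fixed-rank LoRA: the same rank everywhere.
fixed_rank = {name: 16 for name in layers}

# Variable-rank LoRA: one rank per layer (values assumed for illustration;
# in the paper such a configuration would come from the NAS search).
variable_rank = {"vision.attn.q": 4, "text.attn.q": 16, "text.mlp.up": 8}


def total_params(ranks):
    """Sum the adapter parameters over all layers for a given rank assignment."""
    return sum(lora_param_count(*layers[name], r) for name, r in ranks.items())


fixed_total = total_params(fixed_rank)
variable_total = total_params(variable_rank)

if __name__ == "__main__":
    print("fixed-rank trainable parameters:   ", fixed_total)
    print("variable-rank trainable parameters:", variable_total)
```

Because each adapter adds r · (d_out + d_in) parameters, lowering the rank on layers that need less capacity reduces the fine-tuning cost directly. Which layers can tolerate a lower rank is the question the rank search is meant to answer.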

Taxonomy

Topics: Multimodal Machine Learning Applications · Natural Language Processing Techniques