LangVision-LoRA-NAS: Neural Architecture Search for Variable LoRA Rank in Vision Language Models
Krishna Teja Chitty-Venkata, Murali Emani, Venkatram Vishwanath

TL;DR
This paper presents LangVision-LoRA-NAS, a framework that uses neural architecture search to choose the LoRA rank configuration of vision-language models instead of fixing a single rank, improving performance and efficiency across tasks.
Contribution
It introduces a NAS-based method to dynamically determine the optimal LoRA rank for vision-language models, enhancing flexibility and task-specific adaptation.
Findings
Improved model performance on multiple datasets.
Reduced fine-tuning costs compared to fixed-rank LoRA.
Demonstrated effectiveness of dynamic rank optimization.
Abstract
Vision Language Models (VLMs) integrate visual and text modalities to enable multimodal understanding and generation. These models typically combine a Vision Transformer (ViT) as an image encoder and a Large Language Model (LLM) for text generation. LoRA (Low-Rank Adaptation) is an efficient fine-tuning method that adapts pre-trained models to new tasks by introducing low-rank updates to their weights. Although LoRA has emerged as a powerful technique for fine-tuning large models, current implementations assume a fixed rank, potentially limiting flexibility and efficiency across diverse tasks. This paper introduces LangVision-LoRA-NAS, a novel framework that integrates Neural Architecture Search (NAS) with LoRA to optimize VLMs for variable-rank adaptation. Our approach leverages NAS to dynamically search for the optimal LoRA rank configuration…
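To make the fixed-rank versus variable-rank distinction concrete, here is a minimal sketch of the standard LoRA update, in which a frozen weight W receives an additive update (alpha / r) · B · A, with B of shape d_out × r and A of shape r × d_in. The layer names, shapes, rank values, and helper functions below are hypothetical and chosen only for illustration. They are not taken from the paper, and the sketch does not implement the paper's search procedure. It only shows how the number of trainable parameters depends on the rank assigned to each layer.

```python
def matmul(a, b):
    """Plain-Python matrix product of two lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def lora_delta(B, A, alpha, rank):
    """Low-rank weight update (alpha / rank) * B @ A, using the usual LoRA scaling."""
    scale = alpha / rank
    return [[scale * v for v in row] for row in matmul(B, A)]


def lora_param_count(d_out, d_in, rank):
    """Trainable parameters of one adapter: B is d_out x rank, A is rank x d_in."""
    return rank * (d_out + d_in)


def total_params(layers, ranks):
    """Sum adapter parameters over all layers for a given per-layer rank assignment."""
    return sum(lora_param_count(*layers[name], ranks[name]) for name in layers)


# Hypothetical layer shapes (d_out, d_in); not taken from the paper.
layers = {"vision.q_proj": (1024, 1024), "text.q_proj": (4096, 4096)}

# Fixed-rank LoRA: every layer uses the same rank.
fixed_ranks = {name: 16 for name in layers}

# Variable-rank LoRA: each layer may use a different rank (values assumed here).
variable_ranks = {"vision.q_proj": 4, "text.q_proj": 16}

if __name__ == "__main__":
    print("fixed:", total_params(layers, fixed_ranks))
    print("variable:", total_params(layers, variable_ranks))
```

Under these assumed shapes, lowering the rank of one layer reduces the adapter size without touching the others. A per-layer rank search chooses among exactly this kind of trade-off.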
Taxonomy
Topics: Multimodal Machine Learning Applications · Natural Language Processing Techniques
