Efficient Hyper-Parameter Search for LoRA via Language-aided Bayesian Optimization
Baek Seong-Eun, Lee Jung-Mok, Kim Sung-Bin, Tae-Hyun Oh

TL;DR
This paper presents a language-aided Bayesian Optimization framework that efficiently searches for optimal LoRA hyperparameters in large language models by integrating domain knowledge through natural language prompts, significantly reducing search time.
Contribution
The authors introduce a novel method that leverages language prompts to incorporate domain knowledge into Bayesian Optimization for hyperparameter tuning of LoRA, improving efficiency and performance.
Findings
Achieves over 20% performance improvement with only 30 iterations.
Reduces hyperparameter search from 45,000 to 30 iterations.
Demonstrates effective integration of domain knowledge via language prompts.
Abstract
Fine-tuning Large Language Models (LLMs) with Low-Rank Adaptation (LoRA) enables resource-efficient personalization or specialization, but it comes at the expense of additional hyperparameter tuning. Although LoRA makes fine-tuning efficient, it is highly sensitive to the choice of hyperparameters, and exhaustive hyperparameter search is still computationally very demanding. To address these challenges, we propose a framework that integrates the domain knowledge of pre-trained LLMs into Bayesian Optimization (BO) to efficiently search for LoRA hyperparameters. To leverage the informed knowledge of LLMs, we repurpose LLMs as a discrete-to-continuous mapping to link the hyperparameters and their domain knowledge with a continuous vector space, where BO is conducted. We design and control the mapping by language prompting, where we provide a domain-aware textual prompt describing the…
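To make the idea in the abstract concrete, here is a minimal sketch of Bayesian Optimization over discrete LoRA configurations that are first mapped into a continuous vector space. It is not the authors' implementation. The function `embed` is a hypothetical hand-written log-scaling that stands in for the LLM-based mapping, `toy_score` is a synthetic objective rather than a real fine-tuning run, and the candidate grid, the RBF-kernel Gaussian process, and the expected-improvement acquisition are all assumptions chosen for illustration.

```python
import itertools
import math

import numpy as np


def embed(config):
    """Hypothetical stand-in for the LLM mapping: log-scale each hyperparameter."""
    rank, alpha, lr = config
    return np.array([math.log2(rank) / 6.0,
                     math.log2(alpha) / 6.0,
                     (math.log10(lr) + 5.0) / 3.0])


def toy_score(config):
    """Synthetic objective (not a real training run); peak at rank=16, alpha=32, lr=1e-4."""
    rank, alpha, lr = config
    return -((math.log2(rank) - 4) ** 2
             + (math.log2(alpha) - 5) ** 2
             + (math.log10(lr) + 4) ** 2)


def rbf(a, b, length_scale=0.5):
    """RBF kernel matrix between the rows of a and the rows of b."""
    d2 = ((a[:, None, :] - b[None, :, :]) ** 2).sum(-1)
    return np.exp(-0.5 * d2 / length_scale ** 2)


def gp_posterior(x_obs, y_obs, x_new, noise=1e-4):
    """Zero-mean GP posterior mean and standard deviation at x_new."""
    k = rbf(x_obs, x_obs) + noise * np.eye(len(x_obs))
    k_s = rbf(x_obs, x_new)
    solved = np.linalg.solve(k, k_s)           # K^{-1} k_s
    mean = solved.T @ y_obs
    var = 1.0 - (k_s * solved).sum(0)          # prior variance of the RBF kernel is 1
    return mean, np.sqrt(np.clip(var, 1e-12, None))


def expected_improvement(mean, std, best):
    """Expected improvement over the best observed value (maximization)."""
    z = (mean - best) / std
    cdf = 0.5 * (1.0 + np.vectorize(math.erf)(z / math.sqrt(2.0)))
    pdf = np.exp(-0.5 * z ** 2) / math.sqrt(2.0 * math.pi)
    return (mean - best) * cdf + std * pdf


def run_bo(n_iter=30, n_init=5, seed=0):
    """Evaluate n_iter distinct configurations: n_init random, the rest chosen by EI."""
    rng = np.random.default_rng(seed)
    ranks = [1, 2, 4, 8, 16, 32, 64]
    alphas = [1, 2, 4, 8, 16, 32, 64]
    lrs = [1e-5, 3e-5, 1e-4, 3e-4, 1e-3]
    candidates = list(itertools.product(ranks, alphas, lrs))
    x_all = np.stack([embed(c) for c in candidates])

    tried = [int(i) for i in rng.choice(len(candidates), size=n_init, replace=False)]
    scores = [toy_score(candidates[i]) for i in tried]

    while len(tried) < n_iter:
        y = np.array(scores)
        y_norm = (y - y.mean()) / (y.std() + 1e-9)   # standardize for the zero-mean GP
        mean, std = gp_posterior(x_all[tried], y_norm, x_all)
        ei = expected_improvement(mean, std, y_norm.max())
        ei[tried] = -np.inf                          # never re-evaluate a configuration
        nxt = int(np.argmax(ei))
        tried.append(nxt)
        scores.append(toy_score(candidates[nxt]))

    best = int(np.argmax(scores))
    return candidates[tried[best]], scores[best], tried


if __name__ == "__main__":
    config, score, _ = run_bo()
    print("best (rank, alpha, lr):", config, "score:", score)
```

In this sketch the surrogate only ever sees the embedded vectors, so swapping `embed` for a learned or LLM-derived mapping would change the geometry the Gaussian process works in without touching the optimization loop.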
Peer Reviews
Decision: ICLR 2026 Conference Withdrawn Submission
* The proposed framework leverages prior knowledge about LoRA via straightforward domain-aware prompting and effectively finds better hyperparameter combinations than default settings.
* The method is computationally efficient, as 30 iterations are typically enough to yield a good hyperparameter configuration. The proxy evaluation further reduces the evaluation cost by roughly 10 times.
* Gains in performance are evaluated across multiple LoRA variants and backbones and seem to be consistent.
* The contribution seems to be incremental, and the novelty of the proposed method is limited. LLMs have already been used to propose candidate hyperparameter configurations for HPO with BO, and LoRA tuning can be viewed as a downstream application.
* One can argue that using LLMs in HPO is non-trivial for LoRA. Still, the discrete/integer nature of the rank, which motivates the proposed method, might not actually be a fundamental obstacle: frameworks like Optuna already support integer search spaces
The paper is for the most part clearly written and well laid out. The preliminary section on Bayesian optimization is clear and appropriately brief for a background section, although it may have been nice to dwell on the acquisition function slightly more (explaining both its form and what maximizing it corresponds to) before moving on to the proposed framework. The method is sound, and combines established techniques from prior work in Bayesian optimization and LLM-based deep kernel learning.
The work unfortunately has several weaknesses, spanning novelty, impact, and timeliness. On the novelty of the proposed method, it is not made very clear where the contributions of prior work such as Ranković & Schwaller 2025 end, and where this paper's contributions start. It appears that the prompting template, learned token, and projection are the new contributions from the authors. The remainder of the method is composed from classical results and methods from Bayesian optimization and recen
1. The authors propose a novel framework that integrates Bayesian Optimization (BO) with Large Language Models (LLMs). The core innovation is using an LLM to convert discrete hyperparameter configurations into continuous, knowledge-rich embeddings.
1. The proposed method is only evaluated on domain-specific fine-tuning, which restricts its practical applicability. In particular, the proxy training approach appears difficult to extend to supervised fine-tuning (SFT), as instruction tuning typically requires diverse tasks and heterogeneous samples.
2. The baseline results reported in this work are significantly weaker than those in the original PiSSA paper [1]. For example, on Gemma, PiSSA reports (77.78, 31.33, 54.31, 66.17), whereas this
- Significant Efficiency: The framework introduces a proxy training evaluation strategy, which drastically reduces the computational cost of HPO. By training on a small subset of data that shows a strong correlation with full-dataset performance, the method can reduce the overall time cost, enabling more efficient optimization.
- Strong Empirical Performance: The proposed method demonstrates superior effectiveness, finding high-performing hyperparameters within a small budget. This optimized con
- `Limited Innovation`: The paper appears to be a straightforward application of LLM+BO at the LoRA level, differentiated only by learnable tokens. This limits the novelty, and the work lacks a strong motivation for targeting LoRA's specific parameters.
- `Limited Comparison`: The experiments utilize a restricted set of LoRA variants, failing to comprehensively demonstrate the proposed method's effectiveness. Analysis using a wider range of LoRA variants [1-3] is warranted. Furthermore, the pap
Taxonomy
Topics: Machine Learning and Data Classification · Topic Modeling · Multimodal Machine Learning Applications
