FlexLoRA: Entropy-Guided Flexible Low-Rank Adaptation

Muqing Liu; Chongjie Si; Yuheng Jia

arXiv:2601.22905·cs.LG·February 2, 2026

Open Access · 3 Reviews

TL;DR

FlexLoRA introduces an entropy-guided, flexible low-rank adaptation method for parameter-efficient fine-tuning of large models, allowing dynamic rank adjustment and improved performance over existing methods.

Contribution

It proposes a novel entropy-based importance measure and a flexible framework for rank pruning and expansion in PEFT, addressing limitations of fixed-rank and heuristic methods.

Findings

1. Outperforms state-of-the-art PEFT methods across benchmarks.

2. Effectively prunes and expands ranks based on spectral energy entropy.

3. Maintains stability with zero-impact initialization for new directions.
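One way to picture "zero-impact initialization" is sketched below. The page does not spell out the adapter parameterization, so this assumes an SVD-style update `delta_W = P @ diag(lam) @ Q` (the reviews refer to a Λ matrix) and a hypothetical helper `expand_rank` that appends a new direction whose scale starts at zero. Under that assumption the adapter output is unchanged at the moment of expansion; this is an illustration, not the authors' implementation.

```python
import numpy as np

rng = np.random.default_rng(0)
d_out, d_in, r = 6, 5, 2

# Assumed SVD-style adapter: delta_W = P @ diag(lam) @ Q
P = rng.normal(size=(d_out, r))
lam = rng.normal(size=r)
Q = rng.normal(size=(r, d_in))

def delta_w(P, lam, Q):
    """Low-rank update produced by the (assumed) adapter factors."""
    return P @ np.diag(lam) @ Q

def expand_rank(P, lam, Q, rng):
    """Append one direction whose scale is zero (hypothetical helper)."""
    new_col = rng.normal(size=(P.shape[0], 1))   # new left direction
    new_row = rng.normal(size=(1, Q.shape[1]))   # new right direction
    return (np.hstack([P, new_col]),
            np.append(lam, 0.0),                 # zero scale => no contribution yet
            np.vstack([Q, new_row]))

before = delta_w(P, lam, Q)
P2, lam2, Q2 = expand_rank(P, lam, Q, rng)
after = delta_w(P2, lam2, Q2)

# The rank grew by one, but the update itself is unchanged.
print(P2.shape, np.allclose(before, after))
```

Because the new scale entry is zero, the added direction only begins to influence the model once training moves that entry away from zero.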

Abstract

Large pre-trained models achieve remarkable success across diverse domains, yet full fine-tuning incurs prohibitive computational and memory costs. Parameter-efficient fine-tuning (PEFT) has thus become a mainstream paradigm. Among PEFT methods, Low-Rank Adaptation (LoRA) introduces trainable low-rank matrices and shows strong performance; nevertheless, its fixed-rank design limits flexibility. Dynamic rank allocation methods mitigate this issue by pruning redundant directions; however, they often rely on heuristic, element-level metrics that globally sort rank directions without matrix-wise distinction, and they lack mechanisms to expand capacity in layers requiring additional adaptation. To overcome these limitations, we propose FlexLoRA, an entropy-guided flexible low-rank adaptation framework that (i) evaluates matrix importance via spectral energy entropy, (ii) supports rank pruning and…
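The abstract does not give the formula for spectral energy entropy, so the following is only a minimal sketch under a stated assumption: the squared singular values of an adapter matrix are normalized into a probability distribution, and the score is the Shannon entropy of that distribution. The function name `spectral_energy_entropy` and the `eps` guard are illustrative choices, not taken from the paper.

```python
import numpy as np

def spectral_energy_entropy(singular_values, eps=1e-12):
    """Shannon entropy of normalized spectral energy (assumed definition)."""
    s = np.asarray(singular_values, dtype=float)
    energy = s ** 2                  # assumption: energy = squared singular values
    total = energy.sum()
    if total <= eps:                 # an all-zero spectrum carries no information
        return 0.0
    p = energy / total               # normalize to a probability distribution
    p = p[p > 0]                     # treat 0 * log(0) as 0
    return float(-(p * np.log(p)).sum())

# Energy spread evenly over 4 directions gives the maximum, log(4).
uniform = spectral_energy_entropy([1.0, 1.0, 1.0, 1.0])
# Energy concentrated in one direction gives entropy 0.
peaked = spectral_energy_entropy([1.0, 0.0, 0.0, 0.0])
print(uniform, peaked)
```

Under this assumed definition, a high score means the matrix uses its rank directions evenly, and a low score means a few directions dominate.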

Peer Reviews

Decision·ICLR 2026 Poster

Reviewer 01 · Rating 2 · Confidence 4

Strengths

The framework is clear and intuitive. The authors conduct many experiments on GLUE, commonsense reasoning, and the Visual Task Adaptation Benchmarks, demonstrating the effectiveness of the proposed method.

Weaknesses

1. The paper omits key system-level metrics such as peak GPU memory usage and FLOPs, which are essential for assessing efficiency.
2. The proposed entropy-guided importance metric is closely related to the effective rank concept from [1], limiting the novelty of the contribution.
3. The reported performance gains are marginal, particularly for large-scale models like LLaMA3-8B. Moreover, in Table 2, the number of trainable parameters in AdaLoRA is only about half that of the proposed model, maki

Reviewer 02 · Rating 6 · Confidence 3

Strengths

1. A simple, matrix-level entropy score guides where to prune and where to add rank, avoiding noisy element-wise heuristics.
2. True bidirectional allocation with zero-impact initialization lets the model expand capacity safely while staying within a budget.
3. Strong cross-domain results (NLP and vision) with clear ablations show that each component (entropy, bidirectionality, initialization) adds value.

Weaknesses

1. Entropy ignores magnitude (energy). The spectral-entropy score is scale-invariant: a matrix with very small Λ but uniform spread can have high entropy and thus be prioritized for expansion, even if its absolute contribution is negligible. The paper briefly compares against Frobenius/nuclear norms (Table 4), but does not explore combined criteria (e.g., entropy × energy) or energy-gated expansion. This is a conceptual gap given the stated goal of measuring “importance.”
2. Baseline anomalies &
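The scale-invariance point in item 1 can be checked with a short derivation. It assumes the score is the Shannon entropy of the normalized squared entries of Λ, which is not confirmed by the text on this page; the same argument goes through if absolute values are used instead of squares.

```latex
% Assumed definition: normalized spectral energy and its entropy
p_i(\Lambda) = \frac{\lambda_i^2}{\sum_j \lambda_j^2},
\qquad
H(\Lambda) = -\sum_i p_i(\Lambda)\,\log p_i(\Lambda).

% Rescaling every entry by a constant c \neq 0 leaves p_i unchanged:
p_i(c\Lambda)
  = \frac{c^2 \lambda_i^2}{\sum_j c^2 \lambda_j^2}
  = \frac{\lambda_i^2}{\sum_j \lambda_j^2}
  = p_i(\Lambda)
\quad\Longrightarrow\quad
H(c\Lambda) = H(\Lambda).
```

So under this assumption, shrinking Λ by any nonzero factor leaves the score untouched, which is why a near-zero but evenly spread Λ can still score as highly as a large one.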

Reviewer 03 · Rating 6 · Confidence 4

Strengths

The work is well written and clear. The proposed method, despite being very simple, seems to be effective against a variety of different baselines.

Weaknesses

The presentation of the algorithm does not make clear exactly which metric is used to rank the importance of the individual matrices. I suggest that the authors include a more precise discussion of this in the revised version. The proposed method does not significantly reduce the number of trainable parameters relative to other methods such as AdaLoRA or GeoLoRA [1]. It is also not clear how the global budget is allocated and kept under the maximal one, as the effect of inflating

Taxonomy

Topics: Stochastic Gradient Optimization Techniques · Advanced Neural Network Applications · Domain Adaptation and Few-Shot Learning