GeoLoRA: Geometric integration for parameter efficient fine-tuning
Steffen Schotthöfer, Emanuele Zangrando, Gianluca Ceruti, Francesco Tudisco, Jonas Kusch

TL;DR
GeoLoRA introduces a geometric, dynamical low-rank approximation method for parameter-efficient fine-tuning of neural networks, significantly reducing computational costs while maintaining robustness and theoretical guarantees.
Contribution
It presents GeoLoRA, a novel dynamical low-rank approach that improves efficiency, adaptivity, and robustness in fine-tuning large neural networks, with theoretical and empirical validation.
Findings
GeoLoRA reduces fine-tuning computational cost by requiring only one backpropagation pass.
It achieves smaller low-rank adapters compared to heuristic methods like AdaLoRA.
GeoLoRA outperforms existing methods in accuracy and efficiency on benchmarks.
Abstract
Low-Rank Adaptation (LoRA) has become a widely used method for parameter-efficient fine-tuning of large-scale, pre-trained neural networks. However, LoRA and its extensions face several challenges, including the need for rank adaptivity, robustness, and computational efficiency during the fine-tuning process. We introduce GeoLoRA, a novel approach that addresses these limitations by leveraging dynamical low-rank approximation theory. GeoLoRA requires only a single backpropagation pass over the small-rank adapters, significantly reducing computational cost as compared to similar dynamical low-rank training methods and making it faster than popular baselines such as AdaLoRA. This allows GeoLoRA to efficiently adapt the allocated parameter budget across the model, achieving smaller low-rank adapters compared to heuristic methods like AdaLoRA and LoRA, while maintaining critical…
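To make the parameter budget and rank adaptivity mentioned in the abstract concrete, here is a minimal sketch. It is not the GeoLoRA algorithm. It assumes a factorized adapter of the form U S V^T and a relative singular-value truncation rule, a criterion commonly used in rank-adaptive low-rank methods. The function names, the example layer size, and the tolerance are all hypothetical, and this summary does not state which truncation rule or threshold GeoLoRA actually uses.

```python
import math


def adapter_param_count(n_out, n_in, rank):
    """Parameter count of a factorized adapter U S V^T (assumed form):
    U is n_out x rank, S is rank x rank, V is n_in x rank."""
    return n_out * rank + rank * rank + n_in * rank


def truncation_rank(singular_values, rel_tol):
    """Smallest rank r such that the discarded tail satisfies
    sqrt(sum_{i>r} s_i^2) <= rel_tol * sqrt(sum_i s_i^2).
    Illustrative criterion only, not taken from the paper."""
    s = sorted(singular_values, reverse=True)
    total = math.sqrt(sum(x * x for x in s))
    if total == 0.0:
        return 0
    for r in range(len(s) + 1):
        tail = math.sqrt(sum(x * x for x in s[r:]))
        if tail <= rel_tol * total:
            return r
    return len(s)


# Hypothetical 768 x 768 layer with a rank-8 adapter.
full_params = 768 * 768
low_rank_params = adapter_param_count(768, 768, 8)

# Hypothetical singular values of an adapter's small core matrix.
kept_rank = truncation_rank([4.0, 2.0, 0.1, 0.01], rel_tol=0.05)
```

In this toy setting the adapter holds 12,352 parameters against 589,824 for the full weight matrix, and the truncation keeps rank 2. A rank-adaptive method can apply such a rule per layer, so layers whose updates are close to low-rank receive fewer parameters.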
Peer Reviews
Decision: ICLR 2025 Poster
- [Major] Easy to follow, well-written
- [Major] The intuition is straightforward, and the method is simple yet effective
- [Major] The proposed method is supported by ample evidence, including theoretical convergence guarantees and error analysis, and various experimental results.
- [Major] The experiments are diverse, including both language tasks and vision tasks.
- [Major] While the figure between L222 and L232 exhibits great performance of GeoLoRA, the improvement caused by GeoLoRA seems to be marginal, especially in Table 2.
- [Major] Some experiment settings are not well-justified. For instance, in Table 2, the authors report the experiment results of GeoLoRA individually, while for others, it seems not. It looks like an unfair comparison, and the reason why GeoLoRA is reported individually should be justified. For Table 3, the results of LoRA are mis
- This work provides a novel way of solving the stiffness/convergence issues in existing LoRA solvers, finding the LoRA factors at around the same rate as full fine-tuning.
- GeoLoRA, while being more efficient than standard LoRA training, also includes adaptive rank finding, which the most common alternative, AdaLoRA, has been shown to do, but at a significantly greater cost than GeoLoRA. Finding LoRA-like fine-tuning factors can be done without specifying the rank a priori at nearly the same i
- For clarity, the motivation (Section 3) is currently a little lengthy. When reading, there was some expectation that one of these major derivations was part of the GeoLoRA algorithm, and it only became clear that the algorithm's description happened a few pages later. Including either a more complete description of Section 3 at its beginning, another subsection title, or moving some of Section 3 to the appendix/another section would help with readability.
- The paper is well-written and the technique is overall well-motivated.
- The theoretical support for the proposed method is solid -- the majority of the theoretical results are built on the single-layer case, but Proposition 1 extends this to the general multilayer case. If I understand this result correctly, this implies that GeoLoRA can obtain the optimal rank configuration for arbitrary architectures -- this seems like a particularly strong result to me.
- GeoLoRA appears to robustly outp
- This paper could benefit from more discussion of the details of existing methods, and why the best adaptive methods require multiple gradient tapes and what types of guarantees they have. This information is currently summarized in a table.
- In Table 2, it appears that the same rank allocation configuration is used for every dataset, for each of the baselines, whereas the parameter count of the proposed method varies for different datasets. Can the authors comment on this?
- In Section 4.1,
Taxonomy
Topics: Advanced Numerical Analysis Techniques · Robotics and Sensor-Based Localization · Robotic Mechanisms and Dynamics
