LoRA-Ensemble: Efficient Uncertainty Modelling for Self-Attention Networks
Dominik J. Mühlematter, Michelle Halbheer, Alexander Becker, Dominik Narnhofer, Helge Aasen, Konrad Schindler, Mehmet Ozgur Turkoglu

TL;DR
LoRA-Ensemble introduces a parameter-efficient implicit ensembling method for self-attention networks that improves uncertainty calibration and accuracy, reducing computational costs compared to explicit ensembles.
Contribution
It extends Low-Rank Adaptation (LoRA) into an implicit ensembling scheme for transformers, outperforming existing implicit methods and matching explicit ensemble performance.
Findings
Outperforms state-of-the-art implicit ensembling methods
Matches or exceeds explicit ensemble accuracy
Achieves superior calibration with lower computational cost
Abstract
Numerous real-world decisions rely on machine learning algorithms and require calibrated uncertainty estimates. However, modern methods often yield overconfident, uncalibrated predictions. The dominant approach to quantifying the uncertainty inherent in the model is to train an ensemble of separate predictors and measure their empirical variance. In an explicit implementation, the ensemble has a high computational cost and memory footprint, especially if the base model itself is already large, like modern transformers. This motivates efforts to develop implicit ensemble methods that emulate the ensemble without explicitly instantiating all its members. We introduce LoRA-Ensemble, a parameter-efficient ensembling method for self-attention networks. It is based on Low-Rank Adaptation (LoRA), originally developed for efficient LLM fine-tuning, and extends it into an implicit ensembling…
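The abstract is cut off before it describes the mechanism, so the following is only a minimal sketch of the general idea. It assumes that all ensemble members share one frozen pre-trained weight matrix and that each member adds its own low-rank LoRA update, the additive form the reviews attribute to the method. The class `LoRAEnsembleLinear`, its method names, the dimensions, and the random initialisation are hypothetical and chosen for illustration; standard LoRA initialises one factor to zero and obtains its update through training, which this untrained toy skips.

```python
import math
import random


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def rand_matrix(rows, cols, scale, rng):
    """Random Gaussian matrix; stands in for trained weights (assumption)."""
    return [[rng.gauss(0.0, scale) for _ in range(cols)] for _ in range(rows)]


class LoRAEnsembleLinear:
    """Hypothetical layer: shared weight W plus one update B_i @ A_i per member."""

    def __init__(self, d_in, d_out, rank, n_members, seed=0):
        rng = random.Random(seed)
        self.d_in, self.d_out, self.rank = d_in, d_out, rank
        # Shared weight: stored once, reused by every member.
        self.W = rand_matrix(d_out, d_in, 0.5, rng)
        # Member-specific low-rank factors: A_i is (rank x d_in), B_i is (d_out x rank).
        self.A = [rand_matrix(rank, d_in, 0.5, rng) for _ in range(n_members)]
        self.B = [rand_matrix(d_out, rank, 0.5, rng) for _ in range(n_members)]

    def params_per_member(self):
        """Extra parameters each member adds on top of the shared weight."""
        return self.rank * (self.d_in + self.d_out)

    def member_logits(self, x, i):
        """Compute (W + B_i A_i) x without ever forming the full member matrix."""
        col = [[v] for v in x]
        base = matmul(self.W, col)
        delta = matmul(self.B[i], matmul(self.A[i], col))
        return [b[0] + d[0] for b, d in zip(base, delta)]

    def predict(self, x):
        """Return the mean class probabilities and their variance across members."""
        probs = [softmax(self.member_logits(x, i)) for i in range(len(self.A))]
        n = len(probs)
        mean = [sum(p[k] for p in probs) / n for k in range(self.d_out)]
        var = [sum((p[k] - mean[k]) ** 2 for p in probs) / n for k in range(self.d_out)]
        return mean, var


layer = LoRAEnsembleLinear(d_in=8, d_out=3, rank=2, n_members=4)
mean, var = layer.predict([0.1 * k for k in range(8)])
print("mean probabilities:", mean)
print("variance across members:", var)
print("shared parameters:", 8 * 3, "| extra per member:", layer.params_per_member())
```

In this toy setting the per-member overhead happens to be close to the size of the shared matrix, because the layer is tiny. The saving only becomes visible when the rank is much smaller than the layer dimensions, as would be the case for the projection matrices of a large transformer.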
Peer Reviews
Decision: ICLR 2026 Conference Withdrawn Submission
1. Strong Empirical Coverage and Rigor: The authors evaluate on large-scale and cross-modal datasets (iNaturalist, ESC-50, SST-2) with clear error bars, runtime/memory analysis, and calibration metrics (ECE, NLL, Brier). 2. Clear and Practical Contribution – The paper demonstrates that LoRA modules can serve as lightweight ensemble “handles” for transformers. The design is simple, easy to integrate, and compatible with existing pre-trained models. 3. Clarity and Presentation – The manuscript is
1. Limited Theoretical Foundation – The work still lacks a rigorous theoretical analysis of why LoRA-based low-rank perturbations yield well-calibrated ensembles or maintain diversity. The paper remains empirical in nature. There are many ways to create an "ensemble of models" from a template one, but it is unclear why multiple LoRA models create a useful ensemble. 2. Conceptual Overlap with BatchEnsemble – Although the authors include BatchEnsemble comparisons and explanations, the underlyin
The paper uses the simple (and therefore potentially very widely applicable) idea to create ensemble members by fine-tuning a single pre-trained transformer model using LoRA and comprehensively tests it on a wide range of relevant datasets against a number of baselines. The experimental results are described in detail and look promising. Therefore, the proposed method could provide an effective, yet much more compute-efficient way of training ensembles of self-attention networks. Furthermore, th
1. The paper could partially profit from more clarity in the writing/presentation of the results. Concretely, for example, the first part of the 'experiments' section switched around from paragraph to paragraph between explaining the datasets used, the baselines evaluated against and the metrics used. Similarly, I felt like the explanation in the section on 'Enhanced Diversity In LoRA-Ensemble' was often switching around between various methods without having a clear storyline, which made it har
A well-done study and a well-written paper. The experimental results are actually pretty detailed and well developed.
- Using an ensemble to generate uncertainty estimates seems counterintuitive given why LoRA is used, namely to reduce the number of parameters that need to be trained. Replacing this with an ensemble just seems to spoil that idea. That being said, it is still going to be better than creating an ensemble with the original parameter space. - The insights into why LoRA-based uncertainty estimates are useful and why they provide notions of uncertainty are actually hidden i
* The paper is well written. * The idea of LoRA-Ensemble is straightforward to use.
* The novelty of their method is limited. LoRA-Ensemble is very close to prior weight-sharing ensembles (esp. BatchEnsemble). The paper’s main distinction is additive low-rank updates vs. BatchEnsemble’s multiplicative rank-1 modulation. While useful, this reads incremental and largely engineering-driven; there is no principled account that predicts when/why additive low-rank adapters should dominate multiplicative modulations for accuracy/calibration. * The experiments in the paper focus on sma
Taxonomy
Topics: Anomaly Detection Techniques and Applications · Explainable Artificial Intelligence (XAI) · Adversarial Robustness in Machine Learning
