DiaBlo: Diagonal Blocks Are Sufficient For Finetuning
Selcuk Gurses, Aozhong Zhang, Yanxia Deng, Xun Dong, Xin Li, Naigang Wang, Penghang Yin, Zi Yang

TL;DR
DiaBlo introduces a parameter-efficient fine-tuning method that updates only diagonal blocks of model weights, achieving competitive performance with improved stability and simplicity over existing approaches.
Contribution
DiaBlo is a novel PEFT approach that updates only diagonal blocks, avoiding low-rank matrix products and enhancing convergence stability and expressiveness.
Findings
Achieves competitive accuracy across code generation, arithmetic reasoning, and commonsense reasoning tasks.
Maintains high memory efficiency and fast training speed.
Provides theoretical guarantees of improved expressiveness.
Abstract
Fine-tuning is a critical step for adapting large language models (LLMs) to domain-specific downstream tasks. To mitigate the substantial computational and memory costs of full-model fine-tuning, Parameter-Efficient Fine-Tuning (PEFT) methods have been proposed to update only a small subset of model parameters. However, performance gaps between PEFT approaches and full-model fine-tuning still exist. In this work, we present DiaBlo, a simple yet effective PEFT approach that updates only the diagonal blocks of selected model weight matrices. Unlike Low-Rank Adaptation (LoRA) and its variants, DiaBlo eliminates the need for low-rank matrix products, thereby avoiding the reliance on auxiliary initialization schemes or customized optimization strategies to improve convergence. This design leads to stable and robust convergence while maintaining comparable memory efficiency and training speed…
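To make the idea in the abstract concrete, here is a minimal pure-Python sketch; it is not the authors' implementation. It assumes a frozen weight matrix W applied as y = W x, with a trainable block-diagonal update D added on top. The function names, the zero initialisation of the blocks, and the choice of equal-sized blocks are illustrative assumptions rather than details confirmed by the text above. The LoRA count r * (rows + cols) is included only for comparison.

```python
def init_diag_blocks(num_blocks, block_rows, block_cols):
    """Create the trainable diagonal blocks.

    Assumption: blocks start at zero, so the adapted layer initially
    behaves exactly like the frozen layer.
    """
    return [[[0.0] * block_cols for _ in range(block_rows)]
            for _ in range(num_blocks)]


def matvec(matrix, vector):
    """Plain matrix-vector product on nested lists."""
    return [sum(w * v for w, v in zip(row, vector)) for row in matrix]


def adapted_forward(w_frozen, blocks, x):
    """Compute y = (W + D) x, where D is block-diagonal.

    Only `blocks` would be trained; `w_frozen` is never modified.
    Block i maps the i-th slice of x to the i-th slice of y.
    """
    y = matvec(w_frozen, x)
    block_rows = len(blocks[0])
    block_cols = len(blocks[0][0])
    for i, block in enumerate(blocks):
        x_slice = x[i * block_cols:(i + 1) * block_cols]
        for r, value in enumerate(matvec(block, x_slice)):
            y[i * block_rows + r] += value
    return y


def trainable_params_diag(rows, cols, num_blocks):
    """Trainable entries when only equal-sized diagonal blocks are updated."""
    assert rows % num_blocks == 0 and cols % num_blocks == 0
    return num_blocks * (rows // num_blocks) * (cols // num_blocks)


def trainable_params_lora(rows, cols, rank):
    """Trainable entries of a rank-`rank` LoRA adapter (two factor matrices)."""
    return rank * (rows + cols)
```

For a hypothetical 4096 x 4096 weight matrix, 64 diagonal blocks give 64 * 64 * 64 = 262,144 trainable entries, the same count as a rank-32 LoRA adapter (32 * (4096 + 4096)). These shapes are illustrative and are not taken from the paper's tables. Note that the block-diagonal update is applied directly, with no product of two factor matrices.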
Peer Reviews
Decision: ICLR 2026 Poster
- The idea is quite elegant, relatively simple to implement and efficient to train -- there isn't much adaptation required to existing finetuning libraries to get this working.
- The results are broad (covering standard PEFT benchmarks) and thus convincing.
- Ablations cover the first questions I had regarding whether block-diagonal is better than other ways of selecting entries to tune in the weight matrix; it does seem like it is broadly a better strategy than other ideas.
- Since we use a standard suite of benchmarks to evaluate PEFTs, it's possible that our literature is engaging in test-set overfitting (compare how the ImageNet challenge or LMSYS arena were overfit by organisations submitting many models repeatedly). It would thus be nice to show how the technique performs under varying learning rates and block sizes (e.g. as done for LoRA in [Schulman et al. (2025)](https://thinkingmachines.ai/blog/lora/)). It is nice though that there are not as many hyperpar
- The proposed work eliminates the inherent optimization difficulties associated with low-rank decomposition by avoiding the use of matrix products.
- DiaBlo demonstrates higher stability in 4-bit and 2-bit arithmetic reasoning tasks.
- Compared to strong baselines like SMT with a similar number of trainable parameters, the proposed method does not show significantly better performance. In other words, the paper argues for the memory and computational efficiency of the proposed method, but the model does not achieve significantly improved performance over baselines when they share the same number of trainable parameters.
- In Table 1, DiaBlo with N = 128 does not perform better than DiaBlo with N = 64 although doubled tra
1. By removing the complexity of low-rank structures, this work presents a clear alternative to LoRA-style PEFT. The results show that DiaBlo attains comparable performance without the added overhead of extra trainable matrices, simplifying both tuning and optimization.
2. The evaluation spans diverse supervised fine-tuning tasks -- including code generation, arithmetic reasoning, and commonsense reasoning -- covering a balanced range of short to moderate sequence lengths.
3. The results in Ta
1. Most evaluated benchmarks involve short output sequences, except for code generation. Testing DiaBlo on tasks with longer input–output contexts would better demonstrate its scalability and performance stability under extended sequence conditions (see q1).
2. The discussion of sparsity-based PEFT methods misses some recent relevant work, such as S2FT (NeurIPS 2025)[1] and SparseLoRA (ICML 2025)[2]. Including these would strengthen the discussion on sparsity in the introduction and would provi
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Multimodal Machine Learning Applications
Methods: SPEED: Separable Pyramidal Pooling EncodEr-Decoder for Real-Time Monocular Depth Estimation on Low-Resource Settings
