Randomized Block-Diagonal Preconditioning for Parallel Learning
Celestine Mendler-Dünner, Aurelien Lucchi

TL;DR
This paper introduces a randomized repartitioning technique for block-diagonal preconditioning in gradient-based optimization that significantly speeds up convergence in parallel machine learning tasks.
Contribution
It demonstrates that random repartitioning of coordinates improves convergence of block-diagonal preconditioned methods, supported by theoretical analysis and empirical validation.
Findings
Repartitioning leads to faster convergence in optimization.
Theoretical analysis quantifies expected convergence improvements.
Empirical results confirm efficiency gains on various tasks.
Abstract
We study preconditioned gradient-based optimization methods where the preconditioning matrix has block-diagonal form. Such a structural constraint comes with the advantage that the update computation is block-separable and can be parallelized across multiple independent tasks. Our main contribution is to demonstrate that the convergence of these methods can be significantly improved by a randomization technique which corresponds to repartitioning coordinates across tasks during the optimization procedure. We provide a theoretical analysis that accurately characterizes the expected convergence gains of repartitioning and validate our findings empirically on various traditional machine learning tasks. From an implementation perspective, block-separable models are well suited for parallelization and, when shared memory is available, randomization can be implemented on top of existing…
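To make the block-separable update and the repartitioning idea concrete, here is a minimal sketch, not the authors' implementation. It assumes a ridge-regularized least-squares objective, uses the diagonal blocks of the exact Hessian as the preconditioner, and fixes the step size at 1/K for K blocks (a conservative choice that guarantees descent because H is bounded by K times its block-diagonal part). The function name `block_preconditioned_gd` and its parameters are hypothetical. With `repartition=True`, a fresh random partition of the coordinates is drawn at every iteration. Otherwise, a single random partition is kept for the whole run.

```python
import numpy as np


def block_preconditioned_gd(X, y, num_blocks, num_iters, repartition, lam=1e-3, seed=0):
    """Hypothetical sketch: block-diagonal preconditioned GD on
    f(w) = 0.5 * ||X w - y||^2 + 0.5 * lam * ||w||^2.

    repartition=True  -> new random coordinate partition every iteration
    repartition=False -> one random partition kept fixed for the whole run
    Returns the final iterate and the objective value after each iteration.
    """
    rng = np.random.default_rng(seed)
    d = X.shape[1]
    H = X.T @ X + lam * np.eye(d)    # Hessian; lam > 0 keeps every block invertible
    step = 1.0 / num_blocks          # assumed safe step: H <= num_blocks * blockdiag(H)
    w = np.zeros(d)
    blocks = np.array_split(rng.permutation(d), num_blocks)
    losses = []
    for _ in range(num_iters):
        if repartition:
            # Randomization: reassign coordinates to blocks (tasks).
            blocks = np.array_split(rng.permutation(d), num_blocks)
        grad = X.T @ (X @ w - y) + lam * w
        update = np.zeros(d)
        # Each block's subproblem is independent, so this loop could run in parallel.
        for idx in blocks:
            H_kk = H[np.ix_(idx, idx)]
            update[idx] = np.linalg.solve(H_kk, grad[idx])
        w = w - step * update
        losses.append(0.5 * np.sum((X @ w - y) ** 2) + 0.5 * lam * np.sum(w ** 2))
    return w, losses


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    X = rng.standard_normal((200, 40))
    y = rng.standard_normal(200)
    _, fixed = block_preconditioned_gd(X, y, num_blocks=4, num_iters=50, repartition=False)
    _, rand = block_preconditioned_gd(X, y, num_blocks=4, num_iters=50, repartition=True)
    print("fixed partition:", fixed[-1])
    print("random repartitioning:", rand[-1])
```

Comparing the two loss curves on your own data gives a rough feel for the effect. The step size, the exact-Hessian blocks, and the per-iteration reshuffling frequency are illustrative choices. The paper's precise algorithm and guarantees may differ.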
Taxonomy
Topics: Stochastic Gradient Optimization Techniques · Machine Learning and ELM · Sparse and Compressive Sensing Techniques
