Rolling Ball Optimizer: Learning by ironing out loss landscape wrinkles
Mohammed Djameleddine Belgoumri, Mohamed Reda Bouadjenek, Hakim Hacid, Imran Razzak, Sunil Aryal

TL;DR
The paper introduces the Rolling Ball Optimizer (RBO), a novel method that smooths the loss landscape by simulating a sphere rolling over it, improving optimization and generalization in neural network training.
Contribution
It proposes a new optimization algorithm that incorporates large-scale landscape information, providing a smoothing effect and better handling of complex loss geometries.
Findings
RBO demonstrates faster convergence compared to SGD, SAM, and Entropy-SGD.
RBO achieves improved training accuracy on MNIST and CIFAR datasets.
RBO enhances generalization performance in neural network training.
Abstract
Training large neural networks (NNs) requires optimizing high-dimensional data-dependent loss functions. The optimization landscape of these functions is often highly complex and textured, even fractal-like, with many spurious local minima, ill-conditioned valleys, degenerate points, and saddle points. Complicating things further is the fact that these landscape characteristics are a function of the data, meaning that noise in the training data can propagate forward and give rise to unrepresentative small-scale geometry. This poses a difficulty for gradient-based optimization methods, which rely on local geometry to compute updates and are, therefore, vulnerable to being derailed by noisy data. In practice, this translates to a strong dependence of the optimization dynamics on the noise in the data, i.e., poor generalization performance. To remediate this problem, we propose a new…
Peer Reviews
Decision: Submitted to ICLR 2026
The idea, and especially its implementation, seems novel and yet intuitive. The explanations for why it might work also seem to pass muster (learning rate and the radius phase transition).
Some of the motivation in the abstract and intro feels like overselling the problem, e.g., for a while it was believed that local minima might simply not exist in neural networks; see https://arxiv.org/pdf/1910.00359. I would like to know _how_ much more computationally expensive this is; my intuition about doing the projections says "much more than SAM", which doesn't bode too well given they were trading blows in Table 1. In any case, that concern makes me want for some compute-matched experi
- Originality (concept): Replaces point‑particle dynamics with finite‑radius body dynamics; non‑locality emerges from a projection onto the graph’s offset. This is a clean, physically motivated design space distinct from SAM/Entropy‑SGD. Fig. 2 (p. 4) compellingly visualizes multi‑scale smoothing as $\rho$ increases.
- Quality (math framing): The offset‑manifold viewpoint and the weak/linear ironing results formalize the smoothing intuition; the unreachability proposition links sharpness to c
1. **Metric & scaling are not specified or analyzed.** The projection minimizes Euclidean distance in $\mathbb{R}^{d+1}$ between $\tilde c_{t+1}$ and points on the graph $\{(\theta,f(\theta))\}$ (Eq. (3)), which implicitly equates horizontal parameter units and the vertical loss scale. Without a scaling parameter $\lambda$ to balance $\|\theta-\theta_e\|^2 + \lambda^2 (f(\theta)-y_e)^2$, behavior can change drastically under simple transformations (e.g., multiplying the loss by a constant) or par
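The scaling concern raised in this review can be illustrated with a small toy sketch. This is not the paper's implementation: the brute-force grid search, the one-dimensional toy loss $f(\theta)=\theta^2$, the function name `project_onto_graph`, and the chosen point $(\theta_e, y_e) = (1, 0.5)$ are all assumptions made for illustration. The sketch projects a point onto the graph of the loss using the weighted distance $(\theta-\theta_e)^2 + \lambda^2 (f(\theta)-y_e)^2$ from the review, then repeats the projection after multiplying the loss by 10.

```python
def project_onto_graph(f, center, lam=1.0, lo=-3.0, hi=3.0, n=60001):
    """Brute-force 1-D projection of center = (theta_e, y_e) onto the graph of f.

    Minimizes (theta - theta_e)^2 + lam^2 * (f(theta) - y_e)^2 over a uniform
    grid on [lo, hi]. Illustrative only; not the paper's projection routine.
    """
    theta_e, y_e = center
    best_theta, best_d2 = lo, float("inf")
    for i in range(n):
        theta = lo + (hi - lo) * i / (n - 1)
        d2 = (theta - theta_e) ** 2 + (lam * (f(theta) - y_e)) ** 2
        if d2 < best_d2:
            best_theta, best_d2 = theta, d2
    return best_theta


def f(theta):
    # Hypothetical toy loss.
    return theta ** 2


def f_scaled(theta):
    # The same loss multiplied by a constant.
    return 10.0 * f(theta)


# Unscaled loss, plain Euclidean distance (lam = 1).
theta_a = project_onto_graph(f, (1.0, 0.5))
# Loss and point height both multiplied by 10, still lam = 1.
theta_b = project_onto_graph(f_scaled, (1.0, 5.0))
# Same scaled problem, with lam = 1/10 compensating for the rescaling.
theta_c = project_onto_graph(f_scaled, (1.0, 5.0), lam=0.1)

print(theta_a, theta_b, theta_c)
```

In this toy setting, the projected parameter moves when the loss is rescaled with $\lambda = 1$, and returns to its original value once $\lambda$ is set to the inverse of the scaling constant, which is the kind of sensitivity the reviewer asks the authors to analyze.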
- This paper proposes a novel optimization method based on the intuitive idea of a "rolling ball", which can benefit not only optimization but also generalization.
- The crucial weakness is the experimental section.
  - ResNet-6 and VGG-9 are too small. It would be much better if the method were shown to scale to larger neural networks (e.g., WRN-28-10).
  - The accuracy reported in Table 1 is very far from the state of the art. It doesn't need to be state of the art, but CIFAR-10 performance should at least be around or above 90% (SAM achieved >97% according to the SAM paper). It's unclear whether the hyperparameters of SAM are well-tuned for the small neu
Taxonomy
Topics: Robotic Locomotion and Control
