Emergent universal long-range structure in random-organizing systems

Satyam Anand; Guanming Zhang; Stefano Martiniani

PMC · DOI:10.1038/s41467-026-68601-2·January 23, 2026

Emergent universal long-range structure in random-organizing systems

Satyam Anand, Guanming Zhang, Stefano Martiniani

PDF

Open Access

TL;DR

The paper shows that noise can lead to large-scale order in systems from physics and machine learning, revealing a universal self-organization mechanism.

Contribution

The study uncovers a universal long-range structure across diverse systems and links it to noise correlation and hyperuniformity.

Findings

01

Three distinct systems exhibit universal long-range behavior governed by particle noise correlation.

02

Stochastic gradient descent favors flat energy landscapes, similar to hyperuniform material self-assembly.

03

A fluctuating hydrodynamic theory explains the emergence of long-range order across systems.

Abstract

Self-organization through noisy interactions is ubiquitous across physics, mathematics, and machine learning, yet how long-range structure emerges from local noisy dynamics remains poorly understood. Here, we investigate three paradigmatic random-organizing particle systems drawn from distinct domains: models from soft matter physics (random organization, biased random organization) and machine learning (stochastic gradient descent), each characterized by distinct sources of noise. We discover universal long-range behavior across all systems, namely the suppression of long-range density fluctuations, governed solely by the noise correlation between particles. Furthermore, we establish a connection between the emergence of long-range structure and the tendency of stochastic gradient descent to favor flat regions of energy landscape—a phenomenon widely observed in machine learning. To…

Linked entities

Genes, proteins, chemicals, diseases, species, mutations and cell lines named across the full text — each resolved to its canonical identifier and authoritative record.

Genes1

SGCD

Proteins1

Chemicals1

BRO

Diseases1

RO

Figures3

Click any figure to enlarge with its caption.

Random-organizing systems.a Absorbing-to-active phase transition. The steady-state fraction of active (overlapping) particles (*f*_*a*_) plotted as a function of normalized volume fraction *ϕ*/*ϕ*_*c*_, where *ϕ*_*c*_ is the critical volume fraction. Starting from an initial random configuration, at time *t* → *∞*, in the absorbing state (*ϕ*/*ϕ*_*c*_ < 1), *f*_*a*_ = 0, while in the active state (*ϕ*/*ϕ*_*c*_ > 1), *f*_*a*_ > 0. Insets show zoomed in exemplar configurations of discrete-time BRO, and *L* is the side length of the simulation box. Blue particles have no overlapping neighbors whe

Universal long-range structure in random-organizing systems.a Coarse-grained density fluctuations *δρ*(*c*)/∣*δρ*_avg_(*c* = 0)∣ in random-organizing systems, where *c* is the pairwise noise correlation, and *δρ*_avg_ denotes average density fluctuations over the whole system. The three panels denote *c* = − 1, *c* = − 0.75, *c* = 0 (left to right) for each system. A Gaussian kernel of width 100*R* was chosen for coarse-graining all systems, where *R* is the particle radius. As the pairwise noise correlation *c* goes from 0 (uncorrelated) to − 1 (anti-correlated), the density fluctuations are

Flatness of energy landscape in stochastic gradient descent (SGD).a Zoomed-in exemplar configuration of discrete-time SGD before and after adding a small noise to all particles (Top). Schematic of a system (black ball) in a steady-state configuration. The system has energy *E* at the steady-state configuration X, and energy *E* + *ΔE* at a slightly perturbed position X + *Δ*X after the addition of a small noise N. b Normalized energy change $\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsf

Keywords

Statistical physics, thermodynamics and nonlinear dynamicsNonlinear phenomena

Peer Reviews

No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.

Videos

No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.

Taxonomy

TopicsMicro and Nano Robotics · Machine Learning in Materials Science · Quantum many-body systems

Full text

Introduction

While typically associated with disorder, noise can paradoxically drive the emergence of diverse forms of order, such as pattern formation^1^, self-organization^2,3^, suppression of chaos^4^, selection of ordered states^5^, and swarming^6^. Physical systems exhibit a broad spectrum of order: perfect crystals and ideal gases mark the extremes, while intermediate regimes display correlated disorder, such as hyperuniformity—where local disorder coexists with the anomalous suppression of long-range density fluctuations^7^. Hyperuniformity can emerge either at criticality^8–10^, or away from it^11–14^. Away from criticality, in equilibrium, hyperuniformity requires long-range interactions^7^, whereas out of equilibrium, it can emerge from long- or short-range, and even noisy interactions^11–16^. The process by which long-range spatial structure develops away from criticality—particularly in systems interacting solely via short-range, noisy dynamics—is a long-standing question that remains poorly understood.

Non-equilibrium particle systems with short-range noisy interactions—such as random organization (RO)^2,9,17–20^, biased random organization (BRO)^10,12,14,19,21,22^, and stochastic gradient descent (SGD)^23^—provide an ideal framework to investigate the noise-driven emergence of long-range spatial structure. These systems have been studied in a wide variety of contexts, such as sheared colloidal suspensions^2,9,24,25^, random close packing^10,21^, two-dimensional crystallization^14^, and self-supervised learning^23,26^. All systems undergo a phase transition as the particle volume fraction increases; from a low-density state where all motion ceases (absorbing state), to a high-density state where motion persists forever (active state)^27–29^. Irrespective of microscopic details, RO, BRO, and SGD belong to the same universality class, i.e., display the same critical behavior^10,20,23^. Away from criticality in the active phase, however, variations in microscopic interactions significantly influence the emergent long-range structure^12,14,17,18^. Despite extensive experimental, numerical, and theoretical research over two decades, a quantitative microscopic understanding of dynamics and structure far from criticality remains elusive^2,9,10,12,14,17–19,21–24^. Fundamental questions remain unanswered: How does macroscopic structure emerge from noisy interactions? Moreover, what universal principles govern the variability in emergent structures within and across different random-organizing systems? Finally, is the emergent long-range structure in SGD related to its ability to discover flat regions of energy landscape—a feature linked to robust generalization in machine learning^30^?

Here, combining particle and continuum simulations with hydrodynamic theory, we provide a quantitative, microscopic understanding of the active phases of RO, BRO, and SGD—random-organizing systems with distinct sources of noise. We discover universal long-range behavior across all three systems governed by a single parameter: the noise correlation coefficient between particles. All systems self-organize to suppress density fluctuations below a crossover length scale, which diverges as the noise becomes anti-correlated (reciprocal interactions), resulting in strong (class I^7^) hyperuniformity. Further, by directly coarse-graining the microscopic dynamics, we develop a fluctuating hydrodynamic theory for random-organizing systems that quantitatively predicts both the emergence of long-range structure and the crossover length scale across all systems. Finally, we demonstrate how noise correlation, batch size, and learning rate bias SGD towards flatter regions of energy landscape—a finding that aligns with empirical observations in machine learning^30–33^—and establish a connection between the emergence of long-range structure and the ability of SGD to generalize effectively. Our study underscores the critical role of noise correlations in facilitating long-range structure, and has wide-ranging applications—from providing a robust method for self-assembling hyperuniform structures, to offering insights into other systems having correlated noise, such as neural population activity in the brain^34^, ecological population dynamics^35^, and gene expression dynamics in cells^36^.

Results

Setup

Random-organizing systems—RO, BRO, and SGD—are discrete-time systems consisting of N interacting spherical particles of radius R in d-dimensional space. At any given time-step, the positions of isolated particles (blue in Fig. 1b–d) do not evolve, and those of overlapping (active) particles (red in Fig. 1b–d) evolve according to system specific rules designed to resolve particle overlaps. All systems undergo an absorbing-to-active phase transition as the particle volume fraction ϕ increases (Fig. 1a). Starting from a random initial configuration, for ϕ < ϕc, the system finds an absorbing configuration (fa = 0), whereas for ϕ > ϕc, the system never finds such a configuration and reaches a non-equilibrium steady state (fa > 0) (Fig. 1a). Here, fa is the fraction of active particles in the system. We report all our results in the active phase when the system has reached steady state (Methods).Fig. 1. Random-organizing systems.a Absorbing-to-active phase transition. The steady-state fraction of active (overlapping) particles (fa) plotted as a function of normalized volume fraction ϕ/ϕc, where ϕc is the critical volume fraction. Starting from an initial random configuration, at time t → ∞, in the absorbing state (ϕ/ϕc < 1), fa = 0, while in the active state (ϕ/ϕc > 1), fa > 0. Insets show zoomed in exemplar configurations of discrete-time BRO, and L is the side length of the simulation box. Blue particles have no overlapping neighbors whereas red particles have at least one overlapping neighbor. Schematics of discrete-time RO (b) (Eq. (1)), BRO (c) (Eq. (2)), SGD (d) (Eq. (3)), and the corresponding continuous-time approximation (generalized model, Eq. (5)) of RO (e), BRO (f), and SGD (g). Solid gray colors denote noise, while solid black colors denote deterministic interactions. Dashed black lines connect the centers of a pair of overlapping particles. Crosses in panel (d) denote unselected particles. For RO, the direction and the magnitude of the kicks are both noisy; for BRO, only the magnitude of the kicks are noisy; while for SGD, only the selection of active particles is noisy. Notice that in the generalized model (Eq. (5)), the selection noise of SGD is approximated by a noise in the magnitude of the kicks.

Random organization

RO was originally introduced to model experiments on sheared colloidal suspensions at high Péclet number^2,24^. Consider a system undergoing periodic shear cycles. After every shear cycle, overlapping particles are given a “kick” in a random direction, and non-overlapping particles return to their original position^2,9^. RO was subsequently simplified to a model without external shearing, which retains all the essential properties of the original version^17–19,29^. Here, we study this simpler isotropic version of RO^17–19,29^.

In RO, the dynamics of the position of particle i at time-step m + 1 ( $\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$${{{{\bf{x}}}}}_{i}^{m+1}$$\end{document}$ ) is given by,

\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$${{{{\bf{x}}}}}_{i}^{m+1}={{{{\bf{x}}}}}_{i}^{m}+\epsilon {\sum }_{j\in {\Gamma }_{i}^{m}}{u}_{ji}^{m}{{{{\boldsymbol{\zeta }}}}}_{ji}^{m},$$\end{document}

where ϵ controls the magnitude of the pairwise kick given by particle j to i, $\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$${u}_{ji}^{m}$$\end{document}$ is a random number sampled from a standard uniform distribution (U[0, 1]) at time-step m, $\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$${{{{\boldsymbol{\zeta }}}}}_{ji}^{m}$$\end{document}$ is a random unit vector sampled uniformly on the surface of a d-dimensional unit hypersphere at time-step m, and $\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$${\Gamma }_{i}^{m}=\{j \ | \ | {{{{\bf{x}}}}}_{j}^{m}-{{{{\bf{x}}}}}_{i}^{m}| < 2R,\,j\, \ne \, i\}$$\end{document}$ is the set containing all particles that overlap with particle i at time-step m (Fig. 1b). We set $\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$${{{{\boldsymbol{\zeta }}}}}_{ji}^{m}=-{{{{\boldsymbol{\zeta }}}}}_{ij}^{m}$$\end{document}$ so that particles effectively move away from (“repel”) each other. Finally, c ∈ [ − 1, 0] is the Pearson correlation coefficient between corresponding components of the complete noise vectors $\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$${\omega }_{ij,\alpha }^{m}=\epsilon {u}_{ij}^{m}{\zeta }_{ij,\alpha }^{m}$$\end{document}$ and $\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$${\omega }_{ji,\alpha }^{m}$$\end{document}$ , defined by $\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\langle {\omega }_{ij,\alpha }^{m}\,{\omega }_{ji,\beta }^{m}\rangle /\langle {\omega }_{ij,\alpha }^{m}\rangle \langle {\omega }_{ji,\beta }^{m}\rangle=c\,{\delta }_{\alpha \beta }$$\end{document}$ . c = 0 denotes pairwise kicks which are uncorrelated, and c = − 1 denotes pairwise kicks which are anti-correlated (equal in magnitude, $\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$${u}_{ij}^{m}={u}_{ji}^{m}$$\end{document}$ , and opposite in direction, i.e., conserve pairwise center of mass).

Biased random organization

BRO was originally introduced to study random close packing of spheres^10,21^. Similar to RO, overlapping particles are given a kick, however, the kick is “biased” along the direction of the line joining the centers of overlapping particles (Fig. 1c). An interesting feature of BRO is that for ϵ → 0, its critical point was proposed as an alternative definition of random close packing^10,21^, which, despite decades of research, still lacks a clear definition^37–41^.

In BRO, the dynamics of the position of particle i at time-step m + 1 ( $\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$${{{{\bf{x}}}}}_{i}^{m+1}$$\end{document}$ ) is given by,

\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$${{{{\bf{x}}}}}_{i}^{m+1}={{{{\bf{x}}}}}_{i}^{m}+\epsilon {\sum }_{j\in {\Gamma }_{i}^{m}}{u}_{ji}^{m}{\widehat{{{{\bf{x}}}}}}_{ji}^{m},$$\end{document}

where ϵ and $\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$${u}_{ji}^{m}$$\end{document}$ are defined as in RO, and $\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$${\widehat{{{{\bf{x}}}}}}_{ji}^{m}=-({{{{\bf{x}}}}}_{j}^{m}-{{{{\bf{x}}}}}_{i}^{m})/| {{{{\bf{x}}}}}_{j}^{m}-{{{{\bf{x}}}}}_{i}^{m}|$$\end{document}$ is the deterministic unit vector pointing from the center of particle j to i at time-step m (Fig. 1c). c ∈ [ − 1, 0] is the Pearson correlation coefficient between corresponding components of the complete noise vectors $\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$${\omega }_{ij,\alpha }^{m}=\epsilon {u}_{ij}^{m}{\widehat{x}}_{ij,\alpha }^{m}$$\end{document}$ and $\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$${\omega }_{ji,\alpha }^{m}$$\end{document}$ . Notice that c = − 1 corresponds to $\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$${u}_{ij}^{m}={u}_{ji}^{m}$$\end{document}$ , since $\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$${\widehat{{{{\bf{x}}}}}}_{ji}^{m}=-{\widehat{{{{\bf{x}}}}}}_{ij}^{m}$$\end{document}$ .

Stochastic gradient descent

SGD is a widely used optimization algorithm, e.g., in artificial neural networks, to minimize a loss function composed of a sum of many terms^42^. Stochasticity in SGD comes from the random selection of a subset of terms in the sum at every step. In the context of interacting particle systems, we take the loss to be the total energy E = ∑i∑j≥iV(xi, xj), where V(xi, xj) is any pairwise potential. SGD then corresponds to randomly selecting a subset of terms in E and updating the corresponding particle positions—either one or both at once—to minimize the partial energy. This is in contrast to simultaneously moving all active particles, which corresponds to (noiseless) gradient descent^23^. Notice that our approach of SGD applied on particle systems is unlike previous works relating neural networks to particle systems by treating parameters (weights) as interacting particles^43^.

In SGD, the dynamics of the position of particle i at time-step m + 1 ( $\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$${{{{\bf{x}}}}}_{i}^{m+1}$$\end{document}$ ) is given by,

\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$${{{{\bf{x}}}}}_{i}^{m+1}={{{{\bf{x}}}}}_{i}^{m}-\alpha {\sum }_{j\in {\Gamma }_{i}^{m}}{\theta }_{ji}^{m}{\nabla }_{i}{V}_{ji}^{m},$$\end{document}

where $\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$${\nabla }_{i}={\nabla }_{{{{{\bf{x}}}}}_{i}}$$\end{document}$ , α is the learning rate having units of length/force, $\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$${V}_{ji}^{m}=V(| {{{{\bf{x}}}}}_{j}^{m}-{{{{\bf{x}}}}}_{i}^{m}| )$$\end{document}$ is the pairwise interaction potential between particles i and j, and $\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$${\theta }_{ji}^{m}$$\end{document}$ is a random number sampled from a Bernoulli distribution having parameter bf (batch fraction) at time-step m (Fig. 1d). bf represents the average fraction of active particle pairs (i, j) that move at any given time-step. c ∈ [ − 1, 0] is the Pearson correlation coefficient between corresponding components of the complete noise vectors $\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$${\omega }_{ij,\alpha }^{m}=-\alpha {\theta }_{ij}^{m}{\partial }_{i,\alpha }{V}_{ij}^{m}$$\end{document}$ and $\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$${\omega }_{ji,\alpha }^{m}$$\end{document}$ originating from the pairwise correlated selection noise $\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$${\theta }_{ij}^{m}$$\end{document}$ . Notice that since $\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$${\nabla }_{i}{V}_{ji}^{m}=-{\nabla }_{j}{V}_{ji}^{m}$$\end{document}$ , c = − 1 corresponds to $\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$${\theta }_{ij}^{m}={\theta }_{ji}^{m}$$\end{document}$ . While Vj**i can be any short- or long-range potential in SGD, here, we consider a class of short-range, repulsive potentials given by

\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$${V}_{ij}(r)=\left\{\begin{array}{ll}\frac{{{{\mathcal{E}}}}}{p}{\left(1-\frac{{r}_{ij}}{2R}\right)}^{p},& {{{\rm{if}}}}\,0 < {r}_{ij} < 2R,\\ 0,& {{{\rm{otherwise}}}},\end{array}\right.$$\end{document}

where ri**j = ∣xj − xi∣, $\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$${{{\mathcal{E}}}}$$\end{document}$ is the characteristic energy scale, and p controls the stiffness of the potential.

Universal active phase behavior

There are three distinct sources of noise in random-organizing systems: magnitude of kicks, direction of kicks, and selection of particles. Notice that the origin of stochasticity in RO, BRO, and SGD is different; (i) for RO, the magnitude and direction of kicks are both noisy, while the selection of particles is deterministic, (ii) for BRO, the magnitude of kicks is noisy, while both the direction of kicks and selection of particles is deterministic, (iii) for SGD, the selection of particles is noisy, while the magnitude and direction of kicks are both deterministic (Fig. 1b–d, Eqns. (1), (2), and (3)). We perform particle simulations for RO, BRO, and SGD in the active phase (ϕ > ϕc) and measure the long-range structure, quantified by the radially averaged structure factor S(k) and variance in number density δ**ρ^2^(l) (Methods). l is the diameter of the hypershpere used for measuring density fluctuations and k = 2π/l is the wave number. We study all properties above a threshold length scale l0 = 2π/k0, below which we find system-specific short-range behavior (Fig. 2b, c). Hereafter, we work with normalized quantities: $\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\widetilde{l}$$\end{document}$ , $\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\widetilde{k}$$\end{document}$ , $\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\widetilde{S}(\widetilde{k})$$\end{document}$ , and $\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$${\widetilde{\delta \rho }}^{2}(\widetilde{l})$$\end{document}$ (see Fig. 2 caption for definitions).Fig. 2. Universal long-range structure in random-organizing systems.a Coarse-grained density fluctuations δ**ρ(c)/∣δ**ρavg(c = 0)∣ in random-organizing systems, where c is the pairwise noise correlation, and δ**ρavg denotes average density fluctuations over the whole system. The three panels denote c = − 1, c = − 0.75, c = 0 (left to right) for each system. A Gaussian kernel of width 100R was chosen for coarse-graining all systems, where R is the particle radius. As the pairwise noise correlation c goes from 0 (uncorrelated) to − 1 (anti-correlated), the density fluctuations are suppressed for all systems. b Normalized radially averaged structure factor $\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\widetilde{S}(\widetilde{k})$$\end{document}$ versus normalized wave number $\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\widetilde{k}$$\end{document}$ for RO, BRO, and SGD (left to right). $\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\widetilde{S}=S(k)/{S}_{0}(2\pi /L)$$\end{document}$ where S0(2π/L) is the structure factor for c = 0 at k = 2π/L, and L is the side length of the simulation box. $\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\widetilde{k}=k/{k}_{0}$$\end{document}$ , where k0 is the value at which $\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\widetilde{S}({k}_{0})=1$$\end{document}$ for anti-correlated noise (c = − 1) of the same system. Solid black lines show predictions of Eq. (10) for different values of c. Inset shows the normalized crossover length scale ( $\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$${l}_{c}/{l}_{0}={\widetilde{l}}_{c}=1/{\widetilde{k}}_{c}$$\end{document}$ ) versus c. We find the normalized crossover wavenumber ( $\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$${\widetilde{k}}_{c}$$\end{document}$ ) in simulations as the intersection, on a log-log plot, between a fit of slope 0 near the $\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\widetilde{k}\to 0$$\end{document}$ region, and a fit of slope 2 near the $\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\widetilde{k}\approx 1$$\end{document}$ region. Solid black line in the inset shows prediction of Eq. (11). Gray shaded regions denote short-range behavior ( $\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\widetilde{k} > 1$$\end{document}$ ). c Normalized variance of number density $\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$${\widetilde{\delta \rho }}^{2}(\widetilde{l})$$\end{document}$ versus normalized diameter of the hypersphere ( $\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\widetilde{l}$$\end{document}$ ) used for measuring density fluctuations for RO, BRO, and SGD (left to right). $\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$${\widetilde{\delta \rho }}^{2}(\widetilde{l})=\delta {\rho }^{2}(l)/\delta {\rho }^{2}({l}_{0})$$\end{document}$ where δ**ρ^2^(l0) is the density variance for c = 0 at l = l0, and $\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\widetilde{l}=l/{l}_{0}$$\end{document}$ , where l0 = 2π/k0. Bottom inset shows data collapse of density variances when $\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\widetilde{l}$$\end{document}$ is rescaled by $\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$${\widetilde{l}}_{c}$$\end{document}$ , and $\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$${\widetilde{\delta \rho }}^{2}(\widetilde{l})$$\end{document}$ is rescaled by $\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$${\widetilde{\delta \rho }}^{2}({\widetilde{l}}_{c})$$\end{document}$ . Top inset shows infinite wavelength density fluctuations $\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$${\widehat{\delta \rho }}^{2}(\widetilde{l}\to \infty )$$\end{document}$ versus c. $\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$${\widehat{\delta \rho }}^{2}(c)=[{\widetilde{\delta \rho }}^{2}(c)-{\widetilde{\delta \rho }}^{2}(c=-1)]/{\widetilde{\delta \rho }}^{2}(c=0)$$\end{document}$ . Solid black line denotes 1 + c (prediction of Eq. (10) in the limit $\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\widetilde{k}\to 0$$\end{document}$ ). Gray shaded regions denote short-range behavior ( $\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\widetilde{l} < 1$$\end{document}$ ).

Despite having microscopically different dynamics, all random-organizing systems display a universal long-range structure, controlled solely by the pairwise noise correlation c, and qualitatively independent of all other parameters—be it the kick magnitude ϵ, volume fraction ϕ, or spatial dimension d in RO and BRO, or ϕ, d, learning rate α, batch fraction bf, potential stiffness p, and energy scale $\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$${{{\mathcal{E}}}}$$\end{document}$ in SGD. (Fig. 2a, b, c, Supplementary Information (SI) Figs. S2, S3, S4). Remarkably, the active phase behavior across all systems is independent of d, in contrast to their critical behavior, which is heavily dependent on d ^9,10,23^ (Figs. S2c, S3c, S4c). All systems self-organize to suppress density fluctuations below a normalized crossover length scale $\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$${\widetilde{l}}_{c}$$\end{document}$ ; specifically, for length scales $\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$1 < \widetilde{l} < {\widetilde{l}}_{c}$$\end{document}$ , the structure factor follows a power law ( $\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\widetilde{S} \sim {\widetilde{k}}^{2}$$\end{document}$ ) and $\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$${\widetilde{\delta \rho }}^{2} \sim {\widetilde{l}}^{-(d+1)}$$\end{document}$ , whereas for $\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\widetilde{l} > {\widetilde{l}}_{c}$$\end{document}$ , the structure factor is constant, ( $\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\widetilde{S} \sim {{{\rm{const.}}}}$$\end{document}$ ) and $\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$${\widetilde{\delta \rho }}^{2} \sim {\widetilde{l}}^{-d}$$\end{document}$ (Fig. 2b, c). Further, the crossover length scale $\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$${\widetilde{l}}_{c}$$\end{document}$ increases monotonically as c decreases: as the pairwise noise becomes more negatively correlated, density fluctuations are suppressed up to larger length scales (Fig. 2b inset). Consequently, the infinite wavelength density fluctuations, $\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$${\widetilde{\delta \rho }}^{2}(\widetilde{l}\to \infty )\propto \widetilde{S}(\widetilde{k}\to 0)$$\end{document}$ , decrease monotonically as c decreases (Fig. 2c inset). Finally, when the noise is anti-correlated (c = − 1), the crossover length scale $\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\widetilde{{l}_{c}}\to \infty$$\end{document}$ and the system becomes strongly hyperuniform, $\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\widetilde{S}(\widetilde{k}\to 0) \sim {\widetilde{k}}^{2}$$\end{document}$ , and $\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$${\widetilde{\delta \rho }}^{2}(\widetilde{l}\to \infty ) \sim {\widetilde{l}}^{-(d+1)}$$\end{document}$ (Fig. 2b, c).

Our results are consistent with previous studies on RO, which focused on the specific case of uncorrelated noise (c = 0)^17,18^, and on BRO, which focused on the specific case of anti-correlated noise (c = − 1)^12,14^. So, why do microscopically distinct systems—RO, BRO, and SGD—exhibit universal long-range behavior?

Generalized model

We now develop a continuous-time model of discrete-time random-organizing systems. Using the framework of stochastic modified equations^23,44^, we approximate the discrete-time dynamics by a continuous-time stochastic differential equation (SDE) (SI Sec. I.A). In the resulting genralized model, the dynamics of the position of particle i (xi) is given by an overdamped Langevin equation,

\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\frac{d{{{{\bf{x}}}}}_{i}(t)}{dt}=-\frac{1}{\gamma }{\sum }_{j=1}^{N}{\nabla }_{i}{V}_{ji}+{\sum }_{j=1}^{N}\sqrt{{{{{\boldsymbol{\Lambda }}}}}_{ji}}\cdot {{{{\boldsymbol{\xi }}}}}_{ji},$$\end{document}

where γ is the friction constant, Vj**i, Λj**i are short-range, pairwise interaction potential and diffusion matrix between particles j and i, respectively, $\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\sqrt{{{{{\boldsymbol{\Lambda }}}}}_{ji}}$$\end{document}$ denotes a matrix square root, and Eq. (5) is interpreted in the Itô sense. ξj**i is a pairwise, Gaussian noise given by the particle j to particle i having mean 〈ξj**i,α(t)〉 = 0 and covariance matrix $\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\langle {\xi }_{ij,\alpha }(t)\,{\xi }_{kl,\beta }({t}^{{\prime} })\rangle=\delta (t-{t}^{{\prime} })\,{\delta }_{\alpha \beta }({\delta }_{ik}{\delta }_{jl}+c\,{\delta }_{il}{\delta }_{jk})$$\end{document}$ , where c ∈ [ − 1, 0] is the Pearson correlation coefficient between ξi**j,α(t) and ξj**i,α(t). Eq. (5), supplemented with system-specific γ, Vi**j and Λi**j, is an SDE approximating RO, BRO, and SGD (SI Sec. I.A, Table S1). Notice that in the generalized model, the source(s) of noise for RO and BRO remain the same, while the selection noise in SGD becomes a noise on the magnitude of the kicks (Fig. 1e–g, SI Sec. I.A)^23^.

We perform particle simulations of the generalized model for RO, BRO, and SGD and find that the long-range structure is quantitatively the same as that for their discrete-time counterparts (Methods, Fig. 2b, c). Thus, the generalized model serves as an accurate continuous-time approximation for all random-organizing systems.

Fluctuating hydrodynamic theory

Equipped with the generalized model, we formulate a theory for the evolution of the density field ρ(x, t). Dean’s method, originally introduced for Brownian particles with additive noise^45^, and later extended to study systems with multiplicative noise^46–48^, is a well-known approach for directly coarse-graining microscopic dynamics. However, it has not yet been extended to systems where noise is both pairwise and correlated across components. Starting from Eq. (5), we extend Dean’s method^45^ and its subsequent generalizations^48^ to incorporate pairwise correlated noise, and derive the resulting fluctuating hydrodynamic equation for ρ(x, t) in arbitrary spatial dimension d to get (SI Sec. I.B.2),

\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\frac{\partial \rho ({{{\bf{x}}}},t)}{\partial t}=- \underbrace{{{\nabla }}\cdot [\rho ({{{\bf{x}}}}){{{\bf{v}}}}({{{\bf{x}}}})]}_{{{{\rm{d}}}}{{{\rm{r}}}}{{{\rm{i}}}}{{{\rm{f}}}}{{{\rm{t}}}}{\textstyle \,}{{{\rm{term}}}}}+\underbrace{{{\nabla }}{{{\nabla }}:[{{{\bf{D}}}}({{{\bf{x}}}})\rho ({{{\bf{x}}}})}]}_{{{{\rm{diffusion}}}}{\textstyle \,}{{{\rm{term}}}}}- \underbrace{{{\nabla }}\cdot {{{{\bf{j}}}}}_{n}({{{\bf{x}}}})}_{{{{\rm{noise}}}}{\textstyle \,}{{{\rm{term}}}}},$$\end{document}

where ∇ = ∇x, and : denotes a double dot product.

The velocity v(x) in Eq. (6) is given by

\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$${{{\bf{v}}}}({{{\bf{x}}}})=-\frac{1}{\gamma }{\langle \nabla V({{{\bf{x}}}},{{{\bf{y}}}})\rangle }_{\rho ({{{\bf{y}}}})},$$\end{document}

where 〈a〉ρ(y) = ∫ a**ρ(y)dy, and V(x, y) is the “continuous” version of Vj**i, given by replacing xi and xj by x and y in Eq. (4). v(x) originates from the deterministic (first) term in Eq. (5), and can be understood as the average force at point x due to the local interaction potential, − 〈∇V(x, y)〉ρ(y), divided by the friction coefficient γ.

The diffusion tensor D(x) in Eq. (6) is given by

\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$${{{\bf{D}}}}({{{\bf{x}}}})=\frac{1}{2}{\langle {{{\boldsymbol{\Lambda }}}}({{{\bf{x}}}},{{{\bf{y}}}})\rangle }_{\rho ({{{\bf{y}}}})}.$$\end{document}

D(x) originates from the noise (second) term in Eq. (5), and can be understood as the average diffusion tensor over the local density.

The stochastic flux jn(x) in Eq. (6) is given by

\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$${{{{\bf{j}}}}}_{n}({{{\bf{x}}}})=-\sqrt{\rho ({{{\bf{x}}}})}\int \sqrt{\rho ({{{\bf{y}}}})}\sqrt{{{{\boldsymbol{\Lambda }}}}({{{\bf{x}}}},{{{\bf{y}}}})}\cdot {{{\boldsymbol{\eta }}}}({{{\bf{x}}}},{{{\bf{y}}}})\,{{{\rm{d}}}}{{{\bf{y}}}},$$\end{document}

where $\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\sqrt{{{{\boldsymbol{\Lambda }}}}}$$\end{document}$ denotes a matrix square root, and Λ(x, y) is the “continuous” version of Λj**i. η(x, y, t) is a vectorial, two-point, Gaussian noise field having mean 〈ηα(x, y, t)〉 = 0, and covariance matrix $\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\langle {\eta }_{\alpha }({{{\bf{x}}}},{{{\bf{y}}}},t)\,{\eta }_{\beta }({{{\bf{u}}}},{{{\bf{w}}}},{t}^{{\prime} })\rangle={\delta }_{\alpha \beta }\delta (t-{t}^{{\prime} })\,[\delta ({{{\bf{x}}}}-{{{\bf{u}}}})\,\delta ({{{\bf{y}}}}-{{{\bf{w}}}})+\,c\,\delta ({{{\bf{x}}}}-{{{\bf{w}}}})\,\delta ({{{\bf{y}}}}-{{{\bf{u}}}})]$$\end{document}$ , where c ∈ [ − 1, 0] is the Pearson correlation coefficient between ηα(x, y, t) and ηα(y, x, t). jn(x) originates from the noise (second) term in Eq. (5). Notice that the noise at any spatial location x can be viewed as the sum of independent kicks from ny ∝ ρ(y)δ**V particles at y given to nx ∝ ρ(x)δ**V particles at x, where δ**V is an infinitesimal volume. Then, since the sum of n Gaussian noises yields a Gaussian noise with standard deviation $\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\propto \sqrt{n}\propto \sqrt{\rho }$$\end{document}$ , we have $\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$${{{{\boldsymbol{j}}}}}_{n}\propto \sqrt{\rho ({{{\bf{x}}}})}\sqrt{\rho ({{{\bf{y}}}})}$$\end{document}$ . Further, similar to Eq. (5), $\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\sqrt{{{{\boldsymbol{\Lambda }}}}({{{\bf{x}}}},{{{\bf{y}}}})}$$\end{document}$ acts as a projection tensor, making the noise anisotropic. Finally, integrating over y collects contributions from different locations to the total stochastic flux at x.

We perform finite-difference simulations of Eq. (6) for RO, BRO, and SGD, and find that the long-range structure is quantitatively the same as particle simulations (Methods, Fig. 2b, c). Thus, our coarse-grained theory quantitatively captures the long-range structure for all random-organizing systems.

To gain further analytical insights on Eq. (6), we linearize ρ(x, t) to first order and derive the analytical form of the static structure factor (SI Sec. I.B.2),

\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\widetilde{S}(\widetilde{k})=(1+c)+[M(1+c)-c]{\widetilde{k}}^{2},$$\end{document}

where M is a known system-dependent constant (Eqs. S64, S66, and S68, SI Sec. I.B.2). Eq. (10) directly relates c, the microscopic noise correlation coefficient between a pair of particles, to the long-range structure in the system, and quantitatively predicts S(k) for all random-organizing systems without free parameters (Fig. 2b). Further, since $\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$${\widetilde{\delta \rho }}^{2}(\widetilde{l}\to \infty )\propto \widetilde{S}(\widetilde{k}\to 0)$$\end{document}$ ^7^, Eq. (10) also predicts the behavior of infinite wavelength density fluctuations for all random-organizing systems (Fig. 2c inset).

Equation (10) shows a competition between two terms. The first term (1 + c) makes the long-range structure random, while the second term $\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$[M(1+c)-c]{\widetilde{k}}^{2}$$\end{document}$ makes the long-range structure strongly hyperuniform. The ratio of these two competing terms gives the normalized crossover length scale,

\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$${\widetilde{l}}_{c}=\sqrt{M-\frac{c}{1+c}},$$\end{document}

which quantitatively predicts the crossover length scale for all random-organizing systems (Fig. 2b inset).

Two remarks are in order. First, in a variety of other systems having local center of mass conserving dynamics^12,49–52^ (equivalent to c = − 1 in random-organizing systems) and displaying hyperuniformity in the dense phase, the coarse-grained noise term is of Laplacian form $\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$${\nabla }^{2}[\sqrt{\rho ({{{\bf{x}}}})}\omega ({{{\bf{x}}}})]$$\end{document}$ , where ω(x) is a spatially uncorrelated, one-point Gaussian noise field. In contrast, the noise term in our microscopically derived theory (Eqs. (6), (9)) is fundamentally different from “Laplacian noise” in two ways: (i) it enters Eq. (6) in the “divergence” form and, (ii) is spatially correlated. Second, the widely used Fokker-Planck coarse-graining approach gives a density evolution equation identical to Eq. (6), but without the stochastic flux term (jn(x) = 0), which, after density linearization, trivially predicts spatially uniform steady-state density ( $\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\rho ({{{\bf{x}}}})={{{\rm{constant}}}}$$\end{document}$ and S(k) = 0) independent of c (SI Sec. I.B.1). This demonstrates the crucial role of noise and noise correlations even at the coarse-grained scale in determining the long-range structure, thereby underscoring the importance of our theoretical framework.

Flatness of energy landscape in stochastic gradient descent

SGD is widely used for training neural networks, not only for its computational efficiency but also for its remarkable ability to steer neural networks toward flat regions of their loss landscapes^32,42,53^. This attribute is crucial for “good” learning algorithms since flatter regions are strongly correlated with better generalization performance on unseen data^30–33^. The bias towards flat regions, due to the selection noise in SGD, highlights the vital role of noise in shaping learning dynamics in neural networks. This, then, raises two questions: (i) Does SGD-driven descent of energy landscapes in particle systems similarly bias the dynamics toward flat regions, akin to neural networks? If so, (ii) Can this bias be linked to the long-range structure observed in these particle systems?

We first examine how the flatness of the explored regions of the energy landscape varies with noise correlation c, batch fraction bf, and learning rate α in SGD (Eq. (3)). We add a small noise N to the steady-state configurations X obtained in particle simulations to get a perturbed configuration X + ΔX and measure the change in the total energy of the system, $\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\Delta E={\langle E({{{\bf{X}}}}+\Delta {{{\bf{X}}}})-E({{{\bf{X}}}})\rangle }_{{{{\bf{N}}}}}\propto {{{\rm{Tr}}}}({{{\bf{H}}}})$$\end{document}$ , where E = ∑i∑j>iVi**j, and H is the Hessian matrix of the system (Eq. (4), Fig. 3a, Methods)^23^. Δ**E is a measure of the flatness of the energy landscape: lower Δ**E corresponds to flatter regions (Fig. 3a)^23,30^. We find that Δ**E decreases as c decreases, meaning that increase in long-range structure leads to flatter regions (Figs. 3b, 2b). We now fix c = − 1 (anti-correlated noise), and find that Δ**E increases with bf^23^ and decreases with α, suggesting that lower batch fractions and higher learning rates lead to flatter regions of energy landscape—consistent with results on SGD dynamics in neural networks (Fig. 3c, d)^30–32^. Notice that c = − 1 and bf = 1 correspond to (noiseless) gradient descent (Fig. 3c). All results are qualitatively independent of particle volume fraction ϕ > ϕc, the potential Vi**j, and the spatial dimension d, a useful feature since typical neural manifold dimensions are $\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$${{{\mathcal{O}}}}(100)$$\end{document}$ ^26^ (SI Fig. S5).Fig. 3. Flatness of energy landscape in stochastic gradient descent (SGD).a Zoomed-in exemplar configuration of discrete-time SGD before and after adding a small noise to all particles (Top). Schematic of a system (black ball) in a steady-state configuration. The system has energy E at the steady-state configuration X, and energy E + Δ**E at a slightly perturbed position X + ΔX after the addition of a small noise N. b Normalized energy change $\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\Delta \widetilde{E}(c)$$\end{document}$ versus noise correlation c. Δ**E(c) is normalized as: $\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\Delta \widetilde{E}(c)=[\Delta E(c)-\Delta E(c=-1)]/[\Delta E(c=0)-\Delta E(c=-1)]$$\end{document}$ . Black line shows prediction of Eq. S79 (SI Sec. I.C). Schematic showing two regions of energy landscape with different flatness (left). c Normalized energy change $\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\Delta \widetilde{E}({b}_{f})$$\end{document}$ versus batch fraction bf. Δ**E(bf) is normalized as: $\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\Delta \widetilde{E}({b}_{f})=[\Delta E({b}_{f})-\Delta E({b}_{f}= 0.1)]/[\Delta E({b}_{f}=1.0)-\Delta E({b}_{f}=0.1)]$$\end{document}$ . Black line shows prediction of Eq. S82 (SI Sec. I.C). d Normalized energy change $\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\Delta \widetilde{E}(\alpha )$$\end{document}$ versus learning rate α. Δ**E(α) is normalized as: Δ**E(α) = [Δ**E(α) − Δ**E(α = 1.0)]/[Δ**E(α = 0.5) − Δ**E(α = 1.0)]. Black line shows prediction of Eq. S81 (SI Sec. I.C). Data in (b), (c), and (d) denote discrete-time particle simulations.

Finally, we investigate the relationship between Δ**E and the long-range structure formed by SGD. Combining the change in S(k) before and after adding a small perturbation to the system^54^ with our fluctuating hydrodynamic theory (Eq. (10)), we derive an expression for Δ**E(c, bf, α) (SI Sec. I.C, Eqs. S79, S81, and S82)—in quantitative agreement with numerical simulations without free parameters (Fig. 3b–d). Thus, the emergent long-range structure in SGD provides a quantitative framework for understanding the flatness of energy landscape, a key feature linked to generalization. This unites two seemingly disparate domains—neural networks and interacting-particle systems—by revealing that the bias of SGD towards flat regions is a universal hallmark of high-dimensional loss (or energy) landscapes, regardless of the underlying system. Beyond its relevance to machine learning algorithms, our framework may guide the design of self-organizing materials with tunable energetic and structural properties.

To understand the characteristics of the flatter regions of energy landscape explored by SGD, we also measure the energy E of the steady-state configurations X in particle simulations. The dependence of E on key parameters—noise correlation c, batch fraction bf, and learning rate α—is opposite to that of the energy change for small perturbations Δ**E (SI Figs. S5, and S6). This finding is in quantitative agreement with predictions from our fluctuating hydrodynamic theory (SI. Sec. I.C, Eqs. S72, S74, S75, and Fig. S6). Notably, the magnitude of these changes differs significantly: variation in system parameters leads to a substantial change in Δ**E ~ 50%, but only a minor change in E ~ 8%. Therefore, flatter regions of the energy landscape explored by SGD are associated with nearly constant or slightly higher energy. A similar trend is also observed in SGD dynamics on neural networks, where flatter regions of loss landscape often correlate with constant or slightly higher loss^30^.

Discussion

Random-organizing systems offer an ideal framework to probe noise-driven self-organization. Combining simulations and theory, we provide a unified, microscopic description of dynamics and emergent organization across a diverse array of random-organizing systems. We reveal that universal long-range behavior arises away from criticality in these microscopically distinct systems, dictated solely by interparticle noise correlations. Finally, we demonstrate that SGD in particle systems inherently biases dynamics toward flatter energy regions, mirroring the behavior of SGD in neural networks—highlighting deep parallels between SGD dynamics in these two high-dimensional, but otherwise completely distinct, systems.

Random-organizing systems are inherently athermal (Eqs. 1, 2, and 3). We investigate how their long-range structure changes at a finite temperature by incorporating thermal noise, $\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\sqrt{2{D}^{{{{\rm{th}}}}}}\,{{{{\boldsymbol{\xi }}}}}_{i}^{{{{\rm{th}}}}}(t)$$\end{document}$ , into the generalized model (Eq. 5), where $\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$${{{{\boldsymbol{\xi }}}}}_{i}^{{{{\rm{th}}}}}(t)$$\end{document}$ is a standard Gaussian white noise, D^th^ = kBT/γ, kB is the Boltzmann constant, and T is the temperature. We assume that the thermal noise and the intrinsic athermal noise, ξj**i(t), are uncorrelated (Eq. 5, and SI Eq. S83). We find, both analytically and in particle simulations, that the infinite wavelength density fluctuations, quantified by S(k → 0), increase with temperature, across all noise correlations c and systems (SI Sec. I.D, Eq. S85, Fig. S7). Notably, systems with anti-correlated noise (c = −1), which are hyperuniform when athermal, show S(k → 0) ∝ T, illustrating how hyperuniformity is affected by thermal fluctuations (SI Fig. S7 inset). This behavior, also observed in other non-equilibrium hyperuniform systems^49^, is similar to how thermal fluctuations weaken hyperuniformity in equilibrium crystals, as described by the fluctuation-compressibility relationship S(k = 0) = κρkBT, where κ is the isothermal compressibility^7,55^.

We now discuss the conditions under which the linearization approximation of our fluctuating hydrodynamic theory remains valid (Eqs. 6, and 10). It is well known that such approximations hold for high density and soft overlap potentials^56–59^. Indeed, our linearized theory does not explain the long-range structure either in the absorbing phase (ϕ < ϕc) or at criticality (ϕ = ϕc). In the absorbing phase, the long-range structure remains unchanged by the noise correlation c (SI Fig. S8). At criticality, the long-range structure is still independent of c but becomes strongly dependent on dimension d: for d < 4 the systems are hyperuniform (S(k → 0) ~ k^α^), whereas for d≥4 they are random ( $\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$S(k\to 0) \sim {{{\rm{const.}}}}$$\end{document}$ ), with the exponent α varying with d^7,17,18,21,23,60,61^.

From an application perspective, disordered hyperuniform systems possess features such as isotropic photonic bandgaps at low dielectric constants^62–64^, defect-insensitive bandgaps^65^, exceptional transparency^66^, and high absorption rates in solar cells^67^. Consequently, designing structures with tunable hyperuniformity is highly desirable. Since many-body, and long-range interactions are difficult to realize in experiments, random-organizing systems offer a promising framework for designing hyperuniform structures due to their two-body, and short-range interactions. Hyperuniformity at criticality in random-organizing systems, however, is weak (class III hyperuniformity^7^) and highly sensitive to density, with even slight density variations (~ 0.5%) capable of disrupting it^9,10,17,18,25^. In contrast, hyperuniformity in the active phase is strong (class I hyperuniformity^7^), and independent of density, providing a robust approach for experimental realization.

Finally, our study focuses on random-organizing systems with pairwise (two-body) interactions. How does the long-range structure change when many-body interactions are introduced in such systems? We hypothesize that imposing many-body interactions could give rise to finer control over the long-range structure, opening new avenues for designed self-assembly in complex materials.

Methods

Particle simulations

Our system consisted of N hyperspherical particles of radius R in a d-dimensional hypercubic box of side length L with periodic boundary conditions. The unit of length, time, and energy were chosen as 2R, δ**t, and $\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$${{{\mathcal{E}}}}$$\end{document}$ , respectively. δ**t is the simulation time-step for continuous-time simulations and $\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$${{{\mathcal{E}}}}=1$$\end{document}$ is the characteristic energy scale of the potential given by Eq. (4). N = 318309 and R = 1 were kept fixed and the particle volume fraction ϕ = N**Vs/Vc was varied by changing L. Vs and Vc denote volumes of a d-dimensional hypersphere of radius R, and hypercube of side length L, respectively. Particles were randomly distributed in the simulation box at t = 0. All simulations were run until the system reached a steady-state and all measurements were performed at steady-state. All results were averaged over 100 steady-state configurations.

Discrete-time simulations

For discrete-time simulations of RO, BRO, and SGD, the dynamics of particles were evolved according to Eqs. (1), (2), and (3), respectively. As evident from Eqs. (1), (2), and (3), positions of isolated particles (particles with no overlapping neighbors) do not evolve at any given time-step.

Dynamics for RO are controlled by four parameters, the kick magnitude ϵ, the particle volume fraction ϕ, the spatial dimension d, and the correlation of pairwise noise c. For the results reported in the main text, parameters were set as ϵ = 1, ϕ = 1.0, d = 2, and c ∈ [−1, 0]. The critical volume fraction ϕc ≈ 0.375 for this set of parameters.

Dynamics for BRO are controlled by four parameters, the kick magnitude ϵ, the particle volume fraction ϕ, the spatial dimension d, and the correlation of pairwise noise c. For the results reported in the main text, parameters were set as ϵ = 1, ϕ = 1.0, d = 2, and c ∈ [− 1, 0]. The critical volume fraction ϕc ≈ 0.395 for this set of parameters.

Dynamics for SGD are controlled by six parameters, the learning rate α, the particle volume fraction ϕ, the batch fraction bf, the “stiffness” of the potential p, the spatial dimension d, and the correlation of pairwise noise c. For the results reported in the main text, parameters were set as α = 0.5, ϕ = 1.0, bf = 0.5, p = 1, d = 2, and c ∈ [− 1, 0]. The critical volume fraction ϕc ≈ 0.615 for this set of parameters.

For the measurement of flatness of energy landscape in SGD, a configuration X ≡ {x1, x2, . . . , xN} was perturbed by adding an independent Gaussian noise Ni(0, σ^2^I) to the position of each particle to get X + ΔX ≡ {x1 + N1, x2 + N2, . . . , xN + NN}. All results were averaged over 5000 independent noise realizations. For the results reported in Fig. 3b, parameters were set as α = 0.5, bf = 0.5, ϕ = 1.0, p = 1, d = 2, σ = 0.01, and c ∈ [− 1, 0]. For the results reported in Fig. 3c, parameters were set as α = 0.5, c = − 1, ϕ = 1.0, p = 1, d = 2, σ = 0.01, and bf ∈ [0.1, 1.0]. For the results reported in Fig. 3d, parameters were set as bf = 0.5, c = − 1, ϕ = 1.0, p = 1, d = 2, σ = 0.01, and α ∈ [0.5, 1.0].

Continuous-time simulations

For the continuous-time simulations of the generalized model of RO, BRO, and SGD, the dynamics of particles were evolved according to Eq. (5), supplemented with appropriate values of the friction constant γ, and the short-range interaction potential Vi**j and matrix Λi**j. The Euler-Maruyama method was used to solve Eq. (5), with a time-step δ**t = 1.0.

For the generalized model of RO, Vj**i = 0, and Λj**i,α**β = 1(0, 2R)(ri**j)(ϵ^2^/3d**τ)δα**β, where 1(0, 2R)(ri**j) is an indicator function such that 1(0, 2R)(ri**j) = 1 ∀ ri**j ∈ (0, 2R) and 1(0, 2R)(ri**j) = 0 otherwise. τ is the time scale quantifying the time elapsed in a discrete step. For the results reported in the main text, parameters were set as ϵ = 1, τ = 1.0, ϕ = 1.0, d = 2, and c ∈ [ − 1, 0].

For the generalized model of BRO, $\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\gamma={{{\mathcal{E}}}}\tau /\epsilon R$$\end{document}$ , $\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$${\Lambda }_{ji,\alpha \beta }= ({\epsilon }^{2}{R}^{2}/3\tau {{{{\mathcal{E}}}}}^{2})\,{\partial }_{\alpha }{V}_{ji}{\partial }_{\beta }{V}_{ji}$$\end{document}$ , and Vj**i is a short-range, pairwise, linear, repulsive potential (Eq. (4) with p = 1). For the results reported in the main text, parameters were set as ϵ = 1, τ = 1.0, ϕ = 1.0, d = 2, and c ∈ [ − 1, 0].

For the generalized model of SGD, γ = τ/α**bf, Λj**i,α**β = (α^2^bf(1 − bf)/τ) ∂αVj**i∂βVj**i, and Vj**i is given by Eq. (4). For the results reported in the main text, parameters were set as α = 1, bf = 0.5, τ = 1, p = 1, ϕ = 1.0, d = 2, and c ∈ [−1, 0].

Continuum simulations

Our system consisted of density ρ(x, t) evolving in a d-dimensional hypercubic box of side length L with periodic boundary conditions. The unit of length, time, and energy were chosen as R, δ**t, and $\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$${{{\mathcal{E}}}}$$\end{document}$ , respectively. δ**t is the simulation time-step, $\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$${{{\mathcal{E}}}}=1$$\end{document}$ is the characteristic energy scale of the continuous version of the potential given by Eq. (4), and 2R is the cutoff length of the potential. Space was discretized with a square grid of grid spacing δ**l = L/512. L = 200 and R = 1 were kept fixed and ∫ρ(x, t)dx = N was fixed at all times (density conservation). Density at all grid points was sampled from a Gaussian distribution with mean N/Vc, and standard deviation 0.01N/Vc at t = 0. All other parameters for RO, BRO, and SGD were chosen to be the same as that of continuous-time particle simulations. All simulations were run until the system reached a steady-state and all measurements were performed at steady-state.

The finite-difference method combined with forward Euler time stepping was used to solve Eq. (6). Spatial derivatives for the stochastic flux and the diffusion term in Eq. (6) were approximated by the second-order central difference scheme. We split the velocity term (Eqs. (6), (7)) into two parts using chain rule,

\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\frac{1}{\gamma }\nabla \cdot \left[\rho ({{{\bf{x}}}})\nabla {\langle V({{{\bf{x}}}},{{{\bf{y}}}})\rangle }_{\rho ({{{\bf{y}}}})}\right]=\frac{1}{\gamma }\nabla \rho ({{{\bf{x}}}})\cdot \nabla {\langle V({{{\bf{x}}}},{{{\bf{y}}}})\rangle }_{\rho ({{{\bf{y}}}})} \\+\frac{1}{\gamma }\rho ({{{\bf{x}}}})\,{\nabla }^{2}{\langle V({{{\bf{x}}}},{{{\bf{y}}}})\rangle }_{\rho ({{{\bf{y}}}})}.$$\end{document}

Spatial derivatives in the first and second term on the right hand side of Eq. (12) were approximated using the first-order forward difference, and the second-order central difference scheme, respectively. The finite-difference schemes were chosen to ensure strict density conservation and prevent checkerboard artifacts.

Structure factor and density fluctuations

The structure factor for a particle system is defined as $\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$S({{{\bf{k}}}})=| \widehat{\rho }({{{\bf{k}}}}){| }^{2}/N$$\end{document}$ , where $\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\rho ({{{\bf{x}}}})={\sum }_{i=1}^{N}\delta ({{{\bf{x}}}}-{{{{\bf{x}}}}}_{i})$$\end{document}$ , and $\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\widehat{f}$$\end{document}$ is the spatial Fourier transform of any arbitrary function f(x), given by $\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\widehat{f}({{{\bf{k}}}})=\int \,f({{{\bf{x}}}}){e}^{-i{{{\bf{k}}}}\cdot {{{\bf{x}}}}}d{{{\bf{x}}}}$$\end{document}$ . S(k) was calculated using the nonuniform fast Fourier transform^68,69^. The radial structure factor S(k) was then calculated by radially averaging S(k).

The structure factor for a continuous density field ρ(x) is defined as $\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$S({{{\bf{k}}}})=| \widehat{\delta \rho }({{{\bf{k}}}}){| }^{2}/\bar{\rho }$$\end{document}$ , where $\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\delta \rho ({{{\bf{x}}}})=\rho ({{{\bf{x}}}})-\bar{\rho }$$\end{document}$ , and $\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\bar{\rho }=\int \rho ({{{\bf{x}}}})d{{{\bf{x}}}}/\int \,d{{{\bf{x}}}}$$\end{document}$ .

Number density variance δ**ρ^2^(l) in a hyperspherical window of diameter l was measured by exploiting the exact relationship between S(k) and δ**ρ^2^(l)^7^,

\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\delta {\rho }^{2}(l)=\frac{\bar{\rho }d{2}^{d}\Gamma (1+\frac{d}{2})}{{(\sqrt{\pi }l)}^{d}}{\int }_{0}^{\infty }\frac{1}{k}S(k){\left[{J}_{d/2}\left(\frac{kl}{2}\right)\right]}^{2}\,dk,$$\end{document}

where $\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\bar{\rho }=N/{V}_{c}$$\end{document}$ , Γ is the Gamma function, and J is the Bessel function of the first kind. The integral in Eq. (13) was evaluated numerically using Simpson’s rule, based on the measured S(k).

Supplementary information

Supplementary Information Transparent Peer Review file

Bibliography17

The reference list from the paper itself. Each links out to its DOI / PubMed record.

1Lei, Y., Zheng, N. & Ni, R. Random organization and non-equilibrium hyperuniform fluids on a sphere. J. Chem. Phys.159, 081101 (2023).10.1063/5.016552737606337 · doi ↗ · pubmed ↗
2Zhang, G. & Martiniani, S. Absorbing state dynamics of stochastic gradient descent. ar Xiv preprint ar Xiv:2411.11834 (2024).
3Zhang, G., Heeger, D. J. & Martiniani, S. Contrastive self-supervised learning as neural manifold packing. The Thirty-ninth Annual Conference on Neural Information Processing Systems, https://openreview.net/forum?id=e Tg Xolh WCH (2025).
4Henkel, M.Non-equilibrium phase transitions (Springer, 2008).
5Jastrzębski, S. et al. Three factors influencing minima in sgd. ar Xiv preprint ar Xiv:1711.04623 (2017).
6Xie, Z., Sato, I. & Sugiyama, M. A diffusion theory for deep learning dynamics: Stochastic gradient descent exponentially favors flat minima. In International Conference on Learning Representations, https://openreview.net/forum?id=w Xgk_i Ci Y Go (2021).
7Keskar, N. S., Mudigere, D., Nocedal, J., Smelyanskiy, M. & Tang, P. T. P. On large-batch training for deep learning: Generalization gap and sharp minima. In International Conference on Learning Representations, https://openreview.net/forum?id=H 1oy Rl Ygg (2017).
8Huang, W. R. et al. Understanding generalization through visualizations. In Proceedings of the 37th International Conference on Machine Learning, 119, 4499–4509 (PMLR, 2020).