Fast Escape, Slow Convergence: Learning Dynamics of Phase Retrieval under Power-Law Data
Guillaume Braun, Bruno Loureiro, Ha Quang Minh, Masaaki Imaizumi

TL;DR
This paper analyzes the learning dynamics of phase retrieval with power-law spectral data, revealing a three-phase process and deriving explicit scaling laws for convergence and error, supported by experiments.
Contribution
It provides the first rigorous characterization of scaling laws in nonlinear regression with anisotropic data, uncovering how spectral decay influences learning dynamics.
Findings
Identifies a three-phase learning trajectory in anisotropic phase retrieval.
Derives explicit scaling laws linking spectral decay to convergence times.
Experimental validation confirms theoretical predictions of phases and exponents.
Abstract
Scaling laws describe how learning performance improves with data, compute, or training time, and have become a central theme in modern deep learning. We study this phenomenon in a canonical nonlinear model: phase retrieval with anisotropic Gaussian inputs whose covariance spectrum follows a power law. Unlike the isotropic case, where dynamics collapse to a two-dimensional system, anisotropy yields a qualitatively new regime in which an infinite hierarchy of coupled equations governs the evolution of the summary statistics. We develop a tractable reduction that reveals a three-phase trajectory: (i) fast escape from low alignment, (ii) slow convergence of the summary statistics, and (iii) spectral-tail learning in low-variance directions. From this decomposition, we derive explicit scaling laws for the mean-squared error, showing how spectral decay dictates convergence times and error…
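The setting described in the abstract can be explored numerically with a minimal sketch. This is an illustration under stated assumptions, not the authors' code. It assumes a diagonal covariance with eigenvalues proportional to $k^{-a}$ (normalized to unit trace), a population loss $\frac{1}{4}\mathbb{E}[((w\cdot x)^2 - (w^\star\cdot x)^2)^2]$ evaluated in closed form via Gaussian fourth moments, and small-step gradient descent as a stand-in for gradient flow. All function names, the quantities `q_w`, `q_s`, `m`, and the default parameters are hypothetical choices made for this example.

```python
import numpy as np

def power_law_spectrum(d, a):
    """Eigenvalues lambda_k proportional to k^(-a), unit trace (assumed form)."""
    lam = np.arange(1, d + 1, dtype=float) ** (-a)
    return lam / lam.sum()

def summary_stats(w, w_star, lam):
    """Quadratic forms w'Sw, w*'Sw*, w'Sw* for diagonal S = diag(lam)."""
    q_w = np.sum(lam * w * w)
    q_s = np.sum(lam * w_star * w_star)
    m = np.sum(lam * w * w_star)
    return q_w, q_s, m

def population_loss(w, w_star, lam):
    """0.25 * E[((w.x)^2 - (w*.x)^2)^2] for x ~ N(0, diag(lam)).

    Uses E[A^2 B^2] = E[A^2] E[B^2] + 2 E[AB]^2 for jointly Gaussian A, B.
    """
    q_w, q_s, m = summary_stats(w, w_star, lam)
    return 0.25 * (3.0 * q_w**2 + 3.0 * q_s**2 - 2.0 * q_w * q_s - 4.0 * m**2)

def population_grad(w, w_star, lam):
    """Gradient of population_loss with respect to w."""
    q_w, q_s, m = summary_stats(w, w_star, lam)
    return (3.0 * q_w - q_s) * lam * w - 2.0 * m * lam * w_star

def run_gd(d=50, a=1.5, lr=0.05, steps=2000, init_scale=1e-2, seed=0):
    """Small-step gradient descent from a small random initialization."""
    rng = np.random.default_rng(seed)
    lam = power_law_spectrum(d, a)
    w_star = rng.standard_normal(d)
    w_star /= np.sqrt(np.sum(lam * w_star**2))  # normalize so w*'Sw* = 1
    w = init_scale * rng.standard_normal(d)
    losses = np.empty(steps + 1)
    losses[0] = population_loss(w, w_star, lam)
    for t in range(steps):
        w = w - lr * population_grad(w, w_star, lam)
        losses[t + 1] = population_loss(w, w_star, lam)
    return losses, w, w_star, lam

if __name__ == "__main__":
    losses, w, w_star, lam = run_gd()
    print("initial loss:", losses[0], "final loss:", losses[-1])
```

Plotting `losses` on log-log axes for several values of `a` is one way to look for the phases the abstract describes. Whether this discretization reproduces the paper's exponents is not something the sketch establishes.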
Peer Reviews
Decision: ICLR 2026 Oral
1. This paper gives a solid theoretical study of scaling laws of phase retrieval problems. 2. The authors reveal that the problem can be viewed as infinite-dimensional dynamical systems, and can be analyzed by some math tools such as ODE, dynamical systems, etc.
1. It is unclear how the dynamics of gradient flow of the phase retrieval problems can inspire the design of new algorithms for solving real-world, large-scale problems. Although this paper mainly focuses on theory, it might be beneficial to include some discussions on the connections between this toy problem and real-world ones. 2. Most of the results include the big-O notations, meaning that the conclusions can only hold when certain numbers are sufficiently large/small. This might indicate t
I think the key originality this paper brings is the analysis under anisotropy. I also think on the whole that the paper is well written and clearly explained. In terms of significance, I think the identification of slow convergence with the tail of the distribution of the data is nice, particularly around the emergence of this third phase not present when $a=0$. I also think the fact that a larger $a$ implies faster decay of the loss in the first phase but slower convergence in the third pha
My only critique perhaps concerns the idealized setting, namely a quadratically activated neuron trained under full batch gradient flow on Gaussian data with power law decay. It is not clear to me how much of the narrative here applies elsewhere to other non-linear optimization problem settings of interest. As a result, I think the scope and general applicability of the results are not clear.
This paper studies an important issue in learning theory. It considers a nontrivial, currently poorly understood, setting combining model nonlinearity and data anisotropy - both of which are expected to affect the detailed behavior of neural scaling laws. The technical analysis of the paper is interesting and rigorous, and is likely to be applicable in future work. The main theoretical conclusions of the paper are intuitive and appealing, and are supported by limited experimental verification/ex
**Robustness of conclusions and applicability to more general settings**. The paper obtains satisfying conclusions about the qualitative behavior of GD dynamics in phase retrieval. However, the neat 3-phase picture appears to depend somewhat on the loss structure of the specific problem under consideration. Currently, the paper does not appear to give any reason - even hand waving - to suspect that more complex models will exhibit escape, convergence, and/or tail learning phases resembling those
Taxonomy
Topics: Advanced X-ray Imaging Techniques · Advanced Electron Microscopy Techniques and Applications · Digital Holography and Microscopy
