Improved Finite-Particle Convergence Rates for Stein Variational Gradient Descent
Sayan Banerjee, Krishnakumar Balasubramanian, and Promit Ghosal

TL;DR
This paper establishes improved finite-particle convergence rates for the Stein Variational Gradient Descent (SVGD) algorithm in terms of Kernelized Stein Discrepancy and Wasserstein-2 metrics, demonstrating near optimal rates and polynomial dimension dependence.
Contribution
The paper improves the known finite-particle convergence rates for SVGD in KSD and provides the first such rates in the Wasserstein-2 metric. Its novel analysis splits the time derivative of the relative entropy into a dominant negative part and a smaller positive part, leading to near-optimal rates.
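The entropy splitting can be written schematically as below. This is a sketch under stated assumptions, not the paper's exact statement: $p^N(t)$ denotes the joint density of the $N$ particles at time $t$, $\pi$ the target, $\mu^N(t)$ the empirical measure of the particles, $c > 0$ an unspecified proportionality constant, and $C$ an assumed uniform bound on the positive part.

```latex
% Schematic entropy inequality (c and C are placeholder constants)
\frac{d}{dt}\,\mathrm{KL}\big(p^N(t)\,\|\,\pi^{\otimes N}\big)
  \;\le\; -\,c\,N\,\mathbb{E}\big[\mathsf{KSD}^2(\mu^N(t)\,\|\,\pi)\big] \;+\; C .

% Integrating over [0, T] and using KL >= 0 at time T:
\frac{1}{T}\int_0^T \mathbb{E}\big[\mathsf{KSD}^2(\mu^N(t)\,\|\,\pi)\big]\,dt
  \;\le\; \frac{\mathrm{KL}\big(p^N(0)\,\|\,\pi^{\otimes N}\big)}{c\,N\,T} \;+\; \frac{C}{c\,N} .
```

If the initial relative entropy is of order $N$ (as for an i.i.d. initialization) and $T$ is at least of order $N$, the right-hand side is of order $1/N$, which is consistent with a KSD rate of order $1/\sqrt{N}$.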
Findings
KSD convergence rate of order 1/√N in continuous and discrete time.
Wasserstein-2 convergence under bilinear kernel modifications.
Polynomial growth of bounds in the dimension d.
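To make the KSD metric in these findings concrete, here is a minimal NumPy sketch of a V-statistic estimate of the squared KSD between a particle cloud and a target. The Gaussian RBF kernel, the `bandwidth` parameter, and the function name `ksd_squared` are illustrative assumptions, not choices taken from the paper.

```python
import numpy as np

def ksd_squared(x, score, bandwidth=1.0):
    """V-statistic estimate of KSD^2 between the empirical measure of x and a target.

    x         : (N, d) array of particle locations.
    score     : function mapping (N, d) -> (N, d), returning grad log pi at each row.
    bandwidth : RBF kernel bandwidth h (an assumed kernel choice).
    """
    n, d = x.shape
    h2 = bandwidth ** 2
    s = score(x)                                   # (N, d) target scores
    diff = x[:, None, :] - x[None, :, :]           # diff[i, j] = x_i - x_j
    sq_dist = np.sum(diff ** 2, axis=-1)           # (N, N) squared distances
    k = np.exp(-sq_dist / (2.0 * h2))              # RBF kernel matrix

    # Stein kernel terms for k(x, y) = exp(-|x - y|^2 / (2 h^2)):
    term_ss = s @ s.T                              # s(x_i) . s(x_j)
    sdiff = s[:, None, :] - s[None, :, :]          # s(x_i) - s(x_j)
    term_cross = np.sum(sdiff * diff, axis=-1) / h2
    term_trace = d / h2 - sq_dist / h2 ** 2        # trace of mixed second derivative / k

    stein_kernel = k * (term_ss + term_cross + term_trace)
    return stein_kernel.mean()                     # (1/N^2) * sum over all pairs
```

For a standard Gaussian target the score is `lambda x: -x`; particles drawn from the target should give a much smaller value than the same particles shifted away from it.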
Abstract
We provide finite-particle convergence rates for the Stein Variational Gradient Descent (SVGD) algorithm in the Kernelized Stein Discrepancy ($\mathsf{KSD}$) and Wasserstein-2 metrics. Our key insight is that the time derivative of the relative entropy between the joint density of $N$ particle locations and the $N$-fold product target measure, starting from a regular initial distribution, splits into a dominant 'negative part' proportional to $N$ times the expected $\mathsf{KSD}^2$ and a smaller 'positive part'. This observation leads to $\mathsf{KSD}$ rates of order $1/\sqrt{N}$, in both continuous and discrete time, providing a near optimal (in the sense of matching the corresponding i.i.d. rates) double exponential improvement over the recent result by Shi and Mackey (2024). Under mild assumptions on the kernel and potential, these bounds also grow polynomially in the dimension $d$.…
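For readers unfamiliar with the algorithm being analyzed, the following is a minimal NumPy sketch of one discrete-time, finite-particle SVGD update. The RBF kernel, the fixed `bandwidth`, the `step_size`, and the function name `svgd_step` are illustrative assumptions; the paper's own kernel and step-size conditions may differ.

```python
import numpy as np

def svgd_step(x, score, step_size=0.1, bandwidth=1.0):
    """One SVGD update for N particles with an RBF kernel (assumed kernel choice).

    x     : (N, d) array of particle locations.
    score : function mapping (N, d) -> (N, d), returning grad log pi at each row.
    """
    n = x.shape[0]
    h2 = bandwidth ** 2
    diff = x[:, None, :] - x[None, :, :]                  # diff[i, j] = x_i - x_j
    k = np.exp(-np.sum(diff ** 2, axis=-1) / (2.0 * h2))  # (N, N) kernel matrix

    # Attraction: kernel-weighted average of the target scores.
    drift = k @ score(x)                                  # sum_j k(x_i, x_j) s(x_j)
    # Repulsion: sum_j grad_{x_j} k(x_j, x_i), which pushes particles apart.
    repulsion = np.sum(k[:, :, None] * diff, axis=1) / h2

    return x + step_size * (drift + repulsion) / n
```

Iterating `svgd_step` with `score = lambda z: -z` (a standard Gaussian target) from a cloud initialized away from the origin should move the particle mean toward zero while the repulsion term keeps the particles spread out.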
Peer Reviews
Decision·ICLR 2025 Oral
This paper establishes quantitative convergence guarantees for finite-particle (and discrete-time) SVGD, with a much better dependency on problem parameters than the only previous known analysis. This is thus arguably the first satisfactory convergence bound, for an algorithm that has attracted considerable attention from theoreticians. So this paper's achievement is highly significant. The main novel insight used in this paper is remarkably clean. It is also quite satisfying, in that it allows
Section 5 on propagation of chaos (POC) could benefit from a little bit more motivation: it is not clear why POC would be a desirable property for SVGD.
Possible typos:
- In Assumption 1(b) and Lemma 1, $p_0^N$ needs to be $C^2$ (not $C^1$) for $p^N(t,\cdot)$ to be $C^2$
- In Assumption 2(d), maybe $=$ is meant to be $\le$
- Typos on line 430
SVGD has been well studied in practice for several years, but its theoretical understanding has remained rather limited. As the authors discuss, past work tackled only particular (simpler) discretizations of continuous SVGD, leaving the most interesting case open: the finite-particle, discrete-time setting. The authors solve the problem in this setting and obtain polynomial convergence guarantees for different types of metrics, under certain conditions
### General comments
- Some of the conditions in the main results are not easy to verify. See the Questions section.
- The paper is generally well-written, although certain mathematical details are omitted.
### Mathematical comments
- The derivations starting from *line 281* are not well-explained. Why does the first term on the right-hand side vanish on second line? How is the third line derived?
- In order to obtain (17), the term $(\sum_j V(x_j)/N)^{2\alpha}$ is upper bounded by $\sum_
* In KSD, this paper significantly improves the particle dependence over Shi & Mackey (2024), the previous best known result, by considering the evolution of the joint distribution of N particles.
* In Wasserstein-2 distance, this is the first convergence rate, although it exhibits a curse of dimensionality.
* The discussion of related work is extensive and informative.
* Some notation should be further clarified, and there is no subsection collecting all the notation, which makes certain parts hard to read.
* The paper lacks some discussion of its assumptions.
* The comparison with previous analyses, including assumptions and rates, is not very clear.
* The discussion of convergence in Wasserstein-2 distance is weak. For more details, see Questions below.
Taxonomy
Topics: Particle Dynamics in Fluid Flows
