The Fundamental Incompatibility of Hamiltonian Monte Carlo and Data Subsampling
M. J. Betancourt

TL;DR
Hamiltonian Monte Carlo explores complex, high-dimensional distributions efficiently, but data subsampling compromises that exploration, so subsampling does not make the algorithm scalable to data-intensive applications.
Contribution
This paper demonstrates that data subsampling fundamentally compromises the efficient exploration of Hamiltonian flow, and with it the scalable performance of Hamiltonian Monte Carlo in large-data settings.
Findings
Data subsampling compromises Hamiltonian flow exploration.
Hamiltonian Monte Carlo's efficiency is incompatible with data subsampling.
Subsampling therefore does not offer a route to scaling Hamiltonian Monte Carlo to data-intensive applications.
Abstract
Leveraging the coherent exploration of Hamiltonian flow, Hamiltonian Monte Carlo produces computationally efficient Monte Carlo estimators, even with respect to complex and high-dimensional target distributions. When confronted with data-intensive applications, however, the algorithm may be too expensive to implement, leaving us to consider the utility of approximations such as data subsampling. In this paper I demonstrate how data subsampling fundamentally compromises the efficient exploration of Hamiltonian flow and hence the scalable performance of Hamiltonian Monte Carlo itself.
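The effect described in the abstract can be illustrated with a toy sketch. This is not the paper's analysis; it only shows one symptom under assumed settings. Everything here is hypothetical and chosen for illustration: a one-dimensional Gaussian-mean model with unit variance and a flat prior, 1000 synthetic data points, a batch size of 10, a step size of 0.01, and the helper names `full_grad`, `subsampled_grad`, `leapfrog`, and `energy_error`. The sketch integrates one leapfrog trajectory with the exact gradient and one with an unbiased subsampled gradient, then compares how far the full-data Hamiltonian drifts along each trajectory.

```python
import random


def potential(q, data):
    # Negative log posterior (up to a constant) for a Gaussian mean
    # with unit variance and a flat prior: an assumed toy model.
    return 0.5 * sum((q - x) ** 2 for x in data)


def full_grad(q, data):
    # Exact gradient of the potential over the whole data set.
    return sum(q - x for x in data)


def subsampled_grad(q, data, batch_size, rng):
    # Unbiased gradient estimate from a fresh random batch,
    # rescaled by N / B to match the full-data gradient in expectation.
    batch = rng.sample(data, batch_size)
    return (len(data) / batch_size) * sum(q - x for x in batch)


def leapfrog(q, p, grad, step, n_steps):
    # Standard leapfrog integrator: half kick, alternating drifts and
    # full kicks, final half kick. `grad` may be exact or stochastic.
    p -= 0.5 * step * grad(q)
    for i in range(n_steps):
        q += step * p
        scale = 0.5 if i == n_steps - 1 else 1.0
        p -= scale * step * grad(q)
    return q, p


def energy_error(data, grad, q0, p0, step, n_steps):
    # Change in the full-data Hamiltonian along the trajectory.
    h0 = potential(q0, data) + 0.5 * p0 ** 2
    q1, p1 = leapfrog(q0, p0, grad, step, n_steps)
    return potential(q1, data) + 0.5 * p1 ** 2 - h0


rng = random.Random(0)
data = [rng.gauss(0.0, 1.0) for _ in range(1000)]  # synthetic data
q0 = sum(data) / len(data)  # start at the posterior mode
p0 = 1.0

err_full = energy_error(data, lambda q: full_grad(q, data), q0, p0, 0.01, 50)
err_sub = energy_error(
    data, lambda q: subsampled_grad(q, data, 10, rng), q0, p0, 0.01, 50
)

print("energy error, full gradient:      ", err_full)
print("energy error, subsampled gradient:", err_sub)
```

In this setup the full-gradient trajectory keeps the energy error small, while the subsampled trajectory typically drifts much further from the initial energy level. In a Metropolis-corrected sampler a large energy error translates into a low acceptance probability, which is one way the loss of coherent exploration can show up in practice.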
Taxonomy
Topics: Markov Chains and Monte Carlo Methods · Statistical Methods and Inference · Stochastic processes and statistical mechanics
