Posterior Approximation using Stochastic Gradient Ascent with Adaptive Stepsize
Kart-Leong Lim, Xudong Jiang

TL;DR
This paper introduces an adaptive-stepsize stochastic gradient ascent method for efficient posterior approximation in Bayesian nonparametric models, with accuracy comparable to coordinate ascent methods on large-scale datasets.
Contribution
It develops an adaptive stepsize stochastic gradient ascent algorithm for posterior approximation, integrating Fisher information for improved speed and scalability.
Findings
Achieves comparable accuracy to coordinate ascent methods.
Scales effectively to large datasets like Caltech256 and SUN397.
Compatible with deep convolutional neural network features.
Abstract
Scalable posterior approximation algorithms allow Bayesian nonparametric models such as the Dirichlet process mixture to scale to larger datasets at a fraction of the cost. Recent algorithms, notably stochastic variational inference, perform local learning from minibatches. The main problem with stochastic variational inference is that it relies on closed-form solutions. Stochastic gradient ascent is a modern approach to machine learning and is widely deployed in the training of deep neural networks. In this work, we explore using stochastic gradient ascent as a fast algorithm for the posterior approximation of the Dirichlet process mixture. However, stochastic gradient ascent alone is not optimal for learning. To achieve both speed and performance, we turn our focus to stepsize optimization in stochastic gradient ascent. As an intermediate approach, we first optimize stepsize using the…
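To make the role of the stepsize concrete, here is a minimal sketch of minibatch stochastic gradient ascent with and without a Fisher-information-scaled stepsize. It is a toy illustration only, not the authors' algorithm: it estimates the mean of a single Gaussian with known variance rather than a Dirichlet process mixture, and the function name `sga_gaussian_mean`, the 1/t decay schedule, and all parameter values are assumptions chosen for the example.

```python
import numpy as np

def sga_gaussian_mean(data, sigma2, batch_size=32, n_steps=200,
                      use_fisher=True, seed=0):
    """Toy minibatch gradient ascent on the log-likelihood of N(x | mu, sigma2).

    Hypothetical helper for illustration; only mu is learned, sigma2 is known.
    """
    rng = np.random.default_rng(seed)
    mu = 0.0
    for t in range(1, n_steps + 1):
        # Draw a minibatch (indices sampled with replacement).
        idx = rng.integers(0, len(data), size=batch_size)
        batch = data[idx]
        # Average per-sample gradient of log N(x | mu, sigma2) w.r.t. mu.
        grad = np.mean(batch - mu) / sigma2
        # Assumed base schedule: decaying stepsize 1/t.
        step = 1.0 / t
        if use_fisher:
            # Fisher information of mu for one sample is 1 / sigma2;
            # scaling the step by its inverse removes the dependence on sigma2.
            step *= sigma2
        mu += step * grad
    return mu

# Synthetic data: true mean 3.0, standard deviation 5.0.
data = np.random.default_rng(1).normal(3.0, 5.0, size=10_000)
mu_fisher = sga_gaussian_mean(data, sigma2=25.0, use_fisher=True)
mu_plain = sga_gaussian_mean(data, sigma2=25.0, use_fisher=False)
print("Fisher-scaled stepsize:", mu_fisher)
print("Plain decaying stepsize:", mu_plain)
```

In this toy setting the Fisher-scaled update reduces to a running average of minibatch means, so it reaches the data mean within the step budget, while the unscaled update with the same schedule moves much more slowly when the variance is large. How the paper applies Fisher information to the Dirichlet process mixture posterior is not shown here.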
Taxonomy
Topics: Machine Learning and ELM · Domain Adaptation and Few-Shot Learning
Methods: Average Pooling · Global Average Pooling · Kaiming Initialization · Adam · Variational Inference · Convolution · Max Pooling
