Learning RBM with a DC programming Approach

Vidyadhar Upadhya; P. S. Sastry

arXiv:1709.07149 · cs.LG · October 6, 2017 · 2 cites

TL;DR

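This paper introduces a stochastic difference-of-convex (DC) programming approach for learning restricted Boltzmann machines (RBMs). For a given computational budget, it almost always reaches a higher log-likelihood faster than standard contrastive divergence, and a variant that uses centered gradients likewise outperforms the standard centered gradient algorithm.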

Contribution

It reformulates RBM training as a difference-of-convex-functions problem, shows that contrastive divergence is a special case of the resulting stochastic DC algorithm, and extends the algorithm to use centered gradients.
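
The decomposition behind this reformulation can be written out as a short sketch. The notation is an assumption (a standard binary RBM with weights W, visible biases b, hidden biases c, and θ = (W, b, c)); the source page does not fix any symbols. Both terms are log-sum-exp functions of quantities that are affine in θ, so each is convex, and the negative log-likelihood is their difference. The last line is the generic DC programming step, which linearizes the subtracted convex term at the current iterate and minimizes the remaining convex function.

```latex
\begin{align}
E(v,h;\theta) &= -v^\top W h - b^\top v - c^\top h, \\
-\log p(v;\theta) &=
  \underbrace{\log \sum_{v',h'} e^{-E(v',h';\theta)}}_{g(\theta)}
  \;-\;
  \underbrace{\log \sum_{h} e^{-E(v,h;\theta)}}_{f(\theta,v)}, \\
\theta^{(t+1)} &= \arg\min_{\theta}\; g(\theta) - \theta^\top \nabla f(\theta^{(t)}, v).
\end{align}
```

For a training set, f is averaged over the training vectors v; g does not depend on the data.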

Findings

01

For a given computational budget, the proposed algorithm almost always reaches a higher log-likelihood faster than standard contrastive divergence.

02

The modified algorithm with centered gradients is more efficient and effective than the standard centered gradient algorithm on benchmark datasets.

03

Outperforms standard contrastive divergence on benchmark datasets.

Abstract

By exploiting the property that the RBM log-likelihood function is the difference of convex functions, we formulate a stochastic variant of the difference of convex functions (DC) programming to minimize the negative log-likelihood. Interestingly, the traditional contrastive divergence algorithm is a special case of the above formulation and the hyperparameters of the two algorithms can be chosen such that the amount of computation per mini-batch is identical. We show that for a given computational budget the proposed algorithm almost always reaches a higher log-likelihood more rapidly, compared to the standard contrastive divergence algorithm. Further, we modify this algorithm to use the centered gradients and show that it is more efficient and effective compared to the standard centered gradient algorithm on benchmark datasets.
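
To make the relationship with contrastive divergence concrete, here is a minimal NumPy sketch of one mini-batch update. It is not the authors' reference implementation. It assumes a binary RBM, and the names `sdcp_update`, `d` (inner gradient steps), `k` (Gibbs sweeps per inner step), and `lr` are hypothetical. Continuing the Gibbs chain across inner steps is also a choice made here. The data-dependent term is computed once at the outer iterate, and the convex subproblem is then solved approximately with `d` stochastic gradient steps. With `d=1` the update coincides with CD-k.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gibbs_step(v, W, b, c, rng):
    """One block-Gibbs sweep v -> h -> v for a binary RBM."""
    p_h = sigmoid(v @ W + c)
    h = (rng.random(p_h.shape) < p_h).astype(float)
    p_v = sigmoid(h @ W.T + b)
    return (rng.random(p_v.shape) < p_v).astype(float)

def stats(v, W, c):
    """Mini-batch averages of the gradient statistics for (W, b, c)."""
    p_h = sigmoid(v @ W + c)
    return v.T @ p_h / len(v), v.mean(axis=0), p_h.mean(axis=0)

def sdcp_update(v_data, W, b, c, rng, d=3, k=1, lr=0.1):
    """One stochastic DC-style update on a mini-batch (illustrative sketch)."""
    # Data term: evaluated once at the outer iterate and then held fixed.
    pos = stats(v_data, W, c)
    # Work on copies so the caller's parameters are not modified.
    W, b, c = W.copy(), b.copy(), c.copy()
    v_model = v_data.copy()  # chain starts at the data, as in CD
    for _ in range(d):
        # Model term: estimated from k Gibbs sweeps under the inner iterate.
        for _ in range(k):
            v_model = gibbs_step(v_model, W, b, c, rng)
        neg = stats(v_model, W, c)
        # Descent step on g(theta) - theta^T grad f(theta_t).
        W += lr * (pos[0] - neg[0])
        b += lr * (pos[1] - neg[1])
        c += lr * (pos[2] - neg[2])
    return W, b, c
```

In this sketch, a call with `d` inner steps and `k` sweeps per step performs `d * k` Gibbs sweeps per mini-batch, which is the sense in which its cost can be matched to CD with the same total number of sweeps.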

Taxonomy

Topics: Machine Learning and Algorithms · Stochastic Gradient Optimization Techniques · Domain Adaptation and Few-Shot Learning