Trainable Compound Activation Functions for Machine Learning
Paul M. Baggenstoss

TL;DR
This paper introduces trainable compound activation functions (TCAs), which combine simple activation functions (AFs) into a learnable sum of shifted and scaled copies. TCAs make networks more effective with fewer parameters than adding layers, and they improve generative models such as RBMs and VAEs.
Contribution
The paper proposes a novel trainable compound activation function, built as a learnable sum of shifted and scaled simple AFs, that improves neural network efficiency and generative modeling.
Findings
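TCAs increase network effectiveness with fewer parameters than adding layers.
In generative networks, TCAs effectively estimate the marginal distribution of each data dimension using a mixture distribution.
Improved performance is demonstrated in experiments with RBMs, DBNs, PBNs, and VAEs.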
Abstract
Activation functions (AF) are necessary components of neural networks that allow approximation of functions, but AFs in current use are usually simple monotonically increasing functions. In this paper, we propose trainable compound AF (TCA) composed of a sum of shifted and scaled simple AFs. TCAs increase the effectiveness of networks with fewer parameters compared to added layers. TCAs have a special interpretation in generative networks because they effectively estimate the marginal distributions of each dimension of the data using a mixture distribution, reducing modality and making linear dimension reduction more effective. When used in restricted Boltzmann machines (RBMs), they result in a novel type of RBM with mixture-based stochastic units. Improved performance is demonstrated in experiments using RBMs, deep belief networks (DBN), projected belief networks (PBN), and variational…
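The idea of a compound AF built as "a sum of shifted and scaled simple AFs" can be illustrated with a minimal NumPy sketch. This is not the paper's formulation: the sigmoid base function, the parameter names `scales`, `shifts`, and `weights`, and the fixed values used here are all assumptions made for illustration. In a trainable version these parameters would be learned along with the network weights.

```python
import numpy as np

def sigmoid(x):
    """Simple monotonically increasing base activation."""
    return 1.0 / (1.0 + np.exp(-x))

def compound_activation(x, scales, shifts, weights, base=sigmoid):
    """Hypothetical compound AF: a weighted sum of shifted and scaled base AFs.

    f(x) = sum_k weights[k] * base(scales[k] * x + shifts[k])

    The parameterization is an assumption for illustration only.
    """
    x = np.asarray(x, dtype=float)
    # Add a trailing axis so x broadcasts against the K components.
    z = x[..., None] * scales + shifts
    return np.sum(weights * base(z), axis=-1)

# Three components whose transitions sit at x = 2, 0, and -2 (assumed values).
scales = np.array([4.0, 4.0, 4.0])
shifts = np.array([-8.0, 0.0, 8.0])
weights = np.full(3, 1.0 / 3.0)

x = np.linspace(-4.0, 4.0, 9)
y = compound_activation(x, scales, shifts, weights)
print(np.round(y, 3))
```

With positive scales and weights the result is still monotonically increasing, but it rises in several steps rather than one, which is one way to picture how a compound AF could represent a multimodal marginal distribution.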
Taxonomy
Topics: Generative Adversarial Networks and Image Synthesis · Neural Networks and Applications · Model Reduction and Neural Networks
