# On the Stability and Generalization of Learning with Kernel Activation Functions

**Authors:** Michele Cirillo, Simone Scardapane, Steven Van Vaerenbergh, Aurelio Uncini

arXiv: 1903.11990 · 2019-03-29

## TL;DR

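This paper analyzes the generalization properties of neural networks that use kernel activation functions (KAFs). Building on stability results for non-convex models trained with stochastic gradient descent (SGD), it proves that such networks generalize well when trained for a finite number of SGD steps, derives a guideline for choosing the Gaussian kernel bandwidth, and validates the result with a short experimental evaluation.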

## Contribution

It gives the first theoretical proof of generalization for neural networks with KAFs and, as a by-product of the analysis, a guideline for selecting the bandwidth of the scalar Gaussian kernel.

## Key findings

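- KAFs adapt each neuron's nonlinearity through a cheap kernel expansion; prior work has shown that this added flexibility yields significant improvements in practice.
- Neural networks with KAFs generalize well when trained with SGD for a finite number of steps.
- The analysis yields a guideline for choosing the bandwidth of the scalar Gaussian kernel.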
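The summary describes KAFs only as a per-neuron kernel expansion with a scalar Gaussian kernel. The sketch below assumes the form used in the original KAF proposal as we understand it: each neuron computes f(s) = Σ_i α_i exp(-γ (s - d_i)²) over a fixed dictionary d_1, ..., d_D with trainable per-neuron coefficients α_i, where γ plays the role of the bandwidth hyper-parameter (larger γ, narrower kernel). It does not reproduce the paper's proof or its guideline. It only illustrates, under that assumption, why the bandwidth must enter any smoothness-based argument: the activation's Lipschitz and smoothness constants are bounded by Σ|α_i| · √(2γ/e) and Σ|α_i| · 2γ, which the code checks against finite differences. All function names and the example dictionary are hypothetical.

```python
import math
import random
from typing import Sequence, Tuple


def gaussian_kaf(s: float, alpha: Sequence[float], dictionary: Sequence[float], gamma: float) -> float:
    """Evaluate one neuron's KAF (assumed form): sum_i alpha_i * exp(-gamma * (s - d_i)**2)."""
    return sum(a * math.exp(-gamma * (s - d) ** 2) for a, d in zip(alpha, dictionary))


def kaf_derivative_bounds(alpha: Sequence[float], gamma: float) -> Tuple[float, float]:
    """Analytic upper bounds on sup_s |f'(s)| and sup_s |f''(s)|.

    A single term k(u) = exp(-gamma * u**2) has sup |k'(u)| = sqrt(2 * gamma / e)
    (attained at u**2 = 1 / (2 * gamma)) and sup |k''(u)| = 2 * gamma (at u = 0).
    By the triangle inequality both bounds scale with sum_i |alpha_i|.
    """
    l1 = sum(abs(a) for a in alpha)
    lipschitz = l1 * math.sqrt(2.0 * gamma / math.e)
    smoothness = l1 * 2.0 * gamma
    return lipschitz, smoothness


def numeric_derivative_maxima(alpha, dictionary, gamma, lo=-5.0, hi=5.0, steps=4001, h=1e-3):
    """Estimate sup |f'| and sup |f''| on a uniform grid using central finite differences."""
    max_d1 = max_d2 = 0.0
    for k in range(steps):
        s = lo + (hi - lo) * k / (steps - 1)
        f_minus = gaussian_kaf(s - h, alpha, dictionary, gamma)
        f_zero = gaussian_kaf(s, alpha, dictionary, gamma)
        f_plus = gaussian_kaf(s + h, alpha, dictionary, gamma)
        max_d1 = max(max_d1, abs(f_plus - f_minus) / (2 * h))
        max_d2 = max(max_d2, abs(f_plus - 2 * f_zero + f_minus) / h ** 2)
    return max_d1, max_d2


if __name__ == "__main__":
    # Hypothetical neuron: 20 fixed dictionary points on [-2, 2], random mixing coefficients.
    D = 20
    dictionary = [-2.0 + 4.0 * i / (D - 1) for i in range(D)]
    rng = random.Random(0)
    alpha = [rng.gauss(0.0, 0.3) for _ in range(D)]
    for gamma in (0.5, 2.0, 8.0):
        lip, smooth = kaf_derivative_bounds(alpha, gamma)
        d1, d2 = numeric_derivative_maxima(alpha, dictionary, gamma)
        print(f"gamma={gamma}: sup|f'| ~ {d1:.3f} <= {lip:.3f}, sup|f''| ~ {d2:.3f} <= {smooth:.3f}")
```

Quadrupling γ doubles the first bound and quadruples the second, so a narrower kernel makes the activation less smooth; the trade-off between this and the expressiveness of the expansion is the kind of tension a bandwidth guideline has to resolve.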

## Abstract

In this brief we investigate the generalization properties of a recently-proposed class of non-parametric activation functions, the kernel activation functions (KAFs). KAFs introduce additional parameters in the learning process in order to adapt nonlinearities individually on a per-neuron basis, exploiting a cheap kernel expansion of every activation value. While this increase in flexibility has been shown to provide significant improvements in practice, a theoretical proof for its generalization capability has not been addressed yet in the literature. Here, we leverage recent literature on the stability properties of non-convex models trained via stochastic gradient descent (SGD). By indirectly proving two key smoothness properties of the models under consideration, we prove that neural networks endowed with KAFs generalize well when trained with SGD for a finite number of steps. Interestingly, our analysis provides a guideline for selecting one of the hyper-parameters of the model, the bandwidth of the scalar Gaussian kernel. A short experimental evaluation validates the proof.
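The abstract says only that the proof leverages recent stability results for non-convex models trained with SGD. Assuming (the summary does not confirm this) that these are the uniform-stability results of Hardt, Recht and Singer (2016), the relevant bound for a loss with values in [0, 1] that is L-Lipschitz and β-smooth, trained for T steps on n samples with non-increasing step sizes η_t ≤ c/t, is ε ≤ (1 + 1/(βc)) / (n - 1) · (2cL²)^(1/(βc+1)) · T^(βc/(βc+1)). Under that assumption the "two key smoothness properties" would correspond to L and β. The sketch below only evaluates this formula; the function name and all constants are hypothetical and none come from the paper.

```python
def sgd_stability_bound(n: int, T: int, L: float, beta: float, c: float) -> float:
    """Uniform-stability bound for SGD on a non-convex loss (assumed form, see lead-in).

    Assumes the loss takes values in [0, 1], is L-Lipschitz and beta-smooth,
    and SGD runs T steps with non-increasing step sizes eta_t <= c / t.
    """
    if n < 2 or T < 1 or L <= 0 or beta <= 0 or c <= 0:
        raise ValueError("need n >= 2, T >= 1 and positive L, beta, c")
    q = beta * c
    return ((1.0 + 1.0 / q) / (n - 1)
            * (2.0 * c * L ** 2) ** (1.0 / (q + 1.0))
            * T ** (q / (q + 1.0)))


if __name__ == "__main__":
    # Illustrative constants only; none are taken from the paper.
    for n, T in [(1_000, 10_000), (10_000, 10_000), (10_000, 100_000)]:
        print(f"n={n}, T={T}: bound={sgd_stability_bound(n, T, L=1.0, beta=1.0, c=0.1):.4f}")
```

Because the exponent βc/(βc+1) is below 1, the bound grows sublinearly in T and shrinks as 1/(n - 1), which is why results of this kind are stated for a finite number of SGD steps. A larger β (for instance, from a less smooth activation) raises the exponent on T, so smoothness constants of the model feed directly into how quickly the guarantee degrades with training time.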

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/1903.11990/full.md

## Figures

1 figure with captions in the complete paper: https://tomesphere.com/paper/1903.11990/full.md

## References

18 references — full list in the complete paper: https://tomesphere.com/paper/1903.11990/full.md

---
Source: https://tomesphere.com/paper/1903.11990