# Controlling Model Complexity in Probabilistic Model-Based Dynamic Optimization of Neural Network Structures

**Authors:** Shota Saito, Shinichi Shirakawa

arXiv: 1907.06341 · 2022-05-27

## TL;DR

This paper introduces a penalty-based approach to control the complexity of neural network structures during probabilistic model-based optimization, reducing overfitting and computational costs while maintaining performance.

## Contribution

The paper proposes a penalty term, formulated from the number of weights or units, and derives its natural gradient analytically so that model complexity can be regulated during structure optimization.

## Key findings

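- The penalty term controls the complexity (number of weights or units) of the structures the optimizer finds
- Performance is maintained while model complexity is controlled
- The method is applied to unit selection in a fully-connected neural network and connection selection in a convolutional neural network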

## Abstract

A method of simultaneously optimizing both the structure of neural networks and the connection weights in a single training loop can reduce the enormous computational cost of neural architecture search. We focus on probabilistic model-based dynamic neural network structure optimization, which considers the probability distribution of structure parameters and simultaneously optimizes both the distribution parameters and the connection weights with gradient methods. Since the existing algorithm searches only for structures that minimize the training loss, it might find overly complicated structures. In this paper, we propose introducing a penalty term to control the model complexity of the obtained structures. We formulate a penalty term using the number of weights or units and derive its analytical natural gradient. The proposed method minimizes the objective function with the added penalty term using stochastic gradient descent. We apply the proposed method to unit selection in a fully-connected neural network and connection selection in a convolutional neural network. The experimental results show that the proposed method can control model complexity while maintaining performance.
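As a rough illustration of the penalty idea in the abstract, the sketch below assumes each unit or connection is switched on by an independent Bernoulli variable with probability `theta[i]`. That distributional choice is an assumption made here, not something this summary states. Under it, the expected number of active weights is `sum(cost * theta)`, and because the Fisher information of a Bernoulli parameter is `1 / (theta * (1 - theta))`, the natural gradient of that penalty has a closed form. The function names, `cost`, `eps`, `lr`, and `margin` are all hypothetical, and the loss natural gradient is taken as a given input rather than estimated.

```python
import numpy as np

def penalty(theta, cost):
    """Expected complexity under independent Bernoulli masks: sum_i cost_i * theta_i."""
    return float(np.dot(cost, theta))

def penalty_natural_gradient(theta, cost):
    """Natural gradient of the expected complexity.

    The ordinary gradient with respect to theta is `cost`; multiplying by the
    inverse Fisher information theta * (1 - theta) gives the natural gradient.
    """
    return theta * (1.0 - theta) * cost

def penalized_step(theta, loss_nat_grad, cost, lr=0.1, eps=0.01, margin=1e-3):
    """One descent step on (loss + eps * penalty) in natural-gradient form.

    `loss_nat_grad` is assumed to be supplied by the caller (for example, a
    sampled estimate); `margin` keeps probabilities away from 0 and 1.
    """
    nat_grad = loss_nat_grad + eps * penalty_natural_gradient(theta, cost)
    return np.clip(theta - lr * nat_grad, margin, 1.0 - margin)

# Usage: four units, each costing 10 weights, with no loss signal at all.
theta = np.full(4, 0.5)
cost = np.full(4, 10.0)
theta_next = penalized_step(theta, np.zeros(4), cost)
```

With no loss gradient, the step only shrinks the probabilities, which is the intended effect of the penalty: a unit stays likely to be selected only if the training loss pushes back harder than `eps * cost`.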

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/1907.06341/full.md

## Figures

6 figures with captions in the complete paper: https://tomesphere.com/paper/1907.06341/full.md

## References

23 references — full list in the complete paper: https://tomesphere.com/paper/1907.06341/full.md

---
Source: https://tomesphere.com/paper/1907.06341