# Dropout Regularization in Hierarchical Mixture of Experts

**Authors:** Ozan İrsoy, Ethem Alpaydın

arXiv: 1812.10158 · 2018-12-27

## TL;DR

This paper introduces a dropout variant for hierarchical mixture of experts that follows the model's tree hierarchy instead of dropping units independently, which prevents overfitting on trees with many levels.

## Contribution

The authors propose a dropout mechanism that is faithful to the tree hierarchy of a hierarchical mixture of experts, in contrast to the flat, unitwise independent dropout used with multi-layer perceptrons.

## Key findings

- Prevents overfitting on synthetic regression data and on the MNIST and CIFAR-10 datasets
- Improves generalization on trees with many levels
- Provides smoother fits in complex models

## Abstract

Dropout is a very effective method for preventing overfitting and has become the go-to regularizer for multi-layer neural networks in recent years. Hierarchical mixture of experts is a hierarchically gated model that defines a soft decision tree where leaves correspond to experts and decision nodes correspond to gating models that softly choose between their children, and as such, the model defines a soft hierarchical partitioning of the input space. In this work, we propose a variant of dropout for hierarchical mixture of experts that is faithful to the tree hierarchy defined by the model, as opposed to having a flat, unitwise independent application of dropout as one has with multi-layer perceptrons. We show that on synthetic regression data and on the MNIST and CIFAR-10 datasets, our proposed dropout mechanism prevents overfitting on trees with many levels, improving generalization and providing smoother fits.
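
To make the soft decision tree in the abstract concrete, here is a minimal Python sketch of a binary hierarchical mixture of experts, with one possible way of applying dropout to whole subtrees rather than to individual units. This summary does not state the paper's exact dropout rule, so several parts are assumptions made for illustration: the class names `Leaf` and `Gate`, the constant-valued experts, the sigmoid gate, and the choice to drop the left subtree with probability `drop_rate`.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


class Leaf:
    """Expert at a leaf. A constant response is an assumption made for brevity."""

    def __init__(self, value):
        self.value = value

    def forward(self, x, drop_rate=0.0, rng=None):
        return self.value


class Gate:
    """Decision node that softly chooses between its two children."""

    def __init__(self, weights, bias, left, right):
        self.weights = weights
        self.bias = bias
        self.left = left
        self.right = right

    def forward(self, x, drop_rate=0.0, rng=None):
        # Hypothetical tree-structured dropout (not necessarily the paper's rule):
        # when an rng is given, drop the whole left subtree with probability
        # drop_rate and pass the input to the right child only.
        if rng is not None and rng.random() < drop_rate:
            return self.right.forward(x, drop_rate, rng)
        # Soft gating: g in (0, 1) weights the left child, 1 - g the right.
        z = sum(w * xi for w, xi in zip(self.weights, x)) + self.bias
        g = sigmoid(z)
        left_out = self.left.forward(x, drop_rate, rng)
        right_out = self.right.forward(x, drop_rate, rng)
        return g * left_out + (1.0 - g) * right_out


# A depth-2 tree over a one-dimensional input.
tree = Gate([1.0], 0.0, Gate([-2.0], 0.5, Leaf(0.0), Leaf(1.0)), Leaf(3.0))

inference_output = tree.forward([0.2])  # no rng: the full soft tree is used
training_output = tree.forward([0.2], drop_rate=0.5, rng=random.Random(0))
```

Because a dropped node removes its entire left subtree, each training pass in this sketch evaluates a smaller tree that is still a valid soft decision tree. That is the sense in which such a scheme respects the hierarchy, unlike masking units independently.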

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/1812.10158/full.md

## Figures

24 figures with captions in the complete paper: https://tomesphere.com/paper/1812.10158/full.md

## References

19 references — full list in the complete paper: https://tomesphere.com/paper/1812.10158/full.md

---
Source: https://tomesphere.com/paper/1812.10158