Topological derivative approach for deep neural network architecture adaptation
C G Krishnanunni, Tan Bui-Thanh, Clint Dawson

TL;DR
This paper introduces a mathematically principled topological derivative method for adaptively adding layers to neural networks during training, optimizing architecture based on sensitivity analysis.
Contribution
It develops a novel topological derivative framework for neural network architecture adaptation, connecting topology optimization with optimal control and eigenvalue problems.
Findings
The proposed method outperforms the baseline and other adaptation strategies on various models.
Provides a new method for layer insertion based on topological sensitivity.
Demonstrates applications in transfer learning.
Abstract
This work presents a novel algorithm for progressively adapting neural network architecture along the depth. In particular, we attempt to address the following questions in a mathematically principled way: i) Where to add a new capacity (layer) during the training process? ii) How to initialize the new capacity? At the heart of our approach are two key ingredients: i) the introduction of a "shape functional" to be minimized, which depends on neural network topology, and ii) the introduction of a topological derivative of the shape functional with respect to the neural network topology. Using an optimal control viewpoint, we show that the network topological derivative exists under certain conditions, and its closed-form expression is derived. In particular, we explore, for the first time, the connection between the topological derivative from a topology optimization framework with the…
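The question "where to add a new layer" can be illustrated with a much simpler stand-in than the paper's topological derivative, whose closed-form expression is not reproduced in this summary. The sketch below is only a first-order proxy under stated assumptions: a toy residual network x_{l+1} = x_l + tanh(W_l x_l), a squared-error loss, and a candidate layer initialized with zero weights so that inserting it leaves the network output unchanged. The score for each position is the norm of the loss gradient with respect to the candidate layer's weights. All function names here are hypothetical and chosen for illustration.

```python
import numpy as np

def forward(weights, x):
    """Toy residual net (assumed form): x_{l+1} = x_l + tanh(W_l x_l)."""
    states = [x]
    for W in weights:
        x = x + np.tanh(W @ x)
        states.append(x)
    return states

def adjoints(weights, states, target):
    """Backpropagate dL/dx_l for L = 0.5 * ||x_final - target||^2."""
    p = states[-1] - target
    adj = [p]
    for W, x in zip(reversed(weights), reversed(states[:-1])):
        s = 1.0 - np.tanh(W @ x) ** 2      # derivative of tanh at W x
        p = p + W.T @ (s * p)              # transpose of (I + diag(s) W)
        adj.append(p)
    return adj[::-1]                       # adj[l] = dL/dx_l

def insertion_sensitivity(weights, x, target):
    """Score every insertion position l = 0..len(weights).

    A zero-weight candidate layer maps x_l to x_l + tanh(0) = x_l, so the
    gradient of the loss w.r.t. its weights is outer(dL/dx_l, x_l).
    This is a first-order proxy, NOT the paper's topological derivative.
    """
    states = forward(weights, x)
    adj = adjoints(weights, states, target)
    return [float(np.linalg.norm(np.outer(adj[l], states[l])))
            for l in range(len(states))]

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    ws = [0.5 * rng.standard_normal((3, 3)) for _ in range(2)]
    scores = insertion_sensitivity(ws, rng.standard_normal(3), rng.standard_normal(3))
    print("most sensitive position:", int(np.argmax(scores)))
```

Under these assumptions, the position with the largest score is where a small new layer would reduce the loss fastest to first order. The paper itself addresses both the location and the initialization of the new layer through its topological derivative, which this proxy does not attempt to capture.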
Taxonomy
Topics: Neural Networks and Applications
Methods: Attention Is All You Need · Softmax · Layer Normalization · Residual Connection · Dense Connections · Linear Layer · Multi-Head Attention · Vision Transformer
