On the failure of ReLU activation for physics-informed machine learning
Conor Rowan

TL;DR
This paper investigates why the ReLU activation function performs poorly in physics-informed neural networks. It traces the failure to automatic differentiation, which mishandles the discontinuity in ReLU's derivative and so yields mis-specified gradients during training.
Contribution
The study diagnoses the root cause of ReLU's failure in physics-informed machine learning, linking it to the limitations of automatic differentiation when an activation function has a discontinuous derivative.
Findings
ReLU fails even on variational problems that involve only first derivatives.
Automatic differentiation mishandles the discontinuity in ReLU's derivative.
ReLU's poor performance is due to gradient mis-specification during training.
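The findings above can be illustrated with a small sketch. It is not the paper's code and does not claim to reproduce the paper's experiments. It assumes a toy forward-mode automatic differentiation built from dual numbers; the names Dual, step, derivative, second_derivative and fd_second are hypothetical helpers written for this example. The point it shows is narrow: differentiating pointwise, autodiff sees ReLU's derivative as a step function and the derivative of that step as zero everywhere, so the jump in slope at the kink is lost. A finite-difference estimate, by contrast, registers the kink as a spike of size about 1/h.

```python
# Toy forward-mode autodiff via dual numbers (illustration only, not a library API).

class Dual:
    """A value paired with its derivative; nesting Duals gives higher derivatives."""

    def __init__(self, val, der):
        self.val = val
        self.der = der

    def __add__(self, other):
        if isinstance(other, Dual):
            return Dual(self.val + other.val, self.der + other.der)
        return Dual(self.val + other, self.der)

    __radd__ = __add__

    def __mul__(self, other):
        if isinstance(other, Dual):
            # Product rule.
            return Dual(self.val * other.val,
                        self.val * other.der + self.der * other.val)
        return Dual(self.val * other, self.der * other)

    __rmul__ = __mul__


def step(x):
    """ReLU's derivative as pointwise autodiff sees it: 1 for x > 0, else 0."""
    if isinstance(x, Dual):
        # The step is treated as locally constant, so its derivative is 0 everywhere.
        return Dual(step(x.val), 0.0 * x.der)
    return 1.0 if x > 0 else 0.0


def relu(x):
    if isinstance(x, Dual):
        # Chain rule: d/dx relu(u) = step(u) * du/dx.
        return Dual(relu(x.val), step(x.val) * x.der)
    return x if x > 0 else 0.0


def derivative(f, x):
    """First derivative of f at x by seeding the dual part with 1."""
    return f(Dual(x, 1.0)).der


def second_derivative(f, x):
    """Second derivative by nesting the first-derivative operator."""
    return derivative(lambda y: derivative(f, y), x)


def fd_second(f, x, h):
    """Central finite-difference estimate of the second derivative."""
    return (f(x + h) - 2.0 * f(x) + f(x - h)) / (h * h)


if __name__ == "__main__":
    h = 1e-3
    print("slope right of kink:", derivative(relu, 1.0))
    print("slope left of kink: ", derivative(relu, -1.0))
    print("autodiff second derivative at 0:", second_derivative(relu, 0.0))
    print("finite-difference estimate at 0:", fd_second(relu, 0.0, h), "(about 1/h)")
```

In this toy setting the slope jumps by 1 across the kink, yet the autodiff second derivative is zero at every point, so any quantity that depends on it carries no information about the jump.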
Abstract
Physics-informed machine learning uses governing ordinary and/or partial differential equations to train neural networks to represent the solution field. Like any machine learning problem, the choice of activation function influences the characteristics and performance of the solution obtained from physics-informed training. Several studies have compared common activation functions on benchmark differential equations, and have unanimously found that the rectified linear unit (ReLU) is outperformed by competitors such as the sigmoid, hyperbolic tangent, and swish activation functions. In this work, we diagnose the poor performance of ReLU on physics-informed machine learning problems. While it is well-known that the piecewise linear form of ReLU prevents it from being used on second-order differential equations, we show that ReLU fails even on variational problems involving only first…
Taxonomy
Topics: Model Reduction and Neural Networks · Machine Learning in Materials Science · Neural Networks and Reservoir Computing
