Continuous Specialization Transition in the Soft Committee Machine with ReLU Activation
Assem Afanah, Bernd Rosenow

TL;DR
This paper analyzes the specialization transition in the soft committee machine with ReLU activation and finds that it is continuous, in contrast to the first-order transition of the corresponding sigmoidal model.
Contribution
The paper gives an analytic, replica-method study of the specialization transition in the ReLU soft committee machine and highlights how the choice of activation function shapes the phase behavior.
Findings
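The ReLU soft committee machine exhibits a continuous specialization transition.
Analytic expressions are derived for the critical training-set size, both near the transition and in the large-sample regime.
The activation function determines whether the specialization transition is continuous (ReLU) or first-order (sigmoidal).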
Abstract
We analyze the soft committee machine with Rectified Linear Unit (ReLU) activation by means of the replica method. In a realizable teacher-student setting, we compute the quenched free energy within a replica-symmetric ansatz and obtain the typical generalization behavior from the saddle-point equations for the macroscopic order parameters. The system exhibits a transition from an unspecialized symmetric phase to a specialized phase in which the permutation symmetry among hidden units is broken. We determine the critical training-set size as a function of the inverse training temperature and derive analytic expressions both near the transition and in the asymptotic large-sample regime. Unlike the corresponding model with sigmoidal activations, which undergoes a first-order transition, the ReLU soft committee machine shows a continuous specialization transition. These results show that…
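To make the objects in the abstract concrete, the following is a minimal numerical sketch of a ReLU soft committee machine in a realizable teacher-student setting. It is not the replica calculation of the paper. The normalizations (1/sqrt(N) on the hidden fields, 1/sqrt(K) on the output), the Gaussian inputs, and the two hand-built students (`unspecialized`, `specialized`) are illustrative assumptions chosen here, and all function names are hypothetical.

```python
import math
import random


def relu(z):
    """Rectified linear unit."""
    return z if z > 0.0 else 0.0


def scm_output(weights, x):
    """Soft committee machine: ReLU hidden units, fixed equal output weights.

    Assumed convention: fields scaled by 1/sqrt(N), output by 1/sqrt(K).
    """
    n, k = len(x), len(weights)
    fields = [sum(wi * xi for wi, xi in zip(w, x)) / math.sqrt(n) for w in weights]
    return sum(relu(h) for h in fields) / math.sqrt(k)


def overlaps(a, b):
    """Order-parameter matrix with entries a_i . b_j / N."""
    n = len(a[0])
    return [[sum(p * q for p, q in zip(ai, bj)) / n for bj in b] for ai in a]


def generalization_error(student, teacher, n_samples, rng):
    """Monte Carlo estimate of (1/2) <(student - teacher)^2> over Gaussian inputs."""
    n = len(teacher[0])
    total = 0.0
    for _ in range(n_samples):
        x = [rng.gauss(0.0, 1.0) for _ in range(n)]
        diff = scm_output(student, x) - scm_output(teacher, x)
        total += 0.5 * diff * diff
    return total / n_samples


rng = random.Random(0)
N, K = 50, 2  # input dimension and number of hidden units (toy sizes)
teacher = [[rng.gauss(0.0, 1.0) for _ in range(N)] for _ in range(K)]

# Unspecialized student (assumption): every hidden unit carries the same vector,
# here the mean of the teacher vectors, so the units are interchangeable.
mean_vec = [sum(col) / K for col in zip(*teacher)]
unspecialized = [list(mean_vec) for _ in range(K)]

# Specialized student (assumption): each hidden unit copies one teacher unit.
specialized = [list(b) for b in teacher]

R_unspec = overlaps(unspecialized, teacher)  # identical rows: symmetric phase
R_spec = overlaps(specialized, teacher)      # dominant diagonal: symmetry broken

eps_unspec = generalization_error(unspecialized, teacher, 200, rng)
eps_spec = generalization_error(specialized, teacher, 200, rng)
```

In this toy construction the student-teacher overlap matrix has identical rows for the unspecialized student and a dominant diagonal for the specialized one, which is the permutation-symmetry breaking the abstract refers to. The specialized student reproduces the teacher exactly, so its estimated generalization error vanishes, while the unspecialized student retains a nonzero error.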
Taxonomy
Topics: Neural Networks and Reservoir Computing · Neural Networks and Applications · Theoretical and Computational Physics
