Topology and Geometry of the Learning Space of ReLU Networks: Connectivity and Singularities

Marco Nurisso; Pierrick Leroy; Giovanni Petri; Francesco Vaccarino

arXiv:2602.00693·cs.LG·February 3, 2026

Open Access · 3 Reviews

TL;DR

This paper investigates the structure of the parameter space in ReLU networks, revealing how topology and singularities influence training dynamics, with implications for network design and pruning strategies.

Contribution

It provides a comprehensive characterization of connectedness and singularities in the parameter space of DAG-structured ReLU networks, extending previous theoretical results.

Findings

01

Under gradient flow, the parameters stay on an algebraic variety fixed at initialization.

02

Connectivity depends on bottleneck nodes and balance conditions.

03

Singularities relate to the network's DAG topology and sub-networks.

Abstract

Understanding the properties of the parameter space in feed-forward ReLU networks is critical for effectively analyzing and guiding training dynamics. After initialization, training under gradient flow decisively restricts the parameter space to an algebraic variety that emerges from the homogeneous nature of the ReLU activation function. In this study, we examine two key challenges associated with feed-forward ReLU networks built on general directed acyclic graph (DAG) architectures: the (dis)connectedness of the parameter space and the existence of singularities within it. We extend previous results by providing a thorough characterization of connectedness, highlighting the roles of bottleneck nodes and balance conditions associated with specific subsets of the network. Our findings clearly demonstrate that singularities are intricately connected to the topology of the underlying DAG…
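The restriction to an algebraic variety can be illustrated with the well-known per-neuron balance quantity of ReLU networks: for each hidden neuron, the squared norm of its incoming weights (bias included) minus the squared norm of its outgoing weights is constant under gradient flow, so the trajectory stays on the level set of a family of quadratic polynomials. The sketch below is not the paper's code. It assumes a two-layer network, a squared loss, synthetic data, and small-step gradient descent as a stand-in for gradient flow; the names `balance` and `run_balance_check` are hypothetical.

```python
import numpy as np

def balance(W1, b1, w2):
    # Per hidden neuron: ||incoming weights||^2 + bias^2 - (outgoing weight)^2.
    return (W1 ** 2).sum(axis=1) + b1 ** 2 - w2 ** 2

def run_balance_check(steps=2000, lr=1e-4, seed=0):
    # Assumed toy setup: 3 inputs, 5 hidden ReLU units, 1 output.
    rng = np.random.default_rng(seed)
    X = rng.normal(size=(64, 3))
    y = np.sin(X[:, 0])
    W1 = rng.normal(size=(5, 3))
    b1 = rng.normal(size=5)
    w2 = rng.normal(size=5)
    start = np.concatenate([W1.ravel(), b1, w2])
    c0 = balance(W1, b1, w2)
    n = len(y)
    for _ in range(steps):
        pre = X @ W1.T + b1              # pre-activations, shape (64, 5)
        h = np.maximum(pre, 0.0)         # ReLU
        err = h @ w2 - y                 # residuals of the squared loss
        g_w2 = h.T @ err / n
        g_pre = err[:, None] * w2 * (pre > 0) / n
        g_W1 = g_pre.T @ X
        g_b1 = g_pre.sum(axis=0)
        # Small explicit Euler step approximating gradient flow.
        W1 -= lr * g_W1
        b1 -= lr * g_b1
        w2 -= lr * g_w2
    end = np.concatenate([W1.ravel(), b1, w2])
    moved = np.linalg.norm(end - start)  # how far the parameters travelled
    return c0, balance(W1, b1, w2), moved

if __name__ == "__main__":
    c0, c1, moved = run_balance_check()
    print("parameter displacement:", moved)
    print("max drift of balance quantities:", np.max(np.abs(c1 - c0)))
```

With a finite step size the balance quantities drift slightly (the drift is second order in the step size), whereas under exact gradient flow they would be conserved. The DAG setting studied in the paper is more general than this two-layer case.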

Peer Reviews

Decision·ICLR 2026 Poster

Reviewer 01 · Rating 4 · Confidence 4

Strengths

The paper is very well written and provides some insights on properties of the training dynamics.

Weaknesses

To me the results seem to be relatively minor and easy extensions of previous results. The authors suggest that formulating these conservation laws with the use of the incidence matrix of the DAG gives significant new insight. But as far as I can see, the main insight is that there are singularities when parts of the graph become disconnected, which does not seem to be surprising.

Reviewer 02 · Rating 6 · Confidence 3

Strengths

Provides a sound and thorough theoretical analysis of the connectivity of the learning space for ReLU-activated DAG networks trained under gradient flow (GF) after arbitrary initialization. Also gives a theoretical analysis of the conditions under which singularities exist and can be reached, complemented by experimental results on tools to reach these singularities in practice.

Weaknesses

The results on connectivity might be achievable with simpler tools and less technicality. The experimental part on connectivity does not bring anything to the discussion. The introduction of some notions and symbols is lacking.

Reviewer 03 · Rating 6 · Confidence 4

Strengths

- The paper is relatively well-written and polished. Illustrative figures are provided to accompany the theoretical results and aid understanding.
- The theoretical formulation is clean.
- The result on the disconnectedness of the parameter space is somewhat surprising. The implication of losing expressivity at initialization seems interesting.
- Some numerical experiments are conducted to validate theoretical results.

Weaknesses

- I am wondering whether the disconnected case occurs in fully-connected ReLU networks or not, since the example network given in Figure 2(d.1) does not look like a fully-connected network. If the disconnection only occurs in networks that are not fully connected, then the statement in line 358 may be inaccurate: "the expressivity can be reduced to the extent that they lose their universal approximation capability"; because ReLU networks that are not fully connected are not universal approximato

Taxonomy

Topics: Neural Networks and Reservoir Computing · Stochastic Gradient Optimization Techniques · Model Reduction and Neural Networks