Bound by semanticity: universal laws governing the generalization-identification tradeoff

Marco Nurisso; Jesseba Fernando; Raj Deshpande; Alan Perotti; Raja Marjieh; Steven M. Frankland; Richard L. Lewis; Taylor W. Webb; Declan Campbell; Francesco Vaccarino; Jonathan D. Cohen; Giovanni Petri

arXiv:2506.14797·cs.LG·June 19, 2025

Open Access 3 Reviews

TL;DR

This paper establishes a fundamental limit on the tradeoff between generalization and identification in models, showing that finite semantic resolution jointly bounds how well a model can do both, and that these laws hold across neural systems ranging from simple to complex.

Contribution

It derives closed-form, universal expressions for the generalization-identification tradeoff and validates them across a range of neural network architectures and settings.

Findings

01

Models with finite semantic resolution follow a universal Pareto front.

02

A sharp 1/n collapse in processing capacity occurs with multiple inputs.

03

Empirical trajectories closely match theoretical predictions during learning.

Abstract

Intelligent systems must deploy internal representations that are simultaneously structured (to support broad generalization) and selective (to preserve input identity). We expose a fundamental limit on this tradeoff. For any model whose representational similarity between inputs decays with finite semantic resolution $\varepsilon$, we derive closed-form expressions that pin its probability of correct generalization $p_{S}$ and identification $p_{I}$ to a universal Pareto front independent of input space geometry. Extending the analysis to noisy, heterogeneous spaces and to $n > 2$ inputs predicts a sharp $1/n$ collapse of multi-input processing capacity and a non-monotonic optimum for $p_{S}$. A minimal ReLU network trained end-to-end reproduces these laws: during learning a resolution boundary self-organizes and empirical $(p_{S}, p_{I})$ trajectories closely follow theoretical curves for…
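The direction of the tradeoff can be illustrated with a toy model. This is a minimal sketch, not the paper's derivation, and it rests on hypothetical assumptions: similarity decays as $s(d) = \exp(-d/\varepsilon)$; a generalization probe at distance $d$ from a stored item is correctly matched with probability $s(d)$; a distractor at distance $d$ is correctly told apart with probability $1 - s(d)$. Probe distances are taken as uniform on $[0, a_S]$ and distractor distances as uniform on $[0, a_I]$. For $d \sim U[0, a]$ the expected similarity has the closed form $(\varepsilon/a)(1 - e^{-a/\varepsilon})$. The names `mean_similarity`, `tradeoff`, `a_s`, and `a_i` are illustrative only.

```python
import math


def mean_similarity(a: float, eps: float) -> float:
    """Closed form of E[exp(-d / eps)] for d ~ Uniform[0, a] (toy assumption)."""
    if a <= 0 or eps <= 0:
        raise ValueError("a and eps must be positive")
    x = a / eps
    return (1.0 - math.exp(-x)) / x


def tradeoff(eps: float, a_s: float = 1.0, a_i: float = 2.0) -> tuple[float, float]:
    """Toy (p_S, p_I) at semantic resolution eps under a hypothetical matching rule."""
    p_s = mean_similarity(a_s, eps)        # generalization: probe matched to its item
    p_i = 1.0 - mean_similarity(a_i, eps)  # identification: distractor told apart
    return p_s, p_i


if __name__ == "__main__":
    # Sweep the resolution from fine (small eps) to coarse (large eps).
    for eps in (0.05, 0.2, 0.5, 1.0, 2.0, 10.0):
        p_s, p_i = tradeoff(eps)
        print(f"eps={eps:5.2f}  p_S={p_s:.3f}  p_I={p_i:.3f}")
```

In this toy, coarsening the resolution raises $p_S$ and lowers $p_I$ monotonically, so no single $\varepsilon$ improves both. The exact curve depends on the assumed distance ranges, so it should not be read as the paper's geometry-independent front.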

Peer Reviews

Decision·ICLR 2026 Poster

Reviewer 01 · Rating 6 · Confidence 4

Strengths

This manuscript provides a well-articulated contribution to the field, formalizing a tradeoff that had previously been observed empirically. The theory is well presented, and I think the ReLU experiments in particular provide helpful support for the existence of the noted tradeoff. I appreciated the detailed contextualization of the present work in the field. Finally, the paper is generally well written and the figures are well designed. Below are some additional parts of the paper I particularl…

Weaknesses

As noted above, I liked this paper. My primary concern is that the experiments in Section 5 provide rather limited evidence of this tradeoff. Your claim (and that of the prior literature) that this is a universal tradeoff implies that it should be apparent even in models that weren't explicitly trained on the identification and similarity tasks you're measuring. The fact that models become better at identification/similarity as you're varying the parameter prioritizing one or…

Reviewer 02 · Rating 6 · Confidence 3

Strengths

Comprehensive background section with a helpful literature review. The authors take the time to carefully explain their setup, accompanying the notation with helpful illustrative examples. The authors tested both toy models (allowing for theoretical analysis) and realistic models (allowing for confirmation at scale). The text is well written, and the figures are clear, elegant, and helpful. Code is provided showing how to reproduce the results and figures used in the paper.

Weaknesses

1. As the authors note in the Limitations subsection, compositional representations and hierarchy were outside the scope of the study. Adding a small pilot on a compositional task to show whether (and why) the current theory breaks would strengthen the paper.

Minor points:

2. The concept of “semantic resolution” feels somewhat over-introduced. Mathematically, it appears equivalent to a kernel scale or bandwidth that simply controls how similarity decays with distance. While the term is evocative…
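The reviewer's bandwidth reading can be checked directly for one concrete kernel. This is a minimal sketch assuming, for illustration only, an exponential decay $s(d) = \exp(-d/\varepsilon)$; the function names `similarity` and `similarity_from_rate` are hypothetical. Under this assumption $\varepsilon$ enters only through the ratio $d/\varepsilon$: rescaling all distances and $\varepsilon$ by the same factor leaves every similarity unchanged, and writing the kernel with a rate $\gamma = 1/\varepsilon$ gives identical values, which is exactly how a kernel bandwidth behaves.

```python
import math


def similarity(d: float, eps: float) -> float:
    """Hypothetical exponential decay of similarity with distance d; eps acts as a bandwidth."""
    if eps <= 0:
        raise ValueError("eps must be positive")
    return math.exp(-d / eps)


def similarity_from_rate(d: float, gamma: float) -> float:
    """The same kernel written with a rate parameter gamma = 1 / eps."""
    return math.exp(-gamma * d)


if __name__ == "__main__":
    d, eps = 0.8, 0.4
    # Rescaling the input space and eps by a common factor c changes nothing.
    for c in (0.5, 2.0, 10.0):
        print(c, similarity(d, eps), similarity(c * d, c * eps))
    # Bandwidth and rate parametrizations agree.
    print(similarity(d, eps), similarity_from_rate(d, 1.0 / eps))
```

The sketch covers only the exponential case; other decay shapes introduce their own scale in the same way but differ in how quickly similarity falls off.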

Reviewer 03 · Rating 8 · Confidence 3

Strengths

1. The central tradeoff (Thm 1) is formally derived under clear assumptions about similarity decay and finite resolution. The analytical Pareto frontier is a nice contribution: mathematically precise, easy to reason about, and interpretable in terms of task accuracy.
2. The empirical sections demonstrate that real-world models qualitatively follow the predicted tradeoff curves.
3. This work links the G-I tradeoff to well-known cognitive constraints like binding failures, generalization gradien…

Weaknesses

1. The core tradeoff is derived assuming specific forms of similarity decay and decision rules. Could the authors elaborate on how universal their conclusion is, and how sensitive it is to the choice of decay function? In the Discussion, the authors refer to “finite-resolution similarity” as a universal constraint, suggesting that the existence of the tradeoff is robust even if the exact shape of the curve depends on the similarity decay. It is unclear whether, in those larger models, the similarity function follo…
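The question about sensitivity to the decay function can be probed in a toy setting. This is a hypothetical illustration, not the paper's analysis: a probe at distance $d$ is matched with probability $s(d/\varepsilon)$ and a distractor is rejected with probability $1 - s(d/\varepsilon)$, with probe distances uniform on $[0, 1]$ and distractor distances uniform on $[0, 2]$. It compares an exponential decay $e^{-x}$ with a Gaussian decay $e^{-x^2/2}$: for each, it bisects for the $\varepsilon$ that gives $p_S = 0.5$ and reports the resulting $p_I$. Expectations use a midpoint rule, and all names (`mean_similarity`, `p_s_p_i`, `eps_for_target_p_s`) are illustrative.

```python
import math
from typing import Callable

Decay = Callable[[float], float]  # similarity as a function of x = d / eps


def exponential(x: float) -> float:
    return math.exp(-x)


def gaussian(x: float) -> float:
    return math.exp(-0.5 * x * x)


def mean_similarity(decay: Decay, a: float, eps: float, n: int = 2000) -> float:
    """Midpoint-rule estimate of E[decay(d / eps)] for d ~ Uniform[0, a]."""
    h = a / n
    return sum(decay((k + 0.5) * h / eps) for k in range(n)) / n


def p_s_p_i(decay: Decay, eps: float, a_s: float = 1.0,
            a_i: float = 2.0) -> tuple[float, float]:
    """Toy generalization and identification accuracies (hypothetical rule)."""
    p_s = mean_similarity(decay, a_s, eps)        # probe matched to its item
    p_i = 1.0 - mean_similarity(decay, a_i, eps)  # distractor told apart
    return p_s, p_i


def eps_for_target_p_s(decay: Decay, target: float, a_s: float = 1.0,
                       lo: float = 1e-3, hi: float = 1e3, iters: int = 60) -> float:
    """Bisect in log(eps); p_S increases with eps for any decreasing decay."""
    for _ in range(iters):
        mid = math.sqrt(lo * hi)
        if mean_similarity(decay, a_s, mid) < target:
            lo = mid
        else:
            hi = mid
    return math.sqrt(lo * hi)


if __name__ == "__main__":
    for name, decay in (("exponential", exponential), ("gaussian", gaussian)):
        eps = eps_for_target_p_s(decay, target=0.5)
        p_s, p_i = p_s_p_i(decay, eps)
        print(f"{name:<11} eps={eps:.3f}  p_S={p_s:.3f}  p_I={p_i:.3f}")
```

Because both decays are decreasing, raising $\varepsilon$ increases $p_S$ and lowers $p_I$ for either one. At matched $p_S$, however, the two reach noticeably different $p_I$, so in this toy the shape of the curve is decay-dependent.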

Taxonomy

Topics: Natural Language Processing Techniques