Sequential Group Composition: A Window into the Mechanics of Deep Learning

Giovanni Luca Marchetti; Daniel Kunin; Adele Myers; Francisco Acosta; Nina Miolane

arXiv:2602.03655·cs.LG·February 4, 2026

TL;DR

This paper introduces the sequential group composition task to analyze how neural networks learn structured operations over sequences, revealing the roles of architecture depth, group structure, and encoding statistics in learning efficiency.

Contribution

The paper gives a theoretical analysis of neural network learning dynamics on the sequential group composition task, showing how depth and group properties influence learning complexity.

Findings

01. Two-layer networks learn one irreducible representation at a time.

02. Learning requires hidden width exponential in sequence length for shallow networks.

03. Deeper models exploit associativity to improve scaling, with recurrent and multilayer networks performing better.
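
The role of associativity in the third finding can be illustrated with a small sketch. It is not the paper's construction: it only shows that, because the group product is associative, the same cumulative product can be computed either by a left-to-right scan (k - 1 dependent steps, loosely analogous to a recurrent model) or by pairwise reduction in about log2(k) rounds (loosely analogous to a multilayer model). The group S_3, the permutation-tuple representation, and the function names are assumptions chosen for illustration.

```python
from functools import reduce

def compose(p, q):
    """Product p*q of two permutations given as tuples: apply q first, then p."""
    return tuple(p[q[i]] for i in range(len(q)))

def sequential_product(seq):
    """Left-to-right scan: k - 1 dependent composition steps."""
    return reduce(compose, seq)

def tree_product(seq):
    """Pairwise reduction over adjacent elements, preserving their order.

    Returns the product and the number of rounds used.
    """
    level, rounds = list(seq), 0
    while len(level) > 1:
        nxt = [compose(level[i], level[i + 1]) for i in range(0, len(level) - 1, 2)]
        if len(level) % 2:  # an unpaired last element carries over unchanged
            nxt.append(level[-1])
        level, rounds = nxt, rounds + 1
    return level[0], rounds

# Eight elements of S_3 (hypothetical example sequence).
seq = [(1, 0, 2), (0, 2, 1), (2, 0, 1), (1, 2, 0),
       (0, 2, 1), (1, 0, 2), (2, 1, 0), (1, 2, 0)]

prod_seq = sequential_product(seq)
prod_tree, rounds = tree_product(seq)
print(prod_seq == prod_tree, rounds)
```

Both routes agree for any sequence because only the grouping of the products changes, never their order, so the sketch works for non-commutative groups as well.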

Abstract

How do neural networks trained over sequences acquire the ability to perform structured operations, such as arithmetic, geometric, and algorithmic computation? To gain insight into this question, we introduce the sequential group composition task. In this task, networks receive a sequence of elements from a finite group encoded in a real vector space and must predict their cumulative product. The task can be order-sensitive and requires a nonlinear architecture to be learned. Our analysis isolates the roles of the group structure, encoding statistics, and sequence length in shaping learning. We prove that two-layer networks learn this task one irreducible representation of the group at a time in an order determined by the Fourier statistics of the encoding. These networks can perfectly learn the task, but doing so requires a hidden width exponential in the sequence length $k$. In…
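
To make the task in the abstract concrete, here is a minimal sketch of how one input/target pair might be generated. Several choices are assumptions not taken from the paper: the group is S_3, each element is encoded as a one-hot vector (the abstract only says elements are "encoded in a real vector space"), the k encodings are concatenated into a single input, and all function names are hypothetical.

```python
import itertools
import random

# The six elements of S_3 as tuples; a tuple p maps i -> p[i].
ELEMENTS = list(itertools.permutations(range(3)))
INDEX = {p: i for i, p in enumerate(ELEMENTS)}
IDENTITY = tuple(range(3))

def compose(p, q):
    """Group product p*q: apply q first, then p."""
    return tuple(p[q[i]] for i in range(len(q)))

def one_hot(p):
    """Assumed encoding: a one-hot vector of length |G|."""
    vec = [0.0] * len(ELEMENTS)
    vec[INDEX[p]] = 1.0
    return vec

def cumulative_product(seq):
    """Product g_1 * g_2 * ... * g_k, taken left to right."""
    result = IDENTITY
    for g in seq:
        result = compose(result, g)
    return result

def sample_example(k, rng):
    """Draw k uniform group elements; return the sequence, input, and target."""
    seq = [rng.choice(ELEMENTS) for _ in range(k)]
    inputs = [x for g in seq for x in one_hot(g)]  # concatenation, length k * |G|
    target = one_hot(cumulative_product(seq))
    return seq, inputs, target

rng = random.Random(0)
seq, inputs, target = sample_example(4, rng)
print(len(inputs), sum(target))

# S_3 is non-abelian, so swapping two inputs can change the target.
a, b = (1, 0, 2), (0, 2, 1)
print(compose(a, b), compose(b, a))
```

The final two lines show the order sensitivity mentioned in the abstract: for a non-abelian group such as S_3, the two orderings of the same pair give different products, so a model cannot solve the task from the multiset of inputs alone.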

Taxonomy

Topics: Stochastic Gradient Optimization Techniques · 3D Shape Modeling and Analysis · Model Reduction and Neural Networks