DDCL-INCRT: A Self-Organising Transformer with Hierarchical Prototype Structure (Theoretical Foundations)
Giansalvo Cirrincione

TL;DR
This paper introduces DDCL-INCRT, a self-organising transformer architecture that automatically determines its structure during training through hierarchical prototypes and incremental head addition, supported by theoretical guarantees.
Contribution
It presents a novel self-organising transformer framework combining prototype-based learning and incremental head addition with formal theoretical analysis.
Findings
Prototypes automatically spread apart during training without explicit regularisation.
Heads are added only when existing heads' information is insufficient, leading to adaptive depth.
The resulting hierarchical structure is proven to be minimal, unique, and sufficient for the task.
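The "add a head only when the existing ones are insufficient" finding can be pictured as a grow-until-sufficient loop. Below is a minimal sketch under loudly stated assumptions. This summary does not give the paper's actual sufficiency test, so we substitute a hypothetical one: the fraction of variance still unexplained. Each "head" is only a unit direction taken from the residual, standing in for a trained attention head. The names `grow_heads`, `tol` and `max_heads` are illustrative and do not come from the paper.

```python
import numpy as np

def grow_heads(X, tol=0.05, max_heads=16):
    """Add 'heads' one at a time until the unexplained variance of X is <= tol.

    Hypothetical stand-in: each head is the leading direction of the current
    residual (not a trained attention head), and the sufficiency criterion is
    the unexplained-variance fraction (an assumption, not the paper's test).
    """
    R = X - X.mean(axis=0)            # centred data; the residual starts as all of it
    total = np.sum(R ** 2)
    heads = []
    while len(heads) < max_heads:
        unexplained = np.sum(R ** 2) / total
        if unexplained <= tol:        # existing heads carry enough information: stop
            break
        # Leading right-singular vector = direction of largest residual variance
        _, _, vt = np.linalg.svd(R, full_matrices=False)
        w = vt[0]
        heads.append(w)
        R = R - np.outer(R @ w, w)    # deflate: remove what the new head captures
    return heads
```

On data with two dominant directions plus small noise, the loop stops after two heads instead of filling a fixed budget. This mirrors, in a toy setting, the idea that structure is added only while information is still missing.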
Abstract
Modern neural networks of the transformer family require the practitioner to decide, before training begins, how many attention heads to use, how deep the network should be, and how wide each component should be. These decisions are made without knowledge of the task, producing architectures that are systematically larger than necessary: empirical studies find that a substantial fraction of heads and layers can be removed after training without performance loss. This paper introduces DDCL-INCRT, an architecture that determines its own structure during training. Two complementary ideas are combined. The first, DDCL (Deep Dual Competitive Learning), replaces the feedforward block with a dictionary of learned prototype vectors representing the most informative directions in the data. The prototypes spread apart automatically, driven by the training objective, without explicit…
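One way to picture "replacing the feedforward block with a dictionary of learned prototype vectors" is the minimal NumPy sketch below. It is an assumption, not the DDCL formulation, which this summary does not spell out. Each token is softly assigned to the prototypes by a softmax over negative squared distances, a common soft competitive-learning rule used here as a stand-in. The assignment-weighted mix of prototypes is then added back as a residual. The function name `prototype_block`, the `temperature` parameter and the residual form are all hypothetical.

```python
import numpy as np

def prototype_block(H, P, temperature=1.0):
    """Sketch of a prototype-dictionary sublayer in place of a feedforward block.

    H: (n_tokens, d) token representations.
    P: (k, d) prototype vectors (learned during training in a real model).
    Returns the residual output and the (n_tokens, k) soft assignment matrix.
    """
    # Squared Euclidean distance between every token and every prototype
    d2 = np.sum(H ** 2, axis=1, keepdims=True) - 2 * H @ P.T + np.sum(P ** 2, axis=1)
    logits = -d2 / temperature
    logits -= logits.max(axis=1, keepdims=True)   # subtract row max for numerical stability
    A = np.exp(logits)
    A /= A.sum(axis=1, keepdims=True)             # soft competitive assignment; rows sum to 1
    return H + A @ P, A
```

When the prototypes are far apart relative to `temperature`, each token is captured almost entirely by its nearest prototype. This gives one intuition for why well-separated prototypes make the dictionary's directions distinct and informative.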
