When Symbol Names Should Not Matter: A Logistic Theory of Fresh-Symbol Classification

Wenjie Guan; Jelena Bradic

arXiv:2605.07120·cs.LG·May 11, 2026

TL;DR

This paper develops a theoretical framework for how transformers learn classification rules that are invariant to symbol renaming, focusing on how accidental token overlaps in the training data and their collision geometry affect generalization to fresh symbols.

Contribution

It introduces a colored-collision-graph analysis of regularized kernel logistic classification in the transformer-kernel regime, extending template-based analyses and providing high-probability margin-transfer guarantees for fresh-symbol classification.

Findings

01. Vocabulary size influences collision rates and classification margins.

02. Collision geometry determines the preservation of classification margins.

03. Regularization and sample size impact fresh-symbol generalization.

Abstract

Template tasks have emerged as a clean testbed for asking whether transformers reason with abstract symbols rather than concrete token names. We study the fixed-label classification version of this problem, where train and test examples share latent templates but may use disjoint vocabularies. Unlike next-token prediction, the model need not emit unseen symbols; it must learn a decision rule invariant to symbol renaming. We analyze regularized kernel logistic classification in the transformer-kernel regime. Our main result decomposes the learned predictor into an ideal template-level classifier and a finite-sample perturbation caused by accidental token overlaps in the training data. We encode these overlaps by a colored collision graph and prove high-probability margin-transfer guarantees for fresh-symbol classification. This perspective extends template-based analyses to logistic…
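To make the setup concrete, here is a minimal Python sketch of a fixed-label template task with disjoint train and test vocabularies, plus a simple graph of token collisions among training examples. Everything in it is an assumption made for illustration: the templates `AAB` and `ABA`, the vocabulary sizes, the helper names, and in particular the edge coloring (same label vs. different label). The paper's own colored collision graph may be defined differently.

```python
import itertools
import random

# Hypothetical templates: each string is a pattern over placeholder slots,
# paired with a fixed class label. Illustrative only, not taken from the paper.
TEMPLATES = {"AAB": 0, "ABA": 1}


def sample_example(template, vocab, rng):
    """Fill each distinct placeholder with a distinct symbol drawn from vocab."""
    slots = sorted(set(template))
    symbols = rng.sample(vocab, len(slots))
    mapping = dict(zip(slots, symbols))
    return tuple(mapping[s] for s in template)


def make_dataset(n, vocab, rng):
    """Return n (tokens, label) pairs, with the template chosen uniformly."""
    data = []
    for _ in range(n):
        template = rng.choice(sorted(TEMPLATES))
        data.append((sample_example(template, vocab, rng), TEMPLATES[template]))
    return data


def collision_edges(data):
    """List edges (i, j, color) between examples sharing at least one token.

    The coloring (same label vs. different label) is an assumption made
    here for illustration; it is not claimed to match the paper's definition.
    """
    edges = []
    for (i, (xi, yi)), (j, (xj, yj)) in itertools.combinations(enumerate(data), 2):
        if set(xi) & set(xj):
            edges.append((i, j, "same" if yi == yj else "diff"))
    return edges


rng = random.Random(0)
train_vocab = list(range(0, 50))       # symbols seen during training
test_vocab = list(range(1000, 1050))   # disjoint "fresh" symbols for testing

train = make_dataset(20, train_vocab, rng)
test = make_dataset(20, test_vocab, rng)
edges = collision_edges(train)
print(len(edges), "collision edges among", len(train), "training examples")
```

In this toy version, the label depends only on the equality pattern of the tokens, so renaming symbols never changes it. Re-running with a larger `train_vocab` and the same number of examples should tend to produce fewer collision edges, which is one informal way to see why vocabulary size matters for the overlaps the abstract describes.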
