Names Don't Matter: Symbol-Invariant Transformer for Open-Vocabulary Learning
İlker Işık, Wenchao Li

TL;DR
This paper introduces a Transformer mechanism that is provably invariant to the renaming of interchangeable tokens. It combines parallel embedding streams with an aggregated attention mechanism to generalize better to unseen symbols in open-vocabulary tasks.
Contribution
It presents a novel Transformer architecture that is provably invariant to token renaming, improving open-vocabulary learning and generalization to new symbols.
Findings
Experiments confirm the method's theoretical invariance guarantees.
The method yields substantial performance gains on open-vocabulary tasks.
The model generalizes to symbols not seen during training.
Abstract
Current neural architectures lack a principled way to handle interchangeable tokens, i.e., symbols that are semantically equivalent yet distinguishable, such as bound variables. As a result, models trained on fixed vocabularies often struggle to generalize to unseen symbols, even when the underlying semantics remain unchanged. We propose a novel Transformer-based mechanism that is provably invariant to the renaming of interchangeable tokens. Our approach employs parallel embedding streams to isolate the contribution of each interchangeable token in the input, combined with an aggregated attention mechanism that enables structured information sharing across streams. Experimental results confirm the theoretical guarantees of our method and demonstrate substantial performance gains on open-vocabulary tasks that require generalization to novel symbols.
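To make the renaming-invariance property concrete, here is a minimal Python sketch. It is not the paper's mechanism (parallel embedding streams with aggregated attention); it only illustrates the property with a simple first-occurrence canonicalization, a different and much cruder technique. The helper names `rename` and `canonicalize`, the placeholder format `<v0>`, and the example formula are all hypothetical and chosen for illustration.

```python
from typing import Dict, List, Set


def rename(tokens: List[str], mapping: Dict[str, str]) -> List[str]:
    """Apply a renaming to interchangeable tokens; other tokens pass through."""
    return [mapping.get(t, t) for t in tokens]


def canonicalize(tokens: List[str], interchangeable: Set[str]) -> List[str]:
    """Replace each interchangeable token with a placeholder numbered by
    order of first occurrence (an illustrative stand-in, not the paper's method)."""
    placeholders: Dict[str, str] = {}
    out: List[str] = []
    for t in tokens:
        if t in interchangeable:
            if t not in placeholders:
                placeholders[t] = f"<v{len(placeholders)}>"
            out.append(placeholders[t])
        else:
            out.append(t)
    return out


# A formula with two bound variables, and the same formula with renamed variables.
original = ["forall", "x", ".", "exists", "y", ".", "x", "<", "y"]
renamed = rename(original, {"x": "a", "y": "b"})

canon_original = canonicalize(original, {"x", "y"})
canon_renamed = canonicalize(renamed, {"a", "b"})

# Any function that only sees the canonical form is invariant to the renaming.
print(canon_original == canon_renamed)
```

Both sequences map to the same canonical form, so any model applied after canonicalization cannot tell them apart. The abstract describes reaching this kind of invariance inside the architecture itself rather than through a preprocessing step like the one sketched here.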
Taxonomy
Topics: Topic Modeling · Multimodal Machine Learning Applications · Domain Adaptation and Few-Shot Learning
