Interchangeable Token Embeddings for Extendable Vocabulary and Alpha-Equivalence
İlker Işık, Ramazan Gokberk Cinbis, Ebru Aydin Gol

TL;DR
This paper introduces a novel token embedding method that enables language models to recognize interchangeable tokens and alpha-equivalence, improving generalization and reasoning in formal logic tasks.
Contribution
It formalizes the problem of token interchangeability, proposes alpha-covariance as a robustness metric, and introduces a dual-part embedding strategy to enhance model flexibility.
Findings
Improved generalization to unseen tokens in logic tasks
Enhanced recognition of alpha-equivalence
Favorable inductive bias for formal reasoning
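To make the notion of alpha-equivalence concrete, here is a minimal sketch of alpha-renaming on a token sequence: interchangeable tokens are consistently replaced through a random bijection, while all other tokens stay fixed. The function name `alpha_rename`, the toy formula, and the choice of which tokens count as interchangeable are illustrative assumptions, not taken from the paper.

```python
import random

def alpha_rename(tokens, interchangeable, rng):
    """Rename interchangeable tokens through a random bijection.

    Hypothetical helper for illustration: tokens outside the
    `interchangeable` set (operators, parentheses) are left unchanged.
    """
    names = sorted(interchangeable)
    shuffled = names[:]
    rng.shuffle(shuffled)
    mapping = dict(zip(names, shuffled))
    return [mapping.get(t, t) for t in tokens], mapping

# Toy formula: "a", "b", "c" are assumed to be interchangeable symbols.
formula = ["a", "U", "(", "b", "&", "a", ")"]
renamed, mapping = alpha_rename(formula, {"a", "b", "c"}, random.Random(0))
print(renamed, mapping)
```

Because the mapping is a bijection, two positions hold the same symbol after renaming exactly when they did before, which is the structure a model would need to respect to treat both sequences as equivalent.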
Abstract
Language models lack the notion of interchangeable tokens: symbols that are semantically equivalent yet distinct, such as bound variables in formal logic. This limitation prevents generalization to larger vocabularies and hinders the model's ability to recognize alpha-equivalence, where renaming bound variables preserves meaning. We formalize this machine learning problem and introduce alpha-covariance, a metric for evaluating robustness to such transformations. To tackle this task, we propose a dual-part token embedding strategy: a shared component ensures semantic consistency, while a randomized component maintains token distinguishability. Compared to a baseline that relies on alpha-renaming for data augmentation, our approach demonstrates improved generalization to unseen tokens in linear temporal logic solving, propositional logic assignment prediction, and copying with an…
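The dual-part embedding idea described in the abstract can be sketched roughly as follows. This is only an illustration under stated assumptions: the two parts are concatenated, the shared part is a stand-in for a learned vector, and the randomized part is drawn from a standard Gaussian. The abstract does not specify how the parts are combined, sampled, or normalized, so `make_embeddings` and its parameters are hypothetical.

```python
import random

def make_embeddings(num_tokens, shared_dim, random_dim, rng):
    """Build one embedding per interchangeable token (illustrative only).

    Assumption: each embedding is [shared part | random part].
    The shared part is identical for every token; the random part
    is sampled independently so that tokens remain distinguishable.
    """
    # Stand-in for a learned vector shared by all interchangeable tokens.
    shared = [rng.gauss(0.0, 1.0) for _ in range(shared_dim)]
    table = []
    for _ in range(num_tokens):
        noise = [rng.gauss(0.0, 1.0) for _ in range(random_dim)]
        table.append(shared + noise)
    return table

rng = random.Random(0)
table = make_embeddings(num_tokens=5, shared_dim=4, random_dim=4, rng=rng)
for row in table:
    print([round(x, 2) for x in row])
```

Under this construction, enlarging the vocabulary only requires sampling more random parts, since the shared part carries no per-token information.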
Taxonomy
Topics: Natural Language Processing Techniques · Topic Modeling · semigroups and automata theory
Methods: Attention Is All You Need · Linear Layer · Layer Normalization · Residual Connection · Position-Wise Feed-Forward Layer · Dense Connections · Softmax · Multi-Head Attention · Adam · Dropout
