How to Leverage Digit Embeddings to Represent Numbers?
Jasivan Alex Sivakumar, Nafise Sadat Moosavi

TL;DR
This paper investigates the use of mathematical priors to explicitly aggregate digit embeddings in transformer models, aiming to improve numerical reasoning by better representing numbers.
Contribution
It introduces a method that explicitly aggregates digit embeddings using mathematical priors and incorporates the resulting aggregates into transformer models, improving how language models represent numbers.
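To make "explicit digit aggregation using mathematical priors" concrete, here is a minimal Python sketch. It assumes one particular prior, a place-value weighting in which each digit's embedding is scaled by 10 raised to its position and the weights are normalised to sum to 1. The paper's actual priors may differ. The embedding table, `DIM`, `place_value_weights` and `aggregate_digits` are hypothetical names chosen for illustration.

```python
import random

DIM = 4  # toy embedding size (assumption; real models use far larger vectors)

# Hypothetical digit embedding table: one random vector per digit character.
random.seed(0)
DIGIT_EMB = {str(d): [random.gauss(0.0, 1.0) for _ in range(DIM)] for d in range(10)}


def place_value_weights(n_digits: int) -> list[float]:
    """Weights proportional to 10**position, normalised to sum to 1.

    ASSUMED prior: more significant digits contribute more to the aggregate.
    """
    raw = [10.0 ** (n_digits - 1 - i) for i in range(n_digits)]
    total = sum(raw)
    return [r / total for r in raw]


def aggregate_digits(number: str, table: dict[str, list[float]]) -> list[float]:
    """Place-value-weighted sum of the digit embeddings of `number`."""
    if not number or any(c not in "0123456789" for c in number):
        raise ValueError(f"expected a non-empty string of ASCII digits, got {number!r}")
    weights = place_value_weights(len(number))
    dim = len(next(iter(table.values())))
    agg = [0.0] * dim
    for w, ch in zip(weights, number):
        # Accumulate each digit's embedding, scaled by its place-value weight.
        for k, value in enumerate(table[ch]):
            agg[k] += w * value
    return agg
```

Under this prior, `aggregate_digits("347", DIGIT_EMB)` is dominated by the embedding of 3 (weight 100/111). Normalising keeps the aggregate on the same scale as a single digit embedding, but it also discards digit count, so "3" and "300" map to similar vectors. A prior used in practice would need to encode magnitude in some other way.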
Findings
Explicit aggregation improves number understanding
Method is compatible with pretrained models
Approach is simple and publicly available
Abstract
Within numerical reasoning, understanding numbers themselves is still a challenge for existing language models. Simple generalisations, such as solving 100+200 instead of 1+2, can substantially affect model performance (Sivakumar and Moosavi, 2023). Among various techniques, character-level embeddings of numbers have emerged as a promising approach to improve number representation. However, this method has limitations as it leaves the task of aggregating digit representations to the model, which lacks direct supervision for this process. In this paper, we explore the use of mathematical priors to compute aggregated digit embeddings and explicitly incorporate these aggregates into transformer models. This can be achieved either by adding a special token to the input embeddings or by introducing an additional loss function to enhance correct predictions. We evaluate the effectiveness of…
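The abstract names two ways to inject an aggregate into the model: an extra special token in the input embeddings, or an additional loss. The toy sketch below shows what each route could look like, taking a precomputed aggregate vector as input. The placement of the special token (here, before the number's digits), the use of mean squared error for the auxiliary term, and the names `add_aggregate_token`, `combined_loss` and `weight` are all illustrative assumptions, not the authors' implementation.

```python
Vector = list[float]


def add_aggregate_token(digit_embs: list[Vector], aggregate: Vector) -> list[Vector]:
    """Route 1 (ASSUMED layout): prepend the aggregate as one extra input embedding.

    The returned sequence is one position longer than `digit_embs`;
    inputs are copied, not mutated.
    """
    if any(len(e) != len(aggregate) for e in digit_embs):
        raise ValueError("aggregate and digit embeddings must share a dimension")
    return [list(aggregate)] + [list(e) for e in digit_embs]


def mse(a: Vector, b: Vector) -> float:
    """Mean squared error between two equal-length, non-empty vectors."""
    if len(a) != len(b) or not a:
        raise ValueError("vectors must be non-empty and of equal length")
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def combined_loss(lm_loss: float, predicted: Vector, target_aggregate: Vector,
                  weight: float = 0.1) -> float:
    """Route 2 (ASSUMED form): language-modelling loss plus a weighted auxiliary
    term that pulls a model-produced vector towards the prior-based aggregate."""
    return lm_loss + weight * mse(predicted, target_aggregate)
```

In a real model, `predicted` might be a projection of the hidden state at the number's position, and `weight` a tuned hyperparameter. Both are assumptions here.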
