How to Leverage Digit Embeddings to Represent Numbers?

Jasivan Alex Sivakumar; Nafise Sadat Moosavi

arXiv:2407.00894 · cs.CL · December 12, 2024

TL;DR

This paper investigates the use of mathematical priors to explicitly aggregate digit embeddings in transformer models, aiming to improve numerical reasoning by better representing numbers.

Contribution

It introduces a method to incorporate explicit digit aggregation using mathematical priors into transformer models, enhancing number representation in language models.

Findings

01 Explicit aggregation improves number understanding

02 Method is compatible with pretrained models

03 Approach is simple and publicly available

Abstract

Within numerical reasoning, understanding numbers themselves is still a challenge for existing language models. Simple generalisations, such as solving 100+200 instead of 1+2, can substantially affect model performance (Sivakumar and Moosavi, 2023). Among various techniques, character-level embeddings of numbers have emerged as a promising approach to improve number representation. However, this method has limitations as it leaves the task of aggregating digit representations to the model, which lacks direct supervision for this process. In this paper, we explore the use of mathematical priors to compute aggregated digit embeddings and explicitly incorporate these aggregates into transformer models. This can be achieved either by adding a special token to the input embeddings or by introducing an additional loss function to enhance correct predictions. We evaluate the effectiveness of…
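
To make the idea of an explicitly aggregated digit embedding concrete, the following is a minimal sketch and not the authors' implementation. It assumes a toy lookup table of digit embeddings and a hypothetical position-based weighting in which more significant digits receive larger weights; the abstract does not specify the actual mathematical prior, and the names `make_digit_table` and `aggregate_number` are illustrative only.

```python
import random


def make_digit_table(dim, seed=0):
    """Build a toy embedding table for the digits 0-9 (random values, illustration only)."""
    rng = random.Random(seed)
    return {str(d): [rng.uniform(-1.0, 1.0) for _ in range(dim)] for d in range(10)}


def aggregate_number(number, table):
    """Combine the digit embeddings of a non-negative integer string into one vector.

    Assumption: a weighted sum where the leftmost (most significant) digit
    gets the largest weight. This weighting is hypothetical, chosen only to
    show what an explicit aggregate looks like.
    """
    if not number or any(ch not in table for ch in number):
        raise ValueError("expected a non-empty string of digits 0-9")
    n = len(number)
    raw = [n - i for i in range(n)]          # e.g. "123" -> [3, 2, 1]
    total = sum(raw)
    weights = [w / total for w in raw]       # normalise so the weights sum to 1
    dim = len(table[number[0]])
    aggregate = [0.0] * dim
    for weight, digit in zip(weights, number):
        for k, value in enumerate(table[digit]):
            aggregate[k] += weight * value
    return aggregate
```

A vector produced this way could, per the abstract, be supplied to the model as the embedding of a special token or used as a target in an auxiliary loss. Because the weights depend on position, "12" and "21" map to different aggregates even though they share the same digits.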

Code & Models

Repositories

jasivan/Number-Embeddings (PyTorch, official)

Taxonomy

Topics: Mathematics, Computing, and Information Processing · Mathematics Education and Teaching Techniques · Statistics Education and Methodologies