Estimating Numbers without Regression

Avijit Thawani; Jay Pujara; Ashwin Kalyan

arXiv:2310.06204 · cs.CL · October 11, 2023 · 1 cites

TL;DR

This paper shows that simple tokenization schemes can significantly improve a language model's ability to estimate numbers, outperforming complex architectural modifications in certain tasks.

Contribution

It demonstrates that changing the vocabulary and tokenization approach is simpler than architectural changes and comparably effective for number estimation in language models.

Findings

01. Tokenization-based methods perform on par with architectural changes.

02. Vocabulary modifications improve number estimation accuracy.

03. Simple tokenization schemes are effective for numerical fact estimation.

Abstract

Despite recent successes in language models, their ability to represent numbers is insufficient. Humans conceptualize numbers based on their magnitudes, effectively projecting them on a number line; whereas subword tokenization fails to explicitly capture magnitude by splitting numbers into arbitrary chunks. To alleviate this shortcoming, alternative approaches have been proposed that modify numbers at various stages of the language modeling pipeline. These methods change either the (1) notation in which numbers are written (e.g., scientific vs. decimal), the (2) vocabulary used to represent numbers or the entire (3) architecture of the underlying language model, to directly regress to a desired number. Previous work suggests that architectural change helps achieve state-of-the-art on number estimation but we find an insightful ablation: changing the model's vocabulary instead (e.g.…
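
As a rough illustration of the first two options the abstract names, the sketch below contrasts a notation change (rewriting a number in scientific notation) with a vocabulary change (replacing a number with a single token that encodes its order of magnitude). The function names and the `[NUM_E<k>]` token format are hypothetical, chosen only for this example; they are not taken from the paper, whose exact tokenization scheme is not given in the text above.

```python
import math


def to_scientific(value: float, digits: int = 2) -> str:
    """Notation change: rewrite a number in scientific notation."""
    return f"{value:.{digits}e}"


def magnitude_token(value: float) -> str:
    """Vocabulary change: map a positive number to one token per order of magnitude.

    The "[NUM_E<k>]" format is a hypothetical choice for this sketch.
    """
    if value <= 0:
        raise ValueError("this sketch only handles positive numbers")
    exponent = math.floor(math.log10(value))
    return f"[NUM_E{exponent}]"


if __name__ == "__main__":
    for number in (42, 99, 1234):
        print(number, to_scientific(number), magnitude_token(number))
```

Under this assumed scheme, 42 and 99 share one token because they have the same order of magnitude, so a model predicting a masked number would choose among a small set of magnitude tokens instead of regressing to a real value.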

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Advanced Text Analysis Techniques