Do NLP Models Know Numbers? Probing Numeracy in Embeddings

Eric Wallace; Yizhong Wang; Sujian Li; Sameer Singh; Matt Gardner

arXiv:1909.07940·cs.CL·September 19, 2019

TL;DR

This paper investigates whether NLP models and embeddings inherently understand numbers, finding that many embeddings already encode a surprising degree of numeracy, especially in character-level models like ELMo.

Contribution

It shows that standard pre-trained embeddings already encode a surprising degree of numeracy, and it analyzes how models such as BERT and GloVe represent numerical information.

Findings

01

GloVe and word2vec accurately encode magnitude for numbers up to 1,000

02

ELMo captures numeracy better than BERT

03

Pre-trained embeddings naturally encode some numerical reasoning abilities
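
Findings like these come from probes that try to read a number's value back out of its embedding alone. The toy sketch below is not a reproduction of the paper's experiments. It swaps in a linear ridge-regression probe and two hypothetical, hand-made "embeddings" in place of GloVe, word2vec, ELMo, or BERT vectors. A digit-by-digit one-hot encoding (a crude stand-in for what a character-level model sees) makes a number's value an exactly linear function of its features. A random vector per number, like a lookup table with no learned structure, carries no magnitude signal for numbers held out of training. The range [0, 999], the 80/20 split, and all function names are assumptions made for illustration.

```python
import math
import random
from typing import Callable, List, Sequence

# Toy decoding probe. Both "embeddings" are hypothetical stand-ins, not real
# pre-trained vectors; the probe is linear ridge regression, chosen for brevity.

def digit_embedding(n: int, width: int = 3) -> List[float]:
    """Character-level stand-in: concatenated one-hot vectors, one per digit."""
    feats: List[float] = []
    for ch in str(n).zfill(width):
        one_hot = [0.0] * 10
        one_hot[int(ch)] = 1.0
        feats.extend(one_hot)
    return feats

def random_embedding(dim: int = 16, seed: int = 0) -> Callable[[int], List[float]]:
    """Token-level stand-in: an unrelated random vector for each number."""
    rng = random.Random(seed)
    table = {}
    def embed(n: int) -> List[float]:
        if n not in table:
            table[n] = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        return table[n]
    return embed

def ridge_fit(X: List[List[float]], y: List[float], lam: float = 1e-3) -> List[float]:
    """Solve (Xb^T Xb + lam*I) w = Xb^T y, where Xb is X plus a bias column.
    For simplicity, lam also penalizes the bias weight."""
    Xb = [row + [1.0] for row in X]
    d = len(Xb[0])
    A = [[sum(r[i] * r[j] for r in Xb) + (lam if i == j else 0.0) for j in range(d)]
         for i in range(d)]
    b = [sum(r[i] * t for r, t in zip(Xb, y)) for i in range(d)]
    # Gauss-Jordan elimination with partial pivoting (A is positive definite).
    for col in range(d):
        piv = max(range(col, d), key=lambda k: abs(A[k][col]))
        A[col], A[piv] = A[piv], A[col]
        b[col], b[piv] = b[piv], b[col]
        for r in range(d):
            if r != col:
                f = A[r][col] / A[col][col]
                for c in range(col, d):
                    A[r][c] -= f * A[col][c]
                b[r] -= f * b[col]
    return [b[i] / A[i][i] for i in range(d)]

def predict(w: Sequence[float], x: Sequence[float]) -> float:
    """Linear probe output: weights dotted with features plus bias."""
    return sum(wi * xi for wi, xi in zip(w, list(x) + [1.0]))

def decoding_rmse(embed: Callable[[int], List[float]], train: List[int],
                  test: List[int], lam: float = 1e-3) -> float:
    """Fit the probe on training numbers; report RMSE on held-out numbers."""
    w = ridge_fit([embed(n) for n in train], [float(n) for n in train], lam)
    sq_errs = [(predict(w, embed(n)) - n) ** 2 for n in test]
    return math.sqrt(sum(sq_errs) / len(sq_errs))

if __name__ == "__main__":
    nums = list(range(1000))
    random.Random(0).shuffle(nums)
    train, test = nums[:800], nums[800:]
    print("digit one-hot RMSE:", decoding_rmse(digit_embedding, train, test))
    print("random vector RMSE:", decoding_rmse(random_embedding(), train, test))
```

With these choices, the digit encoding should decode held-out numbers almost exactly, while the random table should do little better than predicting the mean. Learned embeddings presumably sit somewhere between these two extremes, and probes of this kind quantify where.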

Abstract

The ability to understand and work with numbers (numeracy) is critical for many complex reasoning tasks. Currently, most NLP models treat numbers in text in the same way as other tokens: they embed them as distributed vectors. Is this enough to capture numeracy? We begin by investigating the numerical reasoning capabilities of a state-of-the-art question answering model on the DROP dataset. We find this model excels on questions that require numerical reasoning, i.e., it already captures numeracy. To understand how this capability emerges, we probe token embedding methods (e.g., BERT, GloVe) on synthetic list maximum, number decoding, and addition tasks. A surprising degree of numeracy is naturally present in standard embeddings. For example, GloVe and word2vec accurately encode magnitude for numbers up to 1,000. Furthermore, character-level embeddings are even more precise: ELMo…
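
The three synthetic probing tasks named in the abstract can be made concrete with a small data generator. This is a sketch under stated assumptions, not the authors' code: the integer range [0, 99], the list length of 5, the 80/20 split of the integers themselves, and every function name are hypothetical choices for illustration. Each example pairs number tokens, which a probe would embed with the embedding method under test, with a target.

```python
import random
from typing import List, Tuple

# Hypothetical generators for the three synthetic probing tasks.
# The integer range, list length, and 80/20 split are assumptions.

def split_integers(low: int, high: int, test_frac: float = 0.2,
                   seed: int = 0) -> Tuple[List[int], List[int]]:
    """Shuffle the integers in [low, high] and hold out a fraction of them,
    so test numbers never occur in any training example."""
    nums = list(range(low, high + 1))
    random.Random(seed).shuffle(nums)
    n_test = int(len(nums) * test_frac)
    return nums[n_test:], nums[:n_test]

def list_max_example(pool: List[int], rng: random.Random,
                     list_len: int = 5) -> Tuple[List[str], int]:
    """List maximum: input is a list of distinct number tokens,
    target is the position of the largest one."""
    values = rng.sample(pool, list_len)
    return [str(v) for v in values], values.index(max(values))

def decoding_example(pool: List[int], rng: random.Random) -> Tuple[str, float]:
    """Number decoding: input is one number token, target is its value."""
    v = rng.choice(pool)
    return str(v), float(v)

def addition_example(pool: List[int],
                     rng: random.Random) -> Tuple[Tuple[str, str], float]:
    """Addition: input is two number tokens, target is their sum."""
    a, b = rng.choice(pool), rng.choice(pool)
    return (str(a), str(b)), float(a + b)

if __name__ == "__main__":
    train, test = split_integers(0, 99)
    rng = random.Random(1)
    print(list_max_example(train, rng))
    print(decoding_example(test, rng))
    print(addition_example(train, rng))
```

Splitting the integers rather than the examples is the key design choice here: if a test number also appeared in training, a probe could memorize that number's embedding instead of reading magnitude from it.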

Code & Models

Repositories

andrewschreiber/hs-math-nlp
pytorch

Taxonomy

Methods: Linear Layer · Weight Decay · Residual Connection · Adam · Layer Normalization · Softmax · Attention Is All You Need · Dropout · Multi-Head Attention