Representing Numbers in NLP: a Survey and a Vision

Avijit Thawani; Jay Pujara; Pedro A. Szekely; Filip Ilievski

arXiv:2103.13136·cs.CL·March 25, 2021

TL;DR

This paper surveys how NLP systems represent numbers: it organizes recent work on numeracy into a taxonomy of tasks and methods, analyzes 18 previously published number encoders and decoders, and articulates a vision for holistic numeracy in NLP.

Contribution

It breaks numeracy down into 7 subtasks along two dimensions, granularity (exact vs approximate) and units (abstract vs grounded), compares the representational choices made by existing number encoders and decoders, and outlines a vision built on design trade-offs and a unified evaluation.

Findings

01 Identified 7 subtasks of numeracy in NLP

02 Analyzed 18 number encoders and decoders

03 Proposed a unified framework for evaluation

Abstract

NLP systems rarely give special consideration to numbers found in text. This starkly contrasts with the consensus in neuroscience that, in the brain, numbers are represented differently from words. We arrange recent NLP work on numeracy into a comprehensive taxonomy of tasks and methods. We break down the subjective notion of numeracy into 7 subtasks, arranged along two dimensions: granularity (exact vs approximate) and units (abstract vs grounded). We analyze the myriad representational choices made by 18 previously published number encoders and decoders. We synthesize best practices for representing numbers in text and articulate a vision for holistic numeracy in NLP, comprised of design trade-offs and a unified evaluation.
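The two dimensions named in the abstract can be pictured as a small grid. The Python sketch below is illustrative only: the names `Granularity`, `Units`, `TaxonomyCell`, and `taxonomy_cells` are hypothetical and do not come from the paper. It enumerates only the four combinations of the two axes and does not assign the 7 subtasks to cells, since this page does not list them.

```python
from dataclasses import dataclass
from enum import Enum
from itertools import product


class Granularity(Enum):
    """First axis from the abstract: exact vs approximate."""
    EXACT = "exact"
    APPROXIMATE = "approximate"


class Units(Enum):
    """Second axis from the abstract: abstract vs grounded."""
    ABSTRACT = "abstract"
    GROUNDED = "grounded"


@dataclass(frozen=True)
class TaxonomyCell:
    """One cell of the (hypothetical) granularity x units grid."""
    granularity: Granularity
    units: Units


def taxonomy_cells():
    """Return every combination of the two axes as a list of cells."""
    return [TaxonomyCell(g, u) for g, u in product(Granularity, Units)]


if __name__ == "__main__":
    for cell in taxonomy_cells():
        print(f"{cell.granularity.value} / {cell.units.value}")
```

A numeracy subtask would then be tagged with one such cell; which subtask falls in which cell is left to the paper itself.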
