Mathematical Entities: Corpora and Benchmarks

Jacob Collard; Valeria de Paiva; Eswaran Subrahmanian

arXiv:2406.11577·cs.CL·June 18, 2024

TL;DR

This paper introduces annotated corpora and benchmarks for mathematical language, evaluating NLP models' ability to process mathematical texts and highlighting the need for specialized adaptations.

Contribution

It provides large, annotated mathematical corpora, benchmarks for NLP tasks in mathematics, and a learning assistant, addressing the scarcity of resources in this domain.

Findings

01. Terminology extraction in mathematics is challenging.

02. Standard NLP models struggle with mathematical definitions.

03. Additional domain-specific adaptation is required for effective NLP in mathematics.

Abstract

Mathematics is a highly specialized domain with its own unique set of challenges. Despite this, there has been relatively little research on natural language processing for mathematical texts, and there are few mathematical language resources aimed at NLP. In this paper, we aim to provide annotated corpora that can be used to study the language of mathematics in different contexts, ranging from fundamental concepts found in textbooks to advanced research mathematics. We preprocess the corpora with a neural parsing model and some manual intervention to provide part-of-speech tags, lemmas, and dependency trees. In total, we provide 182,397 sentences across three corpora. We then aim to test and evaluate several noteworthy natural language processing models using these corpora, to show how well they can adapt to the domain of mathematics and provide useful tools for exploring mathematical…
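The annotation layers named in the abstract (part-of-speech tags, lemmas, and dependency trees) can be illustrated with a small sketch. It assumes a CoNLL-U-style tab-separated layout, which is a common format for dependency-parsed corpora but is not confirmed by this page as the format of the released corpora. The sample sentence and its annotations are hand-written for illustration and are not taken from the corpora; `Token`, `parse_sentence`, `noun_lemmas`, and `root` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class Token:
    idx: int      # 1-based position in the sentence
    form: str     # surface word
    lemma: str    # dictionary form
    upos: str     # universal part-of-speech tag
    head: int     # index of the syntactic head (0 means root)
    deprel: str   # dependency relation to the head


# Hand-written illustrative sentence in an ASSUMED CoNLL-U-style layout
# (10 tab-separated columns); not an excerpt from the paper's corpora.
SAMPLE = """\
# text = Every group has an identity
1\tEvery\tevery\tDET\t_\t_\t2\tdet\t_\t_
2\tgroup\tgroup\tNOUN\t_\t_\t3\tnsubj\t_\t_
3\thas\thave\tVERB\t_\t_\t0\troot\t_\t_
4\tan\ta\tDET\t_\t_\t5\tdet\t_\t_
5\tidentity\tidentity\tNOUN\t_\t_\t3\tobj\t_\t_
"""


def parse_sentence(block: str) -> list[Token]:
    """Read one sentence block into a list of Token records."""
    tokens = []
    for line in block.splitlines():
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comment lines
        cols = line.split("\t")
        if not cols[0].isdigit():
            continue  # skip multiword ranges ("1-2") and empty nodes ("1.1")
        tokens.append(
            Token(int(cols[0]), cols[1], cols[2], cols[3], int(cols[6]), cols[7])
        )
    return tokens


def noun_lemmas(tokens: list[Token]) -> list[str]:
    """Lemmas of nouns: a crude first filter for candidate terms."""
    return [t.lemma for t in tokens if t.upos == "NOUN"]


def root(tokens: list[Token]) -> Token:
    """The token whose head index is 0, i.e. the root of the dependency tree."""
    return next(t for t in tokens if t.head == 0)


if __name__ == "__main__":
    sentence = parse_sentence(SAMPLE)
    print(noun_lemmas(sentence))
    print(root(sentence).form)
```

Filtering on noun lemmas is only a naive baseline chosen to show how the three annotation layers fit together; it is not the terminology extraction method evaluated in the paper.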

Taxonomy

Topics: Mathematics, Computing, and Information Processing

Methods: Sparse Evolutionary Training