Extracting Mathematical Concepts from Text

Jacob Collard; Valeria de Paiva; Brendan Fong; Eswaran Subrahmanian

arXiv:2208.13830 · cs.CL · August 31, 2022 · 1 citation

TL;DR

This paper compares four systems for extracting mathematical concepts from English texts in category theory to aid in building a mathematical knowledge graph, highlighting challenges and providing open corpora for research.

Contribution

It compares four term extractors on mathematical text and releases two open corpora of research mathematics in category theory.

Findings

01. Different extractors show varying effectiveness on noisy domain texts.

02. Challenges in term extraction from mathematical language are identified.

03. Open corpora facilitate future research in mathematical NLP.

Abstract

We investigate different systems for extracting mathematical entities from English texts in the mathematical field of category theory as a first step for constructing a mathematical knowledge graph. We consider four different term extractors and compare their results. This small experiment showcases some of the issues with the construction and evaluation of terms extracted from noisy domain text. We also make available two open corpora in research mathematics, in particular in category theory: a small corpus of 755 abstracts from the journal TAC (3188 sentences), and a larger corpus from the nLab community wiki (15,000 sentences).
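
One simple way to compare the results of several term extractors is to measure how much their extracted term sets overlap. The sketch below is only an illustration of that idea, not the evaluation used in the paper: the extractor names (`extractor_a`, `extractor_b`), the toy term lists, and the choice of Jaccard similarity are all assumptions made for this example.

```python
from __future__ import annotations

from itertools import combinations


def normalize(term: str) -> str:
    """Lowercase a term and collapse runs of whitespace."""
    return " ".join(term.lower().split())


def jaccard(a: set[str], b: set[str]) -> float:
    """Jaccard similarity: size of intersection over size of union."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def pairwise_overlap(outputs: dict[str, list[str]]) -> dict[tuple[str, str], float]:
    """Compute Jaccard overlap for every pair of extractors.

    `outputs` maps a (hypothetical) extractor name to its extracted terms.
    """
    normalized = {
        name: {normalize(t) for t in terms} for name, terms in outputs.items()
    }
    return {
        (x, y): jaccard(normalized[x], normalized[y])
        for x, y in combinations(sorted(normalized), 2)
    }


# Toy, hand-written data for illustration only; not taken from the paper.
toy_outputs = {
    "extractor_a": ["Monoidal category", "functor", "natural transformation"],
    "extractor_b": ["functor", "monoidal  category", "paper"],
}

scores = pairwise_overlap(toy_outputs)
for pair, score in scores.items():
    print(pair, round(score, 2))
```

Overlap between extractors says nothing about correctness on its own: two systems can agree on a non-term such as "paper". Judging quality would additionally require a reference list of terms, which this sketch does not assume.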

Taxonomy

Topics: Mathematics, Computing, and Information Processing · Natural Language Processing Techniques · Semantic Web and Ontologies