Extracting Mathematical Concepts from Text
Jacob Collard, Valeria de Paiva, Brendan Fong, Eswaran Subrahmanian

TL;DR
This paper compares four term extractors on English category theory texts as a first step toward building a mathematical knowledge graph, highlights the difficulties of extracting and evaluating terms from noisy domain text, and releases two open corpora for research.
Contribution
It introduces a comparative analysis of term extraction methods in mathematical texts and releases two annotated corpora for research in category theory.
Findings
Different extractors show varying effectiveness on noisy domain texts.
Challenges in term extraction from mathematical language are identified.
Open corpora facilitate future research in mathematical NLP.
Abstract
We investigate different systems for extracting mathematical entities from English texts in the mathematical field of category theory as a first step for constructing a mathematical knowledge graph. We consider four different term extractors and compare their results. This small experiment showcases some of the issues with the construction and evaluation of terms extracted from noisy domain text. We also make available two open corpora in research mathematics, in particular in category theory: a small corpus of 755 abstracts from the journal TAC (3188 sentences), and a larger corpus from the nLab community wiki (15,000 sentences).
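As an illustration of what comparing term extractors can involve, the sketch below scores each extractor's output against a reference term list using set-based precision, recall, and F1. This is an assumption made for illustration: the abstract does not say which metrics or matching rules the authors used, and the extractor names, term lists, and the `normalize` and `score` helpers are all hypothetical.

```python
def normalize(term: str) -> str:
    """Lowercase and collapse whitespace so surface variants compare equal."""
    return " ".join(term.lower().split())


def score(predicted, gold):
    """Return (precision, recall, f1) of predicted terms against gold terms."""
    pred = {normalize(t) for t in predicted}
    ref = {normalize(t) for t in gold}
    hits = len(pred & ref)  # terms found in both sets (exact match)
    precision = hits / len(pred) if pred else 0.0
    recall = hits / len(ref) if ref else 0.0
    f1 = 2 * precision * recall / (precision + recall) if hits else 0.0
    return precision, recall, f1


# Hypothetical gold terms and extractor outputs, invented for illustration;
# they are not taken from the TAC or nLab corpora.
gold = ["monoidal category", "functor", "natural transformation", "adjunction"]
outputs = {
    "extractor_a": ["Functor", "monoidal  category", "paper"],
    "extractor_b": ["functor", "natural transformation", "adjunction", "we"],
}

results = {name: score(terms, gold) for name, terms in outputs.items()}
for name, (p, r, f) in results.items():
    print(f"{name}: precision={p:.2f} recall={r:.2f} f1={f:.2f}")
```

Exact matching after normalization is the simplest possible choice. It penalizes partial matches such as "category" versus "monoidal category", which is one reason evaluating terms extracted from noisy domain text can be harder than it first appears.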
Taxonomy
Topics: Mathematics, Computing, and Information Processing · Natural Language Processing Techniques · Semantic Web and Ontologies
