Parmesan: mathematical concept extraction for education
Jacob Collard, Valeria de Paiva, Eswaran Subrahmanian

TL;DR
Parmesan is a prototype system for searching for and defining mathematical concepts in context, focused on category theory. It is aimed at researchers from other fields whose multidisciplinary work relies on mathematical concepts, a domain that has seen limited study in natural language processing.
Contribution
The paper introduces Parmesan, a novel system for extracting and linking mathematical concepts in context, tailored for the domain of category theory, with new annotated corpora and hybrid NLP techniques.
Findings
Existing NLP techniques cannot be applied directly to the category theory domain.
Hybrid methods improve concept extraction accuracy.
Annotated corpora support future research and system development.
Abstract
Mathematics is a highly specialized domain with its own unique set of challenges that has seen limited study in natural language processing. However, mathematics is used in a wide variety of fields and multidisciplinary research in many different domains often relies on an understanding of mathematical concepts. To aid researchers coming from other fields, we develop a prototype system for searching for and defining mathematical concepts in context, focusing on the field of category theory. This system, Parmesan, depends on natural language processing components including concept extraction, relation extraction, definition extraction, and entity linking. In developing this system, we show that existing techniques cannot be applied directly to the category theory domain, and suggest hybrid techniques that do perform well, though we expect the system to evolve over time. We also provide…
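To make the components named in the abstract more concrete, here is a minimal toy sketch of three of them (concept extraction, definition extraction, and entity linking) in Python. It is not Parmesan's implementation: the glossary, the `kb:` identifiers, the regular expression, and all function names are hypothetical and chosen only for illustration, and relation extraction is omitted.

```python
import re

# Hypothetical mini-glossary standing in for a knowledge base (not from the paper).
GLOSSARY = {
    "functor": "kb:functor",
    "natural transformation": "kb:natural_transformation",
    "category": "kb:category",
}

# Assumed pattern for simple "A <term> is <body>." definitions.
DEFINITION_PATTERN = re.compile(
    r"^An? (?P<term>[a-z ]+?) is (?P<body>.+)\.$", re.IGNORECASE
)


def extract_concepts(sentence):
    """Concept extraction: return glossary terms that occur as whole words."""
    text = sentence.lower()
    found = []
    # Check longer terms first so multi-word terms are listed before their parts.
    for term in sorted(GLOSSARY, key=len, reverse=True):
        if re.search(r"\b" + re.escape(term) + r"\b", text):
            found.append(term)
    return found


def extract_definition(sentence):
    """Definition extraction: return (term, body) or None if no match."""
    match = DEFINITION_PATTERN.match(sentence.strip())
    if match is None:
        return None
    return match.group("term").lower(), match.group("body")


def link_entity(term):
    """Entity linking: map a surface term to a glossary identifier, if any."""
    return GLOSSARY.get(term.lower())


sentence = "A functor is a structure-preserving map between categories."
concepts = extract_concepts(sentence)
definition = extract_definition(sentence)
links = {term: link_entity(term) for term in concepts}
```

Note that exact string matching misses the plural "categories" in the example sentence. Gaps of this kind are one reason purely rule-based lookup tends to fall short on real mathematical text.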
Taxonomy
Topics: Natural Language Processing Techniques · Topic Modeling · Advanced Text Analysis Techniques
