Mining Mathematical Documents for Question Answering via Unsupervised Formula Labeling
Philipp Scharpf, Moritz Schubotz, Bela Gipp

TL;DR
This paper introduces unsupervised formula labeling and mathematical entity linking techniques to improve semantic formula search and question answering on mathematical documents, evaluated across multiple data sources and compared with commercial systems.
Contribution
The paper presents data mining methods for unsupervised formula labeling and mathematical entity linking, which enable semantic formula search and question answering over mathematical documents.
Findings
Effective formula retrieval across arXiv, Wikipedia, and Wikidata
Comparable performance to Wolfram Alpha and Google on specific queries
Open-source system available for community use
Abstract
The increasing number of questions on Question Answering (QA) platforms like Math Stack Exchange (MSE) signifies a growing information need to answer math-related questions. However, there is currently very little research on approaches for an open data QA system that retrieves mathematical formulae using their concept names or querying formula identifier relationships from knowledge graphs. In this paper, we aim to bridge the gap by presenting data mining methods and benchmark results to employ Mathematical Entity Linking (MathEL) and Unsupervised Formula Labeling (UFL) for semantic formula search and mathematical question answering (MathQA) on the arXiv preprint repository, Wikipedia, and Wikidata, which is part of the Wikimedia ecosystem of free knowledge. Based on different types of information needs, we evaluate our system in 15 information need modes, assessing over 7,000 query…
Taxonomy
Topics: Mathematics, Computing, and Information Processing · Topic Modeling · Advanced Text Analysis Techniques
