ArGoT: A Glossary of Terms extracted from the arXiv
Luis Berlioz (University of Pittsburgh)

TL;DR
ArGoT is a comprehensive dataset of mathematical terms from arXiv, organized into a dependency graph, enabling better understanding of mathematical concepts through embeddings and supporting formal verification efforts.
Contribution
The paper introduces ArGoT, a novel dataset of mathematical terms with dependency structures, linking natural language to formal theorem libraries.
Findings
Hyperbolic and standard embeddings reflect term relations
Dependency graph captures concept entailment
Dataset supports alignment with theorem prover libraries
Abstract
We introduce ArGoT, a data set of mathematical terms extracted from the articles hosted on the arXiv website. A term is any mathematical concept defined in an article. Using labels in the article's source code and examples from other popular math websites, we mine all the terms in the arXiv data and compile a comprehensive vocabulary of mathematical terms. Each term can be then organized in a dependency graph by using the term's definitions and the arXiv's metadata. Using both hyperbolic and standard word embeddings, we demonstrate how this structure is reflected in the text's vector representation and how they capture relations of entailment in mathematical concepts. This data set is part of an ongoing effort to align natural mathematical text with existing Interactive Theorem Prover Libraries (ITPs) of formally verified statements.
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
Taxonomy
TopicsMathematics, Computing, and Information Processing · Computational Physics and Python Applications
