Self-Supervised Pretraining of Graph Neural Network for the Retrieval of Related Mathematical Expressions in Scientific Articles
Lukas Pfahler, Katharina Morik

TL;DR
This paper introduces a self-supervised graph neural network approach to embed mathematical expressions from scientific articles, enabling efficient retrieval across disciplines by capturing their semantic structure.
Contribution
It presents a novel unsupervised learning method using graph convolutional networks to embed mathematical expressions for improved retrieval in scientific literature.
Findings
Embedding models outperform keyword-based search
Large dataset of 29 million expressions used for training
Empirical evaluation shows improved retrieval accuracy
Abstract
Given the growing number of publications, searching for relevant papers becomes tedious. In particular, search across disciplines or schools of thinking is not supported. This is mainly due to retrieval with keyword queries: technical terms differ between sciences and over time. Relevant articles might better be identified by their mathematical problem descriptions. Just looking at the equations in a paper already gives a hint as to whether the paper is relevant. Hence, we propose a new approach for retrieval of mathematical expressions based on machine learning. We design an unsupervised representation learning task that combines embedding learning with self-supervised learning. Using graph convolutional neural networks, we embed mathematical expressions into low-dimensional vector spaces that allow efficient nearest neighbor queries. To train our models, we collect a huge dataset…
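The nearest neighbor queries mentioned in the abstract can be illustrated with a minimal sketch. It assumes each expression has already been mapped to a fixed-length vector. The labels, the 4-dimensional vectors, and the helper names `cosine` and `top_k` are hypothetical placeholders, not the paper's model, data, or code, and cosine similarity with a brute-force scan is only one possible choice of similarity and search strategy.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def top_k(query, index, k=2):
    """Return the k labels in `index` whose vectors are most similar to `query`."""
    scored = [(cosine(query, vec), label) for label, vec in index.items()]
    scored.sort(reverse=True)  # highest similarity first
    return [label for _, label in scored[:k]]

# Hypothetical embeddings standing in for learned expression vectors.
index = {
    "expr_a": [1.0, 0.0, 0.0, 0.0],
    "expr_b": [0.9, 0.1, 0.0, 0.0],
    "expr_c": [0.0, 1.0, 0.0, 0.0],
    "expr_d": [0.0, 0.0, 1.0, 1.0],
}
query = [1.0, 0.0, 0.1, 0.0]

neighbors = top_k(query, index, k=2)
print(neighbors)
```

A brute-force scan like this is linear in the number of stored expressions. For a collection of millions of expressions, an approximate nearest neighbor index would usually replace the scan, while the ranking by vector similarity stays the same.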
Taxonomy
Topics: Topic Modeling · Mathematics, Computing, and Information Processing · Computational Physics and Python Applications
