SSEmb: A Joint Structural and Semantic Embedding Framework for Mathematical Formula Retrieval
Ruyin Li, Xiaoyu Chen

TL;DR
SSEmb is a joint structural and semantic embedding framework for mathematical formula retrieval. It combines Graph Contrastive Learning over Operator Graphs with Sentence-BERT embeddings of the text surrounding each formula, and it outperforms existing embedding-based methods on the ARQMath-3 formula retrieval task.
Contribution
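The paper presents an embedding framework that captures both structural and semantic features of formulas. It introduces a substitution-based graph augmentation method that preserves the mathematical validity of formula graphs, and a weighted scheme that fuses structural and semantic similarities. Combined with Approach0, the framework achieves state-of-the-art results.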
Findings
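Outperforms existing embedding-based methods by over 5 percentage points on P'@10 and nDCG'@10 in the ARQMath-3 formula retrieval task.
Enhances the performance of other methods when combined with them.
Achieves state-of-the-art results when combined with Approach0.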
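The prime metrics named in the findings, P'@10 and nDCG'@10, are computed after unjudged candidates are removed from each ranking. The sketch below shows one way to compute them. The function names, the relevance threshold of 2, the linear gain, and the choice to divide precision by k are assumptions made for illustration and may differ from the official evaluation setup.

```python
import math


def drop_unjudged(ranking, judgments):
    """Keep only candidates that have a relevance judgment (the 'prime' step)."""
    return [doc for doc in ranking if doc in judgments]


def precision_prime_at_k(ranking, judgments, k=10, rel_threshold=2):
    """P'@k: fraction of the top-k judged candidates that count as relevant.

    rel_threshold is an assumed cutoff for binarizing graded relevance.
    """
    top = drop_unjudged(ranking, judgments)[:k]
    hits = sum(1 for doc in top if judgments[doc] >= rel_threshold)
    return hits / k


def ndcg_prime_at_k(ranking, judgments, k=10):
    """nDCG'@k with linear gain (assumed) and a log2 position discount."""
    top = drop_unjudged(ranking, judgments)[:k]
    dcg = sum(judgments[doc] / math.log2(i + 2) for i, doc in enumerate(top))
    ideal = sorted(judgments.values(), reverse=True)[:k]
    idcg = sum(gain / math.log2(i + 2) for i, gain in enumerate(ideal))
    return dcg / idcg if idcg > 0 else 0.0


# Hypothetical judgments: formula id -> graded relevance (0 to 3).
judgments = {"f1": 3, "f2": 0, "f3": 2}
# "f9" has no judgment, so it is ignored rather than counted as non-relevant.
ranking = ["f9", "f1", "f2", "f3"]

p_at_2 = precision_prime_at_k(ranking, judgments, k=2)
ndcg_at_2 = ndcg_prime_at_k(ranking, judgments, k=2)
```

Dropping unjudged candidates first means a system is not penalized for retrieving formulas that assessors never looked at, which matters when judgment pools are small.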
Abstract
Formula retrieval is an important topic in Mathematical Information Retrieval. We propose SSEmb, a novel embedding framework capable of capturing both structural and semantic features of mathematical formulas. Structurally, we employ Graph Contrastive Learning to encode formulas represented as Operator Graphs. To enhance structural diversity while preserving mathematical validity of these formula graphs, we introduce a novel graph data augmentation approach through a substitution strategy. Semantically, we utilize Sentence-BERT to encode the surrounding text of formulas. Finally, for each query and its candidates, structural and semantic similarities are calculated separately and then fused through a weighted scheme. In the ARQMath-3 formula retrieval task, SSEmb outperforms existing embedding-based methods by over 5 percentage points on P'@10 and nDCG'@10. Furthermore, SSEmb enhances…
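The fusion step described in the abstract, where structural and semantic similarities are computed separately and then combined by weight, can be sketched as follows. This is a minimal illustration and not the authors' implementation: the use of cosine similarity, the convex weight `alpha`, and all function and field names are assumptions. The embeddings are assumed to have been produced already by a graph encoder and a sentence encoder.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def fused_score(query, candidate, alpha=0.5):
    """Weighted sum of structural and semantic similarity.

    alpha is a hypothetical weight: 1.0 uses structure only, 0.0 uses text only.
    """
    structural = cosine(query["struct"], candidate["struct"])
    semantic = cosine(query["sem"], candidate["sem"])
    return alpha * structural + (1 - alpha) * semantic


def rank_candidates(query, candidates, alpha=0.5):
    """Return (candidate_id, score) pairs sorted by fused score, best first."""
    scored = [(cid, fused_score(query, cand, alpha)) for cid, cand in candidates.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Toy embeddings: "struct" stands in for the formula-graph embedding and
# "sem" for the embedding of the surrounding text.
query = {"struct": [1.0, 0.0], "sem": [0.0, 1.0]}
candidates = {
    "a": {"struct": [1.0, 0.0], "sem": [1.0, 0.0]},  # same structure, unrelated text
    "b": {"struct": [0.0, 1.0], "sem": [0.0, 1.0]},  # different structure, same text
}

structure_heavy = rank_candidates(query, candidates, alpha=0.7)
text_heavy = rank_candidates(query, candidates, alpha=0.2)
```

In this toy case the ranking flips as `alpha` changes, which is why a weight of this kind would normally be tuned on held-out queries.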
