Preliminary Exploration of Formula Embedding for Mathematical Information Retrieval: can mathematical formulae be embedded like a natural language?
Liangcai Gao, Zhuoren Jiang, Yue Yin, Ke Yuan, Zuoyu Yan, Zhi Tang

TL;DR
This paper investigates whether mathematical formulae can be embedded like natural language using neural representation techniques, aiming to improve mathematical information retrieval.
Contribution
It introduces a novel 'symbol2vec' method for formula symbol embedding and a 'formula2vec' approach for MIR, exploring their potential in mathematical language processing.
Findings
Preliminary experiments suggest that formula embedding models are promising for mathematical language representation and MIR tasks.
A brief analysis contrasts the characteristics of natural language and mathematical language.
Abstract
While neural network approaches are achieving breakthrough performance in natural-language-related fields, there have been few similar attempts at mathematical-language-related tasks. In this study, we explore the potential of applying neural representation techniques to Mathematical Information Retrieval (MIR) tasks. In more detail, we first briefly analyze the characteristic differences between natural language and mathematical language. Then we design a "symbol2vec" method to learn the vector representations of formula symbols (numbers, variables, operators, functions, etc.). Finally, we propose a "formula2vec"-based MIR approach and evaluate its performance. Preliminary experimental results show that there is a promising potential for applying formula embedding models to mathematical language representation and MIR tasks.
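To make the pipeline in the abstract concrete, here is a minimal sketch of the general idea: represent each formula symbol as a vector, combine symbol vectors into a formula vector, and rank formulae by cosine similarity. It is not the authors' symbol2vec or formula2vec. The abstract does not describe their models, so this sketch swaps in plain co-occurrence counts for learned neural embeddings. The tokenizer, the window size, and every function name below are assumptions made for illustration.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(formula):
    """Split a LaTeX-like formula into symbols (hypothetical tokenizer):
    commands such as \\sin, numbers, single letters, and other single characters."""
    return re.findall(r"\\[A-Za-z]+|\d+|[A-Za-z]|[^\sA-Za-z\d]", formula)


def symbol_vectors(formulas, window=2):
    """Count-based stand-in for learned symbol embeddings: each symbol is
    described by how often other symbols appear within `window` positions."""
    vectors = defaultdict(Counter)
    for formula in formulas:
        tokens = tokenize(formula)
        for i, tok in enumerate(tokens):
            lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    vectors[tok][tokens[j]] += 1
    return vectors


def formula_vector(formula, vectors):
    """Sum the vectors of a formula's symbols; unseen symbols contribute nothing."""
    total = Counter()
    for tok in tokenize(formula):
        total.update(vectors.get(tok, {}))
    return total


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(v * b.get(k, 0) for k, v in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def search(query, corpus, vectors):
    """Rank corpus formulae by similarity to the query, best match first."""
    q = formula_vector(query, vectors)
    scored = [(cosine(q, formula_vector(f, vectors)), f) for f in corpus]
    return sorted(scored, reverse=True)


# Toy corpus, invented for this example only.
corpus = [
    r"x^2 + y^2 = z^2",
    r"a^2 + b^2 = c^2",
    r"\sin(x) + \cos(x)",
    r"\frac{1}{2}",
]
vectors = symbol_vectors(corpus)
for score, formula in search(r"x^2 + y^2 = z^2", corpus, vectors):
    print(f"{score:.3f}  {formula}")
```

Summing symbol vectors ignores formula structure (operator precedence, nesting), which is one of the ways mathematical language differs from natural language; a learned model would need to address that separately.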
Taxonomy
Topics: Mathematics, Computing, and Information Processing · Natural Language Processing Techniques · Topic Modeling
