Disambiguating Symbolic Expressions in Informal Documents

Dennis Müller; Cezary Kaliszyk

arXiv:2101.11716·cs.LG·January 29, 2021

TL;DR

This paper introduces a new task of disambiguating symbolic expressions in informal LaTeX documents, presenting a dataset and a transformer-based approach that shows promising results despite limited data.

Contribution

The paper formulates the disambiguation of symbolic expressions as a neural translation task and provides a novel dataset along with a transformer-based methodology for this challenge.

Findings

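01. Baseline models overfit before producing even syntactically valid LaTeX.

02. A transformer language model pre-trained on sources from arxiv.org yields promising results despite the small size of the dataset.

03. The model is evaluated with several dedicated techniques that take both the syntax and the semantics of symbolic expressions into account.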

Abstract

We propose the task of disambiguating symbolic expressions in informal STEM documents in the form of LaTeX files - that is, determining their precise semantics and abstract syntax tree - as a neural machine translation task. We discuss the distinct challenges involved and present a dataset with roughly 33,000 entries. We evaluated several baseline models on this dataset, which failed to yield even syntactically valid LaTeX before overfitting. Consequently, we describe a methodology using a transformer language model pre-trained on sources obtained from arxiv.org, which yields promising results despite the small size of the dataset. We evaluate our model using a plurality of dedicated techniques, taking the syntax and semantics of symbolic expressions into account.
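
To make the translation framing concrete, the sketch below pairs an informal LaTeX expression with a disambiguated counterpart and applies a very coarse syntactic check to both. Everything in it is an assumption made for illustration: the semantic macro names (`\elementof`, `\plus`, `\naturalnumbers`), the `Entry` type, and the `braces_balanced` helper are hypothetical and are not taken from the paper's dataset or evaluation. Balanced braces are only a necessary condition for valid LaTeX, not a full validity test.

```python
from typing import NamedTuple


class Entry(NamedTuple):
    """One hypothetical translation pair (names invented for this sketch)."""
    source: str  # informal LaTeX as an author might write it
    target: str  # the same expression with explicit semantic macros


def braces_balanced(latex: str) -> bool:
    """Return True if every unescaped '{' has a matching '}'.

    This is a necessary, not sufficient, condition for syntactic validity.
    """
    depth = 0
    i = 0
    while i < len(latex):
        ch = latex[i]
        if ch == "\\":
            i += 2  # skip the backslash and the character it escapes
            continue
        if ch == "{":
            depth += 1
        elif ch == "}":
            depth -= 1
            if depth < 0:
                return False  # closing brace without an opener
        i += 1
    return depth == 0


# Hypothetical pair: the macros in `target` are illustrative placeholders.
example = Entry(
    source=r"$a + b \in \mathbb{N}$",
    target=r"$\elementof{\plus{a}{b}}{\naturalnumbers}$",
)

if __name__ == "__main__":
    print(braces_balanced(example.source), braces_balanced(example.target))
```

In this framing, a model output that fails even such a coarse check would count as syntactically invalid, which is the failure mode the abstract reports for the baseline models.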

Videos

Disambiguating Symbolic Expressions in Informal Documents · SlidesLive

Taxonomy

Topics: Natural Language Processing Techniques · Topic Modeling · Mathematics, Computing, and Information Processing