BERT is not The Count: Learning to Match Mathematical Statements with   Proofs

Weixian Waylon Li; Yftah Ziser; Maximin Coavoux; Shay B. Cohen

arXiv:2302.09350·cs.CL·February 21, 2023

BERT is not The Count: Learning to Match Mathematical Statements with Proofs

Weixian Waylon Li, Yftah Ziser, Maximin Coavoux, Shay B. Cohen

PDF

Open Access 1 Repo

TL;DR

This paper introduces a new dataset and methods for matching mathematical statements with their proofs, revealing that pre-trained language models perform well but rely on shallow symbolic cues.

Contribution

The paper presents the MATcH dataset, a bilinear similarity model, and analysis of language models' capabilities in mathematical statement-proof matching.

Findings

01

Mean reciprocal rank of 73.7 achieved by models

02

Models rely on shallow symbolic analysis

03

Proposed methods improve matching accuracy

Abstract

We introduce a task consisting in matching a proof to a given mathematical statement. The task fits well within current research on Mathematical Information Retrieval and, more generally, mathematical article analysis (Mathematical Sciences, 2014). We present a dataset for the task (the MATcH dataset) consisting of over 180k statement-proof pairs extracted from modern mathematical research articles. We find this dataset highly representative of our task, as it consists of relatively new findings useful to mathematicians. We propose a bilinear similarity model and two decoding methods to match statements to proofs effectively. While the first decoding method matches a proof to a statement without being aware of other statements or proofs, the second method treats the task as a global matching problem. Through a symbol replacement procedure, we analyze the "insights" that pre-trained…

Peer Reviews

No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.

Code & Models

Repositories

waylonli/match
pytorchOfficial

Videos

No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.

Taxonomy

TopicsNatural Language Processing Techniques · Mathematics, Computing, and Information Processing · Topic Modeling

MethodsAttentive Walk-Aggregating Graph Neural Network