Effects of context, complexity, and clustering on evaluation for math formula retrieval

Behrooz Mansouri, Douglas W. Oard, Anurag Agarwal, and Richard Zanibbi

arXiv:2111.10504 · cs.IR · November 23, 2021

TL;DR

This paper investigates how context, complexity, and clustering influence the evaluation of mathematical formula retrieval systems, showing that the choice of relevance definition and the clustering of identical formulas before judging both affect how systems are assessed.

Contribution

The paper compares six formula retrieval test collections, which differ in query format, query complexity, number of queries, content source, and relevance definition, and examines how relevance criteria, formula complexity, and clustering affect evaluation outcomes.

Findings

1. Defining relevance based on query and/or document context can be consequential for evaluation.

2. System results vary markedly with formula complexity.

3. Judging relevance after clustering formulas with identical Symbol Layout Trees can change the system preference ordering.

Abstract

There are now several test collections for the formula retrieval task, in which a system's goal is to identify useful mathematical formulae to show in response to a query posed as a formula. These test collections differ in query format, query complexity, number of queries, content source, and relevance definition. Comparisons among six formula retrieval test collections illustrate that defining relevance based on query and/or document context can be consequential, that system results vary markedly with formula complexity, and that judging relevance after clustering formulas with identical symbol layouts (i.e., Symbol Layout Trees) can affect system preference ordering.
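The clustering effect described in the abstract can be illustrated with a small sketch. It is a toy illustration, not the authors' evaluation procedure. The formula IDs, the layout strings standing in for Symbol Layout Trees, the relevance labels, and the two system rankings are all invented for the example, and precision at k is used only as a convenient measure.

```python
from typing import Dict, List, Set

def collapse_to_clusters(ranking: List[str], cluster_of: Dict[str, str]) -> List[str]:
    """Replace each formula instance by its cluster key, keeping only the
    first (highest-ranked) occurrence of each cluster."""
    seen: Set[str] = set()
    collapsed: List[str] = []
    for formula_id in ranking:
        key = cluster_of[formula_id]
        if key not in seen:
            seen.add(key)
            collapsed.append(key)
    return collapsed

def precision_at_k(ranked: List[str], relevant: Set[str], k: int) -> float:
    """Fraction of the top-k ranked items that are relevant."""
    return sum(1 for item in ranked[:k] if item in relevant) / k

# Hypothetical data: the strings are stand-ins for identical symbol layouts.
CLUSTER_OF = {
    "f1": "x^2", "f2": "x^2", "f3": "x^2",  # three instances of one layout
    "f4": "a+b",
    "f5": "y_i",
    "f6": "c",
}
RELEVANT_CLUSTERS = {"x^2", "y_i"}
# Toy assumption: an instance is relevant exactly when its cluster is.
RELEVANT_INSTANCES = {f for f, c in CLUSTER_OF.items() if c in RELEVANT_CLUSTERS}

SYSTEM_A = ["f1", "f2", "f3", "f4", "f6", "f5"]  # repeats one layout at the top
SYSTEM_B = ["f1", "f5", "f4", "f2", "f3", "f6"]  # more varied layouts at the top

K = 3
a_instance = precision_at_k(SYSTEM_A, RELEVANT_INSTANCES, K)
b_instance = precision_at_k(SYSTEM_B, RELEVANT_INSTANCES, K)
a_cluster = precision_at_k(collapse_to_clusters(SYSTEM_A, CLUSTER_OF), RELEVANT_CLUSTERS, K)
b_cluster = precision_at_k(collapse_to_clusters(SYSTEM_B, CLUSTER_OF), RELEVANT_CLUSTERS, K)

print(f"instance level: A={a_instance:.2f} B={b_instance:.2f}")
print(f"cluster level:  A={a_cluster:.2f} B={b_cluster:.2f}")
```

On this toy data, System A is preferred when every formula instance is scored separately, while System B is preferred once identical layouts are collapsed into a single cluster, which is the kind of preference reversal the abstract refers to.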

Taxonomy

Topics: Mathematics, Computing, and Information Processing · Advanced Database Systems and Queries · Algorithms and Data Compression