Enhancing Complex Formula Recognition with Hierarchical Detail-Focused Network
Jiale Wang, Junhui Yu, Huanyong Liu, Chenanran Kong

TL;DR
This paper introduces a new large-scale dataset, HDR-100M, and a hierarchical network, HDNet, to improve recognition of complex, hierarchical mathematical formulas, advancing the accuracy and robustness of mathematical expression recognition systems.
Contribution
The paper presents HDR, the first large-scale dataset for hierarchical formula recognition, and HDNet, a novel hierarchical network that enhances detail-focused parsing of complex formulas.
Findings
HDNet outperforms existing models on multiple datasets.
HDR-100M provides extensive training data for complex formula recognition.
A hierarchical, detail-focused approach improves parsing accuracy.
Abstract
Hierarchical and complex Mathematical Expression Recognition (MER) is challenging because a single formula can have multiple valid interpretations, which complicates both parsing and evaluation. In this paper, we introduce the Hierarchical Detail-Focused Recognition dataset (HDR), the first dataset specifically designed to address these issues. It consists of a large-scale training set, HDR-100M, offering unprecedented scale and diversity with one hundred million training instances. The test set, HDR-Test, includes multiple interpretations of complex hierarchical formulas for comprehensive model performance evaluation. Additionally, the parsing of complex formulas often suffers from errors in fine-grained details. To address this, we propose the Hierarchical Detail-Focused Recognition Network (HDNet), an innovative framework that incorporates a hierarchical sub-formula module, focusing on the…
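To illustrate why multiple interpretations complicate evaluation, here is a minimal sketch of scoring a prediction against several accepted LaTeX strings for the same formula. It is not the HDR-Test metric, which this summary does not specify; the function names (`normalize_latex`, `matches_any`, `multi_reference_accuracy`) and the whitespace-only normalization are assumptions made purely for illustration.

```python
def normalize_latex(s: str) -> str:
    """Collapse runs of whitespace so spacing differences are not counted as errors."""
    return " ".join(s.split())


def matches_any(prediction: str, references: list[str]) -> bool:
    """Return True if the prediction equals at least one accepted interpretation."""
    pred = normalize_latex(prediction)
    return any(pred == normalize_latex(ref) for ref in references)


def multi_reference_accuracy(predictions: list[str],
                             reference_sets: list[list[str]]) -> float:
    """Fraction of predictions that match any reference in their own reference set."""
    hits = sum(matches_any(p, refs) for p, refs in zip(predictions, reference_sets))
    return hits / len(predictions)


# Hypothetical example: each formula has one or more accepted LaTeX spellings.
predictions = ["x^2", r"\frac{a}{b}", "a+b"]
reference_sets = [
    ["x^{2}", "x^2"],                   # braces optional around a single token
    [r"\frac{a}{b}", r"{a \over b}"],   # two ways to write the same fraction
    ["a-b"],                            # prediction is wrong here
]
accuracy = multi_reference_accuracy(predictions, reference_sets)
```

With a single reference per formula, the first prediction would be marked wrong whenever the annotator happened to write `x^{2}`. Accepting a set of references avoids penalizing such equivalent outputs.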
Taxonomy
Topics: Mathematics, Computing, and Information Processing · Advanced Text Analysis Techniques
