A Multi-Modal Neural Geometric Solver with Textual Clauses Parsed from Diagram
Ming-Liang Zhang, Fei Yin, Cheng-Lin Liu

TL;DR
This paper introduces PGPSNet, a neural solver that converts diagrams into textual clauses for improved multi-modal geometric problem solving, supported by a new large-scale dataset PGPS9K.
Contribution
The paper presents a novel multi-modal neural solver with diagram-to-text conversion and a new annotated dataset for geometry problem solving.
Findings
PGPSNet outperforms existing neural solvers on the PGPS9K and Geometry3K datasets.
Conversion of diagrams into textual clauses enhances geometric reasoning.
Structural and semantic pre-training improve model understanding and accuracy.
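As a rough illustration of what "converting a diagram into textual clauses" could look like, the sketch below turns a toy diagram description into short text clauses. The `Diagram` class, its fields, and the clause wording are all hypothetical, chosen only for this example; they are not the PGPS9K annotation schema or the authors' parser.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class Diagram:
    """Hypothetical toy diagram: NOT the PGPS9K annotation format."""
    # Each entry lists collinear points in order along one line.
    lines: List[Tuple[str, ...]] = field(default_factory=list)
    # Maps a segment (pair of endpoints) to the label drawn next to it.
    lengths: Dict[Tuple[str, str], str] = field(default_factory=dict)


def to_clauses(diagram: Diagram) -> List[str]:
    """Render the diagram as textual clauses (assumed wording)."""
    clauses = []
    # Structural clauses: which points lie on which line.
    for points in diagram.lines:
        clauses.append("line " + " ".join(points))
    # Semantic clauses: labels attached to geometric elements.
    for (a, b), value in diagram.lengths.items():
        clauses.append(f"{a}{b} = {value}")
    return clauses


diagram = Diagram(
    lines=[("A", "B", "C"), ("B", "D")],
    lengths={("A", "B"): "5", ("B", "C"): "x"},
)
clauses = to_clauses(diagram)
# A single token sequence that a text encoder could consume.
clause_text = " , ".join(clauses)
print(clause_text)
```

Once the diagram is expressed this way, it can be concatenated with the problem text and handled by the same sequence model, which is the intuition behind the finding that textual clauses help multi-modal reasoning.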
Abstract
Geometry problem solving (GPS) is a high-level mathematical reasoning task that requires capabilities in multi-modal fusion and geometric knowledge application. Recently, neural solvers have shown great potential in GPS but still fall short in diagram presentation and modal fusion. In this work, we convert diagrams into basic textual clauses to describe diagram features effectively, and propose a new neural solver called PGPSNet to fuse multi-modal information efficiently. By combining structural and semantic pre-training, data augmentation, and self-limited decoding, PGPSNet is endowed with rich knowledge of geometry theorems and geometric representation, which promotes geometric understanding and reasoning. In addition, to facilitate research on GPS, we build a new large-scale and finely annotated GPS dataset named PGPS9K, labeled with both fine-grained diagram annotation and…
Taxonomy
Topics: Handwritten Text Recognition Techniques · Constraint Satisfaction and Optimization
