PGDP5K: A Diagram Parsing Dataset for Plane Geometry Problems
Yihan Hao (1, 2), Mingliang Zhang (2, 3), Fei Yin (2, 3), and Linlin Huang (1) ((1) Beijing Jiaotong University, (2) Institute of Automation of Chinese Academy of Science, (3) University of Chinese Academy of Sciences)

TL;DR
The paper introduces PGDP5K, a large-scale, finely annotated diagram dataset for plane geometry problems, aiming to advance diagram parsing and geometric reasoning research.
Contribution
It provides a new extensive dataset with detailed primitive-level annotations and a novel annotation method for plane geometry diagram parsing.
Findings
State-of-the-art methods achieve only 66.07% F1 on PGDP5K.
PGDP5K presents a challenging benchmark for future research.
The dataset enables automatic generation of geometric propositions.
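For readers unfamiliar with the metric behind the 66.07% figure, the following is a minimal sketch of how an F1 score can be computed over sets of geometric propositions. It assumes exact string matching between predicted and ground-truth propositions; the paper's actual evaluation protocol is not described in this summary and may differ. The function name `f1_score` and the proposition strings are hypothetical, chosen only for illustration.

```python
def f1_score(predicted, ground_truth):
    """Return the F1 score between two collections of propositions.

    Assumption: a prediction counts as correct only if it matches a
    ground-truth proposition exactly (string equality).
    """
    predicted, ground_truth = set(predicted), set(ground_truth)
    true_positives = len(predicted & ground_truth)
    if true_positives == 0:
        # Covers empty inputs and the no-overlap case without dividing by zero.
        return 0.0
    precision = true_positives / len(predicted)
    recall = true_positives / len(ground_truth)
    # F1 is the harmonic mean of precision and recall.
    return 2 * precision * recall / (precision + recall)


# Hypothetical propositions for a single diagram (not taken from PGDP5K).
example_predicted = {
    "PointLiesOnLine(A, Line(B, C))",
    "Perpendicular(Line(A, B), Line(C, D))",
    "Equals(LengthOf(Line(A, B)), 5)",
}
example_ground_truth = {
    "PointLiesOnLine(A, Line(B, C))",
    "Perpendicular(Line(A, B), Line(C, D))",
    "Equals(LengthOf(Line(A, B)), 6)",
    "Parallel(Line(A, C), Line(B, D))",
}

example_f1 = f1_score(example_predicted, example_ground_truth)
```

Here two of the three predictions are correct and two of the four ground-truth propositions are recovered, so precision is 2/3, recall is 1/2, and the F1 score is 4/7.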
Abstract
Diagram parsing is an important foundation for geometry problem solving, attracting increasing attention in the field of intelligent education and document image understanding. Due to the complex layout and between-primitive relationship, plane geometry diagram parsing (PGDP) is still a challenging task deserving further research and exploration. An appropriate dataset is critical for the research of PGDP. Although some datasets with rough annotations have been proposed to solve geometric problems, they are either small in scale or not publicly available. The rough annotations also make them not very useful. Thus, we propose a new large-scale geometry diagram dataset named PGDP5K and a novel annotation method. Our dataset consists of 5000 diagram samples composed of 16 shapes, covering 5 positional relations, 22 symbol types and 6 text types. Different from previous datasets, our PGDP5K…
Taxonomy
Topics: Handwritten Text Recognition Techniques · Mathematics, Computing, and Information Processing · Constraint Satisfaction and Optimization
