appjsonify: An Academic Paper PDF-to-JSON Conversion Toolkit
Atsuki Yamaguchi, Terufumi Morishita

TL;DR
appjsonify is a flexible Python toolkit that converts academic paper PDFs into structured JSON using visual layout analysis and customizable processing pipelines.
Contribution
The paper introduces a configurable, open-source PDF-to-JSON conversion toolkit designed specifically for academic papers, combining visual layout analysis with rule-based text processing.
Findings
Supports various paper formats with customizable pipelines
Open-source and easy to install via PyPI and GitHub
Utilizes visual layout analysis for accurate parsing
Abstract
We present appjsonify, a Python-based PDF-to-JSON conversion toolkit for academic papers. It parses a PDF file using several visual-based document layout analysis models and rule-based text processing approaches. appjsonify is a flexible tool that allows users to easily configure the processing pipeline to handle a specific format of a paper they wish to process. We are publicly releasing appjsonify as an easy-to-install toolkit available via PyPI and GitHub.
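The idea of a user-configurable processing pipeline can be illustrated with a minimal sketch. This is not appjsonify's actual API: every name below (`STEP_REGISTRY`, `run_pipeline`, the individual step functions) is hypothetical, and the sketch covers only simple rule-based text steps, not the layout analysis models the toolkit uses. It assumes a page has already been reduced to a list of text lines.

```python
import json
from typing import Any, Callable, Dict, List

# Hypothetical document representation: a dict that each step reads and returns updated.
Doc = Dict[str, Any]
Step = Callable[[Doc], Doc]


def drop_page_numbers(doc: Doc) -> Doc:
    """Rule-based step: discard lines that consist only of digits."""
    kept = [line for line in doc["lines"] if not line.strip().isdigit()]
    return {**doc, "lines": kept}


def merge_hyphenated_lines(doc: Doc) -> Doc:
    """Rule-based step: rejoin words split by a hyphen at the end of a line."""
    merged: List[str] = []
    for line in doc["lines"]:
        if merged and merged[-1].endswith("-"):
            merged[-1] = merged[-1][:-1] + line.lstrip()
        else:
            merged.append(line)
    return {**doc, "lines": merged}


def build_paragraph(doc: Doc) -> Doc:
    """Final step: collapse the remaining lines into one paragraph string."""
    paragraph = " ".join(line.strip() for line in doc["lines"])
    return {**doc, "paragraph": paragraph}


# Hypothetical registry: a pipeline is configured as an ordered list of step names.
STEP_REGISTRY: Dict[str, Step] = {
    "drop_page_numbers": drop_page_numbers,
    "merge_hyphenated_lines": merge_hyphenated_lines,
    "build_paragraph": build_paragraph,
}


def run_pipeline(doc: Doc, step_names: List[str]) -> Doc:
    """Apply the named steps in order."""
    for name in step_names:
        doc = STEP_REGISTRY[name](doc)
    return doc


# Usage: a different paper format would simply use a different list of step names.
raw = {"lines": ["We present a tool-", "kit for papers.", "12"]}
config = ["drop_page_numbers", "merge_hyphenated_lines", "build_paragraph"]
result = run_pipeline(raw, config)
output_json = json.dumps({"paragraph": result["paragraph"]})
```

Keeping each step as a plain function over a shared document structure means the pipeline for a given paper format is just a list of names, which is one simple way to get the kind of per-format configurability the abstract describes.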
Taxonomy
Topics: Mathematics, Computing, and Information Processing · Data Visualization and Analytics · Scientific Computing and Data Management
