TL;DR
ChatPD leverages large language models to automate extraction and organization of dataset information from academic papers, significantly improving efficiency and accuracy in building paper-dataset networks for research validation and discovery.
Contribution
The paper introduces ChatPD, a novel LLM-based system that automates dataset information extraction and entity resolution, outperforming existing platforms in accuracy and providing a continuous dataset discovery service.
Findings
Achieves about 90% precision and recall in entity resolution
Outperforms PapersWithCode in dataset usage extraction
Provides a continuous dataset discovery service
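To make the precision and recall figures concrete, here is a minimal sketch of how these two metrics are commonly computed for entity resolution, treating each (description, entity) link as a set element. The function name `precision_recall` and the example links are hypothetical and are not taken from the paper or its evaluation data.

```python
def precision_recall(predicted, gold):
    """Return (precision, recall) for two collections of (description, entity) links."""
    predicted, gold = set(predicted), set(gold)
    true_positives = len(predicted & gold)
    # Precision: share of predicted links that are correct.
    precision = true_positives / len(predicted) if predicted else 0.0
    # Recall: share of correct links that were predicted.
    recall = true_positives / len(gold) if gold else 0.0
    return precision, recall


# Hypothetical links, for illustration only.
gold = {
    ("ImageNet-1k", "ImageNet"),
    ("MS COCO 2017", "COCO"),
    ("CIFAR10", "CIFAR-10"),
}
predicted = {
    ("ImageNet-1k", "ImageNet"),
    ("MS COCO 2017", "COCO"),
    ("CIFAR10", "CIFAR-100"),  # wrong entity: hurts both precision and recall
}

p, r = precision_recall(predicted, gold)
print(f"precision={p:.2f} recall={r:.2f}")
```

With one wrong link out of three, both metrics come out at two thirds, which shows why a single mis-resolved description affects both numbers at once.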
Abstract
Scientific research heavily depends on suitable datasets for method validation, but existing academic platforms with dataset management like PapersWithCode suffer from inefficiencies in their manual workflow. To overcome this bottleneck, we present a system, called ChatPD, that utilizes Large Language Models (LLMs) to automate dataset information extraction from academic papers and construct a structured paper-dataset network. Our system consists of three key modules: paper collection, dataset information extraction, and dataset entity resolution to construct paper-dataset networks. Specifically, we propose a Graph Completion and Inference strategy to map dataset descriptions to their corresponding entities. Through extensive experiments, we demonstrate that ChatPD not only outperforms the existing platform PapersWithCode in dataset usage extraction…
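The shape of a paper-dataset network can be sketched in a few lines. This is an illustration under stated assumptions, not ChatPD's implementation: the names `DatasetMention`, `normalize`, `resolve`, and `build_network` are hypothetical, the mentions are hand-written stand-ins for what an extraction module might return, and the resolution step is a naive normalized-name lookup swapped in for the paper's Graph Completion and Inference strategy, which the abstract does not detail.

```python
import re
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class DatasetMention:
    paper_id: str
    description: str  # free-text dataset name, as an extraction step might emit it


def normalize(name: str) -> str:
    """Lowercase and drop non-alphanumerics so 'CIFAR-10' and 'cifar 10' compare equal."""
    return re.sub(r"[^a-z0-9]", "", name.lower())


def resolve(mention, known_entities):
    """Naive stand-in for entity resolution: exact match on normalized names."""
    return known_entities.get(normalize(mention.description))


def build_network(mentions, entity_names):
    """Return (paper -> resolved dataset entities, list of unresolved mentions)."""
    known = {normalize(name): name for name in entity_names}
    network = defaultdict(set)
    unresolved = []
    for mention in mentions:
        entity = resolve(mention, known)
        if entity is None:
            unresolved.append(mention)  # candidate for a new dataset entity
        else:
            network[mention.paper_id].add(entity)
    return dict(network), unresolved


# Hypothetical extraction output, for illustration only.
mentions = [
    DatasetMention("paper-A", "CIFAR 10"),
    DatasetMention("paper-A", "ImageNet"),
    DatasetMention("paper-B", "cifar-10"),
    DatasetMention("paper-B", "an in-house click log"),
]
network, unresolved = build_network(mentions, ["CIFAR-10", "ImageNet"])
print(network)
print(unresolved)
```

Exact matching on normalized names only handles spelling variants. Descriptions that refer to a dataset indirectly are the case a strategy such as the paper's is meant to address, and in this sketch they simply land in `unresolved`.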
