ChatPD: An LLM-driven Paper-Dataset Networking System

Anjie Xu; Ruiqing Ding; Leye Wang

arXiv:2505.22349·cs.DB·May 29, 2025

TL;DR

ChatPD leverages large language models to automate extraction and organization of dataset information from academic papers, significantly improving efficiency and accuracy in building paper-dataset networks for research validation and discovery.

Contribution

The paper introduces ChatPD, a novel LLM-based system that automates dataset information extraction and entity resolution, outperforming existing platforms in accuracy and providing a continuous dataset discovery service.

Findings

01 Achieves about 90% precision and recall in entity resolution

02 Outperforms PapersWithCode in dataset usage extraction

03 Provides a continuous dataset discovery service
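
For readers unfamiliar with how precision and recall apply to entity resolution, the following is a minimal sketch, assuming each resolved link is a (dataset description, dataset entity) pair compared against a hand-labeled gold set. The function name and the toy data are hypothetical and are not taken from the paper.

```python
def precision_recall(predicted, gold):
    """Compute precision and recall over sets of (description, entity) links."""
    predicted, gold = set(predicted), set(gold)
    true_positives = len(predicted & gold)
    # Precision: share of predicted links that are correct.
    precision = true_positives / len(predicted) if predicted else 0.0
    # Recall: share of gold links that were recovered.
    recall = true_positives / len(gold) if gold else 0.0
    return precision, recall


# Hypothetical toy data for illustration only.
gold_links = {
    ("the MNIST digits", "MNIST"),
    ("CIFAR-10 images", "CIFAR-10"),
    ("ImageNet-1k", "ImageNet"),
}
predicted_links = {
    ("the MNIST digits", "MNIST"),
    ("CIFAR-10 images", "CIFAR-10"),
    ("ImageNet-1k", "CIFAR-10"),  # a wrong resolution
}
precision, recall = precision_recall(predicted_links, gold_links)
```

In this toy case two of three predicted links are correct and two of three gold links are recovered, so both metrics come out to two thirds.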

Abstract

Scientific research depends heavily on suitable datasets for method validation, but existing academic platforms with dataset management, such as PapersWithCode, suffer from inefficiencies in their manual workflow. To overcome this bottleneck, we present a system called ChatPD that uses Large Language Models (LLMs) to automate dataset information extraction from academic papers and construct a structured paper-dataset network. Our system consists of three key modules: paper collection, dataset information extraction, and dataset entity resolution. Specifically, we propose a Graph Completion and Inference strategy to map dataset descriptions to their corresponding entities. Through extensive experiments, we demonstrate that ChatPD not only outperforms the existing platform PapersWithCode in dataset usage extraction…
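
The idea of a paper-dataset network can be illustrated with a small sketch. It assumes the extraction step has already produced (paper, dataset description) pairs, and it resolves descriptions with a simple name-normalization lookup. That lookup is a stand-in chosen for brevity, not the Graph Completion and Inference strategy the abstract names, and every identifier and data value here is hypothetical.

```python
from collections import defaultdict

# Hypothetical table of known dataset entities, keyed by normalized name.
KNOWN_ENTITIES = {"mnist": "MNIST", "cifar10": "CIFAR-10"}


def normalize(name):
    """Lowercase a name and drop every non-alphanumeric character."""
    return "".join(ch for ch in name.lower() if ch.isalnum())


def resolve(description):
    """Map a dataset description to a known entity, or None if unknown."""
    return KNOWN_ENTITIES.get(normalize(description))


def build_network(extractions):
    """Build a dataset -> papers mapping from (paper_id, description) pairs."""
    network = defaultdict(set)
    unresolved = []
    for paper_id, description in extractions:
        entity = resolve(description)
        if entity is None:
            # Keep unknown descriptions for later review or discovery.
            unresolved.append((paper_id, description))
        else:
            network[entity].add(paper_id)
    return dict(network), unresolved


# Hypothetical extraction output for illustration only.
extractions = [
    ("paper-A", "MNIST"),
    ("paper-B", "CIFAR-10"),
    ("paper-B", "mnist"),
    ("paper-C", "A new corpus"),
]
network, unresolved = build_network(extractions)
```

Here `network` links MNIST to paper-A and paper-B and CIFAR-10 to paper-B, while the unmatched description from paper-C is kept aside in `unresolved`.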

Code & Models

Repositories

chatPD-web/ChatPD
