DeepXiv-SDK: An Agentic Data Interface for Scientific Literature
Hongjin Qian, Ziyi Xia, Ze Liu, Jianlyu Chen, Kun Luo, Minghao Qin, Chaofan Li, Lei Xiong, Junwei Lan, Sen Wang, Zhengyang Liang, Yingxia Shao, Defu Lian, Zheng Liu

TL;DR
DeepXiv-SDK introduces a structured, multi-layered data interface for scientific literature, enhancing data accessibility, efficiency, and cost-effectiveness for AI agents working with unstructured research data.
Contribution
It presents a novel three-layer agentic data interface that transforms unstructured scientific data into structured formats and provides tools for efficient data retrieval and usage.
Findings
Supports the complete ArXiv corpus with daily updates
Provides RESTful APIs, Python SDK, and web demo for deep research workflows
Enhances data usability and reduces token consumption for AI agents
Abstract
LLM agents are increasingly used to accelerate the progress of scientific research. Yet a persistent bottleneck is data access: agents not only lack readily available tools for retrieval, but also have to work with unstructured, human-centric data on the Internet, such as HTML web pages and PDF files, leading to excessive token consumption, limited working efficiency, and brittle evidence look-up. This gap motivates the development of an agentic data interface, which is designed to enable agents to access and utilize scientific literature in a more effective, efficient, and cost-aware manner. In this paper, we introduce DeepXiv-SDK, which offers a three-layer agentic data interface for scientific literature. 1) Data Layer, which transforms unstructured, human-centric data into normalized and structured representations in JSON format, improving data usability and enabling…
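To make the Data Layer idea concrete, the following is a minimal sketch of why a structured JSON record can be cheaper for an agent to read than a whole document. Everything in it is an assumption made for illustration: the record layout, the field names (`id`, `title`, `abstract`, `sections`), and the helpers `preview`, `get_section`, and `approx_tokens` are hypothetical and are not taken from the DeepXiv-SDK schema or API. Token counts are approximated by whitespace-separated words.

```python
import json

# Hypothetical structured record for one paper. The field names are
# illustrative assumptions, not the actual DeepXiv-SDK schema.
RECORD = {
    "id": "0000.00000",
    "title": "An Example Paper",
    "abstract": "A short abstract.",
    "sections": [
        {"title": "Introduction", "text": "Background and motivation. " * 50},
        {"title": "Method", "text": "The method in detail. " * 50},
        {"title": "Results", "text": "Numbers and tables. " * 50},
    ],
}


def approx_tokens(text: str) -> int:
    """Crude token estimate: number of whitespace-separated words."""
    return len(text.split())


def preview(record: dict) -> dict:
    """Return lightweight metadata plus section titles, without section bodies."""
    return {
        "id": record["id"],
        "title": record["title"],
        "abstract": record["abstract"],
        "section_titles": [s["title"] for s in record["sections"]],
    }


def get_section(record: dict, title: str):
    """Return the text of the section with the given title (case-insensitive), or None."""
    for section in record["sections"]:
        if section["title"].lower() == title.lower():
            return section["text"]
    return None


# Compare the cost of reading everything with reading only what is needed.
full_cost = approx_tokens(json.dumps(RECORD))
preview_cost = approx_tokens(json.dumps(preview(RECORD)))
method_cost = approx_tokens(get_section(RECORD, "method"))
```

Under these assumptions, an agent could read the preview first, decide which section it needs, and then fetch only that section, so `preview_cost` and `method_cost` each stay well below `full_cost`.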
Taxonomy
Topics: Biomedical Text Mining and Ontologies · Scientific Computing and Data Management · Topic Modeling
