Prompt engineering for bibliographic web-scraping

Manuel Blázquez-Ochando; Juan José Prieto-Gutiérrez; María Antonia Ovalle-Perandones

arXiv:2603.19237 · cs.DL · March 23, 2026 · Scientometrics

Open Access

TL;DR

This paper demonstrates how prompt engineering with ChatGPT-4o can efficiently generate fully functional web-scrapers for bibliographic catalogues, minimizing interaction and improving data extraction quality.

Contribution

It introduces a method to use prompt engineering with large language models to automatically develop web-scrapers for bibliographic data extraction.

Findings

1. Effective model for AI-assisted web-scraper development
2. Improved scraping quality through context-aware prompts
3. Minimal interaction needed for functional scraper generation

Abstract

Bibliographic catalogues store millions of records. Computational techniques such as web-scraping allow these data to be extracted efficiently and accurately. The recent emergence of ChatGPT makes it easier to develop prompts that configure a scraper to identify and extract information from databases. The aim of this article is to define how to use prompt engineering efficiently to build a suitable data entry model that can generate, in a single interaction with ChatGPT-4o, a fully functional web-scraper, written in PHP and adapted to bibliographic catalogues. As a demonstration, the bibliographic catalogue of the National Library of Spain is used, with a dataset of thousands of records. The findings present an effective model for developing web-scraping programs, assisted with AI and with the minimum possible…

Taxonomy

Topics: Information Retrieval and Search Behavior · Web Data Mining and Analysis · Research Data Management Practices