mod_oai: An Apache Module for Metadata Harvesting
Michael L. Nelson, Herbert Van de Sompel, Xiaoming Liu, Terry L. Harrison, Nathan McFarland

TL;DR
mod_oai is an Apache module that integrates OAI-PMH protocol support directly into the server, enabling efficient metadata harvesting for digital libraries and web content.
Contribution
It introduces a novel Apache module that embeds OAI-PMH harvesting capabilities, simplifying metadata exchange and improving efficiency over traditional separate or built-in implementations.
Findings
Demonstrated successful harvesting of departmental web content using mod_oai.
Compared web crawling and OAI-PMH harvesting techniques, showing advantages of the integrated approach.
Validated the effectiveness of mod_oai in real-world web server environments.
Abstract
We describe mod_oai, an Apache 2.0 module that implements the Open Archives Initiative Protocol for Metadata Harvesting (OAI-PMH). OAI-PMH is the de facto standard for metadata exchange in digital libraries and allows repositories to expose their contents in a structured, application-neutral format with semantics optimized for accurate incremental harvesting. Current implementations of OAI-PMH are either separate applications that access an existing repository, or are built into repository software packages. mod_oai is different in that it optimizes the harvesting of web content by building OAI-PMH capability into the Apache server. We discuss the implications of adding harvesting capability to an Apache server and describe our initial experimental results accessing a departmental web site using both web crawling and OAI-PMH harvesting techniques.
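To make the "incremental harvesting" semantics concrete, the following is a minimal sketch of how a harvester might form OAI-PMH requests. The verb `ListRecords` and the arguments `metadataPrefix`, `from`, and `until` come from the OAI-PMH specification; the base URL, the function name, and the example date are assumptions chosen for illustration and do not describe how mod_oai itself is configured.

```python
from urllib.parse import urlencode


def build_list_records_url(base_url, metadata_prefix="oai_dc",
                           from_date=None, until=None):
    """Build an OAI-PMH ListRecords request URL (illustrative helper)."""
    # verb and metadataPrefix are required for an initial ListRecords request.
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    # from/until are optional datestamp bounds; supplying "from" asks the
    # repository only for records changed on or after that date.
    if from_date is not None:
        params["from"] = from_date
    if until is not None:
        params["until"] = until
    return base_url + "?" + urlencode(params)


# Hypothetical base URL; a real deployment would publish its own.
BASE_URL = "http://www.example.org/modoai"

# First harvest: request everything the repository exposes.
full = build_list_records_url(BASE_URL)

# Later harvest: request only records changed since the previous run.
incremental = build_list_records_url(BASE_URL, from_date="2004-06-01")

print(full)
print(incremental)
```

A crawler, by contrast, generally has to re-request each page to learn whether it changed; with a datestamp-bounded request the server does that selection on the harvester's behalf.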
Taxonomy
Topics: Web Data Mining and Analysis · Advanced Database Systems and Queries · Data Quality and Management
