Evaluating the SiteStory Transactional Web Archive With the ApacheBench Tool
Justin F. Brunelle, Michael L. Nelson

TL;DR
This paper evaluates the performance of SiteStory, a transactional web archiving system, using ApacheBench, demonstrating its minimal impact on server performance and its suitability for high-fidelity web archiving.
Contribution
It introduces a performance evaluation of SiteStory, an open-source transactional web archive, showing its feasibility for production environments with minimal performance degradation.
Findings
SiteStory has negligible impact on server response times.
Performance slowdown is minimal under load conditions.
SiteStory is suitable for high-fidelity web archiving.
Abstract
Conventional Web archives are created by periodically crawling a web site and archiving the responses from the Web server. Although easy to implement and commonly deployed, this form of archiving typically misses updates and may not be suitable for all preservation scenarios, for example a site that is required (perhaps for records compliance) to keep a copy of all pages it has served. In contrast, transactional archives work in conjunction with a Web server to record all pages that have been served. Los Alamos National Laboratory has developed SiteStory, an open-source transactional archive solution written in Java that runs on Apache Web servers, provides a Memento-compatible access interface, and offers WARC file export features. We used the ApacheBench utility on a pre-release version of SiteStory to measure response time and content delivery time in different environments and on different machines.…
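The measurement approach described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' test harness. It assumes the standard ApacheBench flags `-n` (total requests) and `-c` (concurrent clients) and the "Time per request ... (mean)" line that `ab` prints in its report. The function names and the example URL are hypothetical.

```python
import re
import subprocess

# Matches the first "Time per request" line of an ab report, e.g.
# "Time per request:       12.500 [ms] (mean)". The second such line
# ends in "(mean, across all concurrent requests)" and is not matched.
_MEAN_RE = re.compile(
    r"^Time per request:\s+([\d.]+) \[ms\] \(mean\)\s*$", re.MULTILINE
)


def build_ab_command(url, requests=1000, concurrency=1):
    """Build an ApacheBench argv list (hypothetical helper)."""
    return ["ab", "-n", str(requests), "-c", str(concurrency), url]


def mean_ms_per_request(ab_output):
    """Extract the mean time per request, in milliseconds, from ab output."""
    match = _MEAN_RE.search(ab_output)
    if match is None:
        raise ValueError("no 'Time per request ... (mean)' line found")
    return float(match.group(1))


def run_ab(url, requests=1000, concurrency=1):
    """Run ab against url and return the mean ms per request.

    Requires the ab binary on PATH; raises CalledProcessError on failure.
    """
    result = subprocess.run(
        build_ab_command(url, requests, concurrency),
        capture_output=True, text=True, check=True,
    )
    return mean_ms_per_request(result.stdout)


def slowdown(baseline_ms, archived_ms):
    """Ratio of response time with archiving enabled to the baseline."""
    return archived_ms / baseline_ms
```

One way to use this sketch would be to call `run_ab("http://localhost:8080/index.html")` once with transactional archiving disabled and once with it enabled, then pass the two means to `slowdown`. The URL here is a placeholder, not a host from the paper.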
Taxonomy
Topics: Web Data Mining and Analysis · Advanced Data Storage Technologies · Advanced Database Systems and Queries
