Robots Still Outnumber Humans in Web Archives, But Less Than Before
Himarsha R. Jayanetti, Kritika Garg, Sawood Alam, Michael L. Nelson, Michele C. Weigle

TL;DR
This study compares robot and human access patterns in web archives from 2012 and 2019, revealing a decrease in robot dominance over time and evolving browsing behaviors.
Contribution
It provides a comparative analysis of user access patterns and robot detection in two major web archives across two different years, highlighting changes in robot activity and browsing behaviors.
Findings
Robots decreased in number from 2012 to 2019 in IA.
Robots dominate request volume in Arquivo.pt 2019.
Access patterns evolved from limited to diverse over time.
Abstract
To identify robots and humans and analyze their respective access patterns, we used the Internet Archive's (IA) Wayback Machine access logs from 2012 and 2019, as well as Arquivo.pt's (Portuguese Web Archive) access logs from 2019. We identified user sessions in the access logs and classified those sessions as human or robot based on their browsing behavior. To better understand how users navigate through the web archives, we evaluated these sessions to discover user access patterns. Across the two archives, and between the two years of IA access logs (2012 vs. 2019), we compare detected robots vs. humans, along with their user access patterns and temporal preferences. The total number of robots detected in IA 2012 is greater than in IA 2019 (21% more in requests and 18% more in sessions). Robots account for 98% of requests (97% of sessions) in Arquivo.pt (2019). We found that…
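The abstract does not spell out how sessions are built or how robots are told apart from humans. As a minimal sketch, assuming a common web-log-mining approach (not necessarily the authors' method): group requests by client IP and user agent, start a new session after an idle gap, then flag a session as a robot if its user agent looks like a bot, it fetches /robots.txt, or its request rate is too high for a person. The 30-minute timeout, the rate threshold, the user-agent markers, and the names `LogEntry`, `Session`, `sessionize`, and `is_robot` are all hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from datetime import datetime, timedelta

# Hypothetical thresholds, chosen for illustration only.
SESSION_TIMEOUT = timedelta(minutes=30)
MAX_HUMAN_REQUESTS_PER_MINUTE = 10
BOT_UA_MARKERS = ("bot", "crawler", "spider", "curl", "python-requests")


@dataclass
class LogEntry:
    client_ip: str
    user_agent: str
    timestamp: datetime
    path: str


@dataclass
class Session:
    client_ip: str
    user_agent: str
    entries: list = field(default_factory=list)


def sessionize(entries):
    """Group entries by (IP, user agent), then split on idle gaps longer than SESSION_TIMEOUT."""
    by_client = defaultdict(list)
    for e in entries:
        by_client[(e.client_ip, e.user_agent)].append(e)

    sessions = []
    for (ip, ua), client_entries in by_client.items():
        client_entries.sort(key=lambda e: e.timestamp)
        current = Session(ip, ua)
        for e in client_entries:
            # Close the current session if this request follows a long idle gap.
            if current.entries and e.timestamp - current.entries[-1].timestamp > SESSION_TIMEOUT:
                sessions.append(current)
                current = Session(ip, ua)
            current.entries.append(e)
        sessions.append(current)
    return sessions


def is_robot(session):
    """Heuristic classifier: bot-like user agent, robots.txt fetch, or a high request rate."""
    ua = session.user_agent.lower()
    if any(marker in ua for marker in BOT_UA_MARKERS):
        return True
    if any(e.path == "/robots.txt" for e in session.entries):
        return True
    span = session.entries[-1].timestamp - session.entries[0].timestamp
    minutes = max(span.total_seconds() / 60, 1.0)  # treat very short sessions as one minute
    return len(session.entries) / minutes > MAX_HUMAN_REQUESTS_PER_MINUTE


# Toy log lines (invented for illustration, not data from the paper).
t0 = datetime(2019, 1, 1, 12, 0)
logs = [
    LogEntry("10.0.0.1", "Mozilla/5.0", t0, "/web/2019*/example.com"),
    LogEntry("10.0.0.1", "Mozilla/5.0", t0 + timedelta(minutes=2), "/web/20190101000000/example.com"),
    LogEntry("10.0.0.1", "Mozilla/5.0", t0 + timedelta(hours=2), "/web/2012*/example.org"),
    LogEntry("10.0.0.2", "ExampleBot/1.0", t0, "/robots.txt"),
]

sessions = sessionize(logs)
robots = sum(is_robot(s) for s in sessions)
print(f"robots={robots}, humans={len(sessions) - robots}")  # robots=1, humans=2
```

Here the first client's two-hour gap splits its requests into two human sessions, while the second client is flagged by its user agent. Real studies typically combine more signals than these three; the thresholds above would need tuning against labeled data.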
Taxonomy
Topics: Web Data Mining and Analysis
