Learning to Crawl
Utkarsh Upadhyay, Robert Busa-Fekete, Wojciech Kotlowski, David Pal, Balazs Szorenyi

TL;DR
This paper addresses web crawling optimization under unknown change rates by developing an online estimation method and an explore-and-commit algorithm that achieves near-optimal performance with limited observations.
Contribution
It introduces an online estimation approach for change rates with partial observability and demonstrates an explore-and-commit algorithm with sublinear regret for web crawling.
Findings
Proposed a practical estimator for change rates based on partial observations.
Achieved an $\tilde{O}(\sqrt{T})$ regret bound with the explore-and-commit algorithm.
Simulation results show near-optimal performance across various parameters.
Abstract
Web crawling is the problem of keeping a cache of webpages fresh, i.e., having the most recent copy available when a page is requested. This problem is usually coupled with the natural restriction that the bandwidth available to the web crawler is limited. The corresponding optimization problem was solved optimally by Azar et al. [2018] under the assumption that, for each webpage, both the elapsed time between two changes and the elapsed time between two requests follow a Poisson distribution with known parameters. In this paper, we study the same control problem but under the assumption that the change rates are unknown a priori, and thus we need to estimate them in an online fashion using only partial observations (i.e., single-bit signals indicating whether the page has changed since the last refresh). As a point of departure, we characterise the conditions under which one can solve…
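To make the partial-observation setting concrete, here is a minimal sketch of estimating a change rate from single-bit signals. It is not the estimator proposed in the paper. It assumes the simplest case, in which a page is refreshed at equal intervals and changes arrive as a Poisson process, so that a refresh reports "changed" with probability 1 - exp(-rate * interval). The function names, the clipping rule, and the simulation parameters are all illustrative assumptions.

```python
import math
import random


def estimate_change_rate(changed_bits, interval):
    """Maximum-likelihood change-rate estimate from single-bit observations.

    Assumes equally spaced refreshes and Poisson changes, so each bit is
    True with probability 1 - exp(-rate * interval).
    """
    n = len(changed_bits)
    if n == 0:
        raise ValueError("need at least one observation")
    k = sum(1 for bit in changed_bits if bit)
    # Assumption: clip the observed fraction so the logarithm stays finite
    # when every refresh reported a change.
    frac = min(k / n, 1.0 - 1.0 / (2.0 * n))
    return -math.log(1.0 - frac) / interval


def simulate_bits(rate, interval, n, rng):
    """Simulate n single-bit 'changed since last refresh' signals."""
    p_changed = 1.0 - math.exp(-rate * interval)
    return [rng.random() < p_changed for _ in range(n)]


if __name__ == "__main__":
    rng = random.Random(0)
    bits = simulate_bits(rate=0.5, interval=1.0, n=20000, rng=rng)
    print(estimate_change_rate(bits, interval=1.0))
```

The sketch also shows why the single-bit signal is lossy: if the refresh interval is long relative to the change rate, nearly every bit is True and the estimate is governed by the clipping rule rather than by the data.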
Taxonomy
Topics: Optimization and Search Problems · Advanced Bandit Algorithms Research · Age of Information Optimization
