Random Sampling in an Age of Automation: Minimizing Expenditures through Balanced Collection and Annotation
Oscar Beijbom

TL;DR
This paper introduces a Hybrid-Offset sampling method that combines costly accurate annotations with inexpensive noisy ones to efficiently estimate population means, significantly reducing sampling costs in automated survey contexts.
Contribution
The paper proposes a novel Hybrid-Offset sampling design that optimally balances two types of annotators to minimize costs while maintaining accuracy.
Findings
Hybrid-Offset outperforms several alternative sampling designs in simulations on data from a coral reef survey program.
Reduces sampling costs by 50% compared with conventional methods.
Provides necessary conditions and optimal sample sizes for the new design.
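This page does not give the design's formulas. One way to see what "optimal sample sizes" can mean is the textbook double-sampling calculation, sketched below. It assumes (not confirmed here) that the design averages auxiliary labels x over n collected samples and corrects them with the primary-minus-auxiliary offset d = y - x measured on a random subset of m ≤ n samples that also receive a primary label y. The costs are hypothetical: c_1 per sample for collection plus auxiliary annotation, c_2 extra per primary annotation, and a total budget C.

```latex
\begin{aligned}
\hat{\mu} &= \bar{x}_n + \bar{d}_m, \qquad d = y - x, \\
\operatorname{Var}(\hat{\mu}) &\approx \frac{\sigma_y^2 - \sigma_d^2}{n} + \frac{\sigma_d^2}{m}, \\
\frac{m^\ast}{n^\ast} &= \sqrt{\frac{\sigma_d^2\, c_1}{(\sigma_y^2 - \sigma_d^2)\, c_2}}
  \quad \text{(minimizing the variance subject to } c_1 n + c_2 m = C\text{)}, \\
\operatorname{Var}_{\min} &= \frac{\bigl(\sqrt{(\sigma_y^2 - \sigma_d^2)\, c_1} + \sqrt{\sigma_d^2\, c_2}\bigr)^2}{C}.
\end{aligned}
```

Under these assumptions the cheap annotator can only help when \sigma_d^2 < \sigma_y^2 and m^\ast/n^\ast < 1, and the design pays off only if Var_min falls below \sigma_y^2 (c_0 + c_2)/C, the variance of spending the same budget on primary-only annotation with a per-sample collection cost c_0. These are illustrative conditions, not the paper's stated ones.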
Abstract
Methods for automated collection and annotation are changing the cost structures of sampling surveys for a wide range of applications. Digital samples in the form of images or audio recordings can be collected rapidly and annotated by computer programs or crowd workers. We consider the problem of estimating a population mean under these new cost structures and propose a Hybrid-Offset sampling design. This design uses two annotators: a primary, which is accurate but costly (e.g., a human expert), and an auxiliary, which is noisy but cheap (e.g., a computer program), in order to minimize total sampling expenditures. Our analysis gives necessary conditions for the Hybrid-Offset design and specifies optimal sample sizes for both annotators. Simulations on data from a coral reef survey program indicate that the Hybrid-Offset design outperforms several alternative sampling designs. In…
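The abstract does not spell out the estimator, so the following is a minimal Python sketch under an assumption: that Hybrid-Offset behaves like a classical double-sampling difference estimator. The cheap auxiliary annotator labels all n collected samples, a random subset of m is also labeled by the primary annotator, and the mean auxiliary label is shifted by the mean primary-minus-auxiliary offset on that subset. The function names, the cost parameters c1 and c2, and the synthetic noise model in simulate are hypothetical illustrations, not the paper's formulation or its coral reef data.

```python
import math
import random
import statistics


def hybrid_offset_estimate(aux_all, primary_sub, aux_sub):
    """Offset (difference) estimate of the population mean.

    aux_all: auxiliary labels for all n collected samples.
    primary_sub, aux_sub: primary and auxiliary labels for the m samples
    (a random subset of the n) that were also sent to the primary annotator.
    """
    if not primary_sub or len(primary_sub) != len(aux_sub):
        raise ValueError("need a matched, non-empty doubly annotated subset")
    # Mean disagreement (primary minus auxiliary) on the doubly annotated subset.
    offset = statistics.fmean(p - a for p, a in zip(primary_sub, aux_sub))
    # Cheap mean over all samples, shifted by the estimated annotator offset.
    return statistics.fmean(aux_all) + offset


def design_variance(n, m, var_y, var_d):
    """Approximate variance of the estimate: (var_y - var_d)/n + var_d/m."""
    return (var_y - var_d) / n + var_d / m


def optimal_allocation(budget, c1, c2, var_y, var_d):
    """Real-valued (n, m) minimizing design_variance with c1*n + c2*m = budget.

    c1: per-sample cost of collection plus auxiliary annotation (hypothetical).
    c2: extra per-sample cost of a primary annotation (hypothetical).
    var_y: variance of primary labels; var_d: variance of primary minus auxiliary.
    """
    a = var_y - var_d
    if a <= 0:
        # Extra auxiliary-only samples do not shrink the variance here,
        # so give every collected sample a primary label.
        n = budget / (c1 + c2)
        return n, n
    s = math.sqrt(a * c1) + math.sqrt(var_d * c2)
    n = budget * math.sqrt(a / c1) / s
    m = budget * math.sqrt(var_d / c2) / s
    if m > n:
        # The constraint m <= n binds, so the optimum lies on m == n.
        n = m = budget / (c1 + c2)
    return n, m


def simulate(n, m, seed=0, true_mean=10.0, bias=0.5, noise_sd=0.6):
    """One synthetic survey: primary = truth, auxiliary = truth + bias + noise."""
    rng = random.Random(seed)
    # Primary labels are generated for all n only to keep the sketch short;
    # the estimator below reads just the m that would actually be purchased.
    primary = [rng.gauss(true_mean, 1.0) for _ in range(n)]
    aux = [y + bias + rng.gauss(0.0, noise_sd) for y in primary]
    idx = rng.sample(range(n), m)  # random subset sent to the primary annotator
    return hybrid_offset_estimate(
        aux, [primary[i] for i in idx], [aux[i] for i in idx]
    )


if __name__ == "__main__":
    n, m = optimal_allocation(budget=100, c1=1.0, c2=4.0, var_y=1.0, var_d=0.36)
    print(round(n), round(m))  # 40 15
    print(round(design_variance(n, m, 1.0, 0.36), 4))  # 0.04
    print(simulate(round(n), round(m)))  # near 10.0 despite the +0.5 auxiliary bias
```

With these made-up costs (a primary label costing four times a cheap one), the sketch buys 40 auxiliary labels and 15 primary labels. The noise settings in simulate match var_y = 1.0 and var_d = 0.36, and the offset term removes the auxiliary bias. In practice the variances and costs would have to be estimated from pilot data.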
Taxonomy
Topics: Mobile Crowdsensing and Crowdsourcing · Data Stream Mining Techniques · Machine Learning and Algorithms
