Hashed Binary Search Sampling for Convolutional Network Training with Large Overhead Image Patches
Dalton Lunga, Lexie Yang, and Budhendra Bhaduri

TL;DR
This paper introduces a hashing-based sampling framework that efficiently selects diverse, high-variance image patches from large overhead imagery datasets to improve convolutional network training and generalization.
Contribution
It proposes a novel binary search tree sampling scheme, combined with kernel-based hashing, to reduce redundancy and noise in training data for large-scale overhead imagery analysis.
Findings
Reduces redundant sampling in large datasets
Accelerates training by focusing on high-variance patches
Improves model generalization over wide geographical scenes
Abstract
Very large overhead imagery associated with ground truth maps has the potential to generate billions of training image patches for machine learning algorithms. However, random sampling selection criteria often lead to redundant and noisy image patches for model training. With minimal research effort behind this challenge, the current status spells missed opportunities to develop supervised learning algorithms that generalize over wide geographical scenes. In addition, many of the computational cycles for large-scale machine learning are poorly spent crunching through noisy and redundant image patches. We demonstrate a potential framework to address these challenges specifically, while evaluating a human settlement detection task. A novel binary search tree sampling scheme is fused with a kernel-based hashing procedure that maps image patches into hash-buckets using binary codes…
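The general idea of mapping patches to binary codes and sampling across the resulting buckets can be illustrated with a minimal sketch. This is not the authors' method: it swaps in plain random-hyperplane hashing for the kernel-based hashing procedure, and a simple round-robin draw over buckets for the binary search tree sampling scheme, since the abstract does not give those details. The function names `hash_patches` and `sample_across_buckets`, the 8-bit code length, and the synthetic patch array are all assumptions made for illustration.

```python
import numpy as np
from collections import defaultdict


def hash_patches(patches, n_bits=8, seed=0):
    """Map each patch to an integer bucket key built from n_bits sign bits.

    Assumption: random hyperplanes stand in for the paper's kernel-based hashing.
    """
    rng = np.random.default_rng(seed)
    flat = patches.reshape(len(patches), -1).astype(np.float64)
    flat = flat - flat.mean(axis=0)            # center so hyperplanes split the data
    planes = rng.standard_normal((flat.shape[1], n_bits))
    bits = (flat @ planes) > 0                 # (n_patches, n_bits) boolean code
    weights = 1 << np.arange(n_bits)           # powers of two for bit packing
    return bits.astype(np.int64) @ weights     # one integer key per patch


def sample_across_buckets(keys, n_samples, seed=0):
    """Draw patch indices round-robin over buckets so no bucket dominates.

    Assumption: a stand-in for the paper's binary search tree sampling.
    """
    rng = np.random.default_rng(seed)
    buckets = defaultdict(list)
    for idx, key in enumerate(keys):
        buckets[int(key)].append(idx)
    # Shuffle inside each bucket so the pick within a bucket is random.
    pools = [list(rng.permutation(members)) for members in buckets.values()]
    chosen = []
    while len(chosen) < n_samples and any(pools):
        for pool in pools:
            if pool and len(chosen) < n_samples:
                chosen.append(int(pool.pop()))
    return chosen


# Synthetic stand-in for image patches: 200 patches of 16x16 pixels, 3 bands.
patches = np.random.default_rng(1).random((200, 16, 16, 3))
keys = hash_patches(patches)
picked = sample_across_buckets(keys, n_samples=32)
print(len(set(keys.tolist())), "buckets;", len(picked), "patches sampled")
```

Compared with uniform random sampling, drawing across buckets limits how many near-duplicate patches (those sharing a code) enter a training batch, which is the kind of redundancy the abstract describes.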
