Sparse One-Time Grab Sampling of Inliers

Maryam Jaberi; Marianna Pensky; Hassan Foroosh

arXiv:1901.02338·cs.LG·January 9, 2019

TL;DR

This paper introduces a 'one-time-grab' sampling algorithm designed to efficiently select minimal samples from large datasets with multiple structures and outliers, ensuring coverage of all structures with high probability.

Contribution

It proposes a novel sampling method that minimizes the number of samples needed to capture all underlying structures in large, complex datasets, regardless of outliers.

Findings

1. Reduces sample size needed for structure detection
2. Guarantees coverage of all structures with high probability
3. Applicable as a front end to various clustering methods

Abstract

Estimating structures in "big data" and clustering them are among the most fundamental problems in computer vision, pattern recognition, data mining, and many other research fields. Over the past few decades, many studies have focused on different aspects of these problems. One of the main approaches explored in the literature to tackle the problems of size and dimensionality is sampling subsets of the data in order to estimate the characteristics of the whole population, e.g. estimating the underlying clusters or structures in the data. In this paper, we propose a “one-time-grab” sampling algorithm [3, 5]. This method can be used as the front end to any supervised or unsupervised clustering method. Rather than focusing on the strategy of maximizing the probability of sampling inliers, our goal is to minimize the number of…

Taxonomy

Topics: Image Processing and 3D Reconstruction · Geophysical Methods and Applications · Image and Object Detection Techniques

Full text

Estimating structures in "big data" and clustering them are among the most fundamental problems in computer vision, pattern recognition, data mining, and many other research fields. Over the past few decades, many studies have focused on different aspects of these problems. One of the main approaches explored in the literature to tackle the problems of size and dimensionality is sampling subsets of the data in order to estimate the characteristics of the whole population, e.g. estimating the underlying clusters or structures in the data. In this paper, we propose a “one-time-grab” sampling algorithm [3, 5]. This method can be used as the front end to any supervised or unsupervised clustering method. Rather than focusing on the strategy of maximizing the probability of sampling inliers, our goal is to minimize the number of samples needed to instantiate all underlying model instances. More specifically, our goal is to answer the following question: “Given a very large population of points with $C$ embedded structures and gross outliers, what is the minimum number of points $r$ to be selected randomly in one grab in order to make sure with probability $P$ that at least $\varepsilon$ points are selected on each structure, where $\varepsilon$ is the number of degrees of freedom of each structure?” This problem can be modeled using the hypergeometric pmf. In this paper, we study this model and show the accuracy of the method in choosing the sample size $r$. The steps of the proposed method are summarized as follows:

(i) Estimate the probability of selecting zero points on one structure: $P_{0}\leq\left(1-\frac{r}{C\theta}\right)^{\theta}\leq e^{-r/C}$

(ii) Estimate the probability of selecting fewer than $\varepsilon$ points on one structure: $\Delta\leq P_{0}\times\sum_{k=0}^{\varepsilon-1}{Cd\choose k}\left(\frac{\theta}{N-r-\theta+k}\right)^{k}$

(iii) Find the upper bounds for the tail probabilities of the multivariate hypergeometric distribution, which give a lower bound on the probability of covering every structure: $P(\cap_{i=1}^{C}(d_{i}\geq\varepsilon))\geq 1-C\Delta$
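As a brief aside on why two of these inequalities hold, the following sketch spells out the elementary steps behind the second inequality in (i) and the bound in (iii). It assumes $0\leq r\leq C\theta$ and that the same $\Delta$ bounds $P(d_{i}<\varepsilon)$ for every structure $i$. Neither assumption is stated explicitly in the text, so this is an illustration and not the authors' own proof.

```latex
% Second inequality in (i): apply 1 - x \le e^{-x} with x = r / (C\theta).
\left(1-\frac{r}{C\theta}\right)^{\theta}
  \leq \left(e^{-r/(C\theta)}\right)^{\theta}
  = e^{-r/C}

% Bound in (iii): complement, then the union (Boole) bound over the C structures.
P\left(\bigcap_{i=1}^{C}(d_{i}\geq\varepsilon)\right)
  = 1-P\left(\bigcup_{i=1}^{C}(d_{i}<\varepsilon)\right)
  \geq 1-\sum_{i=1}^{C}P(d_{i}<\varepsilon)
  \geq 1-C\Delta
```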

Because the lower bound in (iii) is non-decreasing in $r$, the sample size $r$ can be computed using a binary search. Once the sample size $r$ is estimated, a subset of points is sampled uniformly in a single one-time grab to instantiate and cluster the structures in the data. To verify this prediction, we investigated the accuracy of our approximation of $r$ against theoretical values. We chose different population sizes with different embedded model instances. The theoretical and estimated values of $r$ are plotted against different desired probability values in Figure 1a. These plots illustrate the average values of $r$ over $200$ independent trials for population sizes of $N\in\{100, 1000, 10000\}$. Figure 1b illustrates the sparsity of the proposed method compared with state-of-the-art methods [1].
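The binary search for $r$ described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation. It assumes every structure contains exactly `theta` points, and it computes $\Delta$ from the exact hypergeometric tail instead of the closed-form bounds in steps (i) and (ii). All function and parameter names are hypothetical.

```python
from math import comb


def prob_fewer_than(eps: int, r: int, theta: int, n_total: int) -> float:
    """Exact P(X < eps) for X ~ Hypergeometric(n_total, theta, r).

    X counts how many of the r grabbed points land on one structure that
    holds theta of the n_total points (assumption: known structure size).
    """
    # Count the r-subsets containing exactly k structure points, k < eps.
    favourable = sum(
        comb(theta, k) * comb(n_total - theta, r - k)
        for k in range(min(eps, r + 1))
    )
    return favourable / comb(n_total, r)


def coverage_lower_bound(r: int, n_structures: int, theta: int,
                         n_total: int, eps: int) -> float:
    """Union bound from step (iii): 1 - C * Delta."""
    return 1.0 - n_structures * prob_fewer_than(eps, r, theta, n_total)


def min_sample_size(n_structures: int, theta: int, n_total: int,
                    eps: int, target_prob: float) -> int:
    """Smallest r whose coverage lower bound reaches target_prob."""
    if not (0 < eps <= theta and n_structures * theta <= n_total):
        raise ValueError("need 0 < eps <= theta and C * theta <= N")
    if not 0.0 < target_prob < 1.0:
        raise ValueError("target_prob must lie strictly between 0 and 1")

    # At least eps points per structure are required, and grabbing the
    # whole population always succeeds, so the answer lies in [lo, hi].
    lo, hi = n_structures * eps, n_total
    while lo < hi:
        mid = (lo + hi) // 2
        if coverage_lower_bound(mid, n_structures, theta, n_total, eps) >= target_prob:
            hi = mid          # mid is large enough; try smaller
        else:
            lo = mid + 1      # mid is too small
    return lo


if __name__ == "__main__":
    # Hypothetical setting: N = 1000 points, C = 4 structures of 100 points
    # each (the remaining 600 are outliers), eps = 2, target P = 0.99.
    print(min_sample_size(n_structures=4, theta=100, n_total=1000,
                          eps=2, target_prob=0.99))
```

The search relies on the bound being non-decreasing in $r$, which holds here because grabbing more points can only make it less likely that a structure receives fewer than $\varepsilon$ of them. The returned $r$ would then be used for a single uniform draw without replacement, for example with `random.sample`.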

As a generic unsupervised sparse sampling method, the proposed method can be used in virtually any scenario where multiple structures need to be detected in a large population of points. Here, a population could be in a physical space (e.g. planar or 3D structures), or in some abstract feature space (e.g. the space of all fundamental matrices, all homographies in some configuration of scene/camera motion, or subspaces formed in some high-dimensional spaces [2, 4]). Below, we give some examples.

Bibliography

[1] R. Hoseinnezhad and A. Bab-Hadiashar. Multi-Bernoulli sample consensus for simultaneous robust fitting of multiple structures in machine vision. Signal, Image and Video Processing, pages 1–10, 2014.
[2] M. Jaberi. Sampling and subspace methods for learning sparse group structures in computer vision. 2018.
[3] M. Jaberi, M. Pensky, and H. Foroosh. SWIFT: Sparse withdrawal of inliers in a first trial. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 4849–4857, 2015.
[4] M. Jaberi, M. Pensky, and H. Foroosh. Probabilistic sparse subspace clustering using delayed association. In 2018 24th International Conference on Pattern Recognition (ICPR), pages 2087–2092. IEEE, 2018.
[5] M. Jaberi, M. Pensky, and H. Foroosh. Sparse one-grab sampling with probabilistic guarantees. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2018.