More data speeds up training time in learning halfspaces over sparse vectors

Amit Daniely; Nati Linial; Shai Shalev-Shwartz

arXiv:1311.2271·cs.LG·November 12, 2013·37 cites

TL;DR

This paper shows that examples beyond the sample complexity limit can reduce the computation time needed for agnostic PAC learning of halfspaces over 3-sparse vectors, establishing a tradeoff between sample size and computational cost under the assumption that refuting random 3CNF formulas is hard.

Contribution

It introduces a novel, non-cryptographic methodology for establishing computational-statistical gaps and shows that examples beyond the sample complexity limit can make learning sparse halfspaces computationally efficient.

Findings

01

More data speeds up learning of sparse halfspaces.

02

Computational-statistical gaps are established under the assumption that refuting random 3CNF formulas is hard.

03

Efficient learning is possible with significantly more data than the sample complexity bound.

Abstract

The increased availability of data in recent years has led several authors to ask whether it is possible to use data as a computational resource. That is, if more data is available, beyond the sample complexity limit, is it possible to use the extra examples to speed up the computation time required to perform the learning task? We give the first positive answer to this question for a natural supervised learning problem: we consider agnostic PAC learning of halfspaces over $3$-sparse vectors in $\{-1, 1, 0\}^{n}$. This class is inefficiently learnable using $O(n / \epsilon^{2})$ examples. Our main contribution is a novel, non-cryptographic, methodology for establishing computational-statistical gaps, which allows us to show that, under a widely believed assumption that refuting random 3CNF formulas is hard, it is impossible to efficiently learn this class…
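
To make the setting concrete, the following is a minimal Python sketch of the objects the abstract names: sparse vectors in $\{-1, 1, 0\}^{n}$, a halfspace evaluated on them, and the empirical error that an agnostic learner tries to keep small. It is an illustration only, not the authors' algorithm. The function names, the choice of exactly `k` nonzero coordinates (a special case of sparsity), and the convention that a zero margin maps to `+1` are all assumptions made for this example.

```python
import random


def sample_sparse_vector(n, k=3, rng=random):
    """Draw a vector in {-1, 0, 1}^n with exactly k nonzero coordinates.

    Assumption: "k-sparse" is illustrated here with exactly k nonzeros.
    """
    x = [0] * n
    for i in rng.sample(range(n), k):
        x[i] = rng.choice((-1, 1))
    return x


def halfspace(w, b, x):
    """Label x by the sign of <w, x> + b (assumed convention: 0 maps to +1)."""
    # Only the nonzero coordinates of x contribute to the inner product.
    margin = sum(wi * xi for wi, xi in zip(w, x) if xi != 0) + b
    return 1 if margin >= 0 else -1


def empirical_error(w, b, examples):
    """Fraction of labeled examples (x, y) that the halfspace misclassifies."""
    mistakes = sum(halfspace(w, b, x) != y for x, y in examples)
    return mistakes / len(examples)
```

In the agnostic setting the labels need not come from any halfspace, so a learner is judged by how close its error is to that of the best halfspace rather than by reaching zero error.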

Taxonomy

Topics: Machine Learning and Algorithms · Algorithms and Data Compression · Imbalanced Data Classification Techniques
