Online Categorical Subspace Learning for Sketching Big Data with Misses

Yanning Shen; Morteza Mardani; Georgios B. Giannakis

arXiv:1609.08235 · stat.ML · August 2, 2017 · IEEE Trans. Signal Process.

TL;DR

This paper introduces an online subspace learning method for high-dimensional categorical data with missing entries. It builds on the Probit, Tobit, and Logit models and uses recursive algorithms to sketch and analyze streaming data in real time.

Contribution

It develops a rank-regularized maximum-likelihood estimator for categorical data, together with an online variant that jointly learns the quantization thresholds and refines the subspace as new data arrive.
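As a rough illustration of online subspace learning for the binary Logit case with misses, the sketch below runs a generic alternating stochastic-gradient scheme. It is not the authors' algorithm and does not learn quantization thresholds. It assumes each incoming column has log-odds `L @ q` and that the rank penalty is replaced by the usual bilinear surrogate, i.e. ridge penalties on the factors. The function names, step sizes, and penalty weights (`lam`, `lam_L`, `step_L`) are all hypothetical.

```python
import numpy as np

def sigmoid(z):
    # Numerically stable logistic function
    return 0.5 * (1.0 + np.tanh(0.5 * z))

def column_objective(Lo, yo, q, lam):
    # Logistic loss on observed entries plus a ridge penalty on the coefficients
    return np.logaddexp(0.0, -yo * (Lo @ q)).sum() + 0.5 * lam * q @ q

def fit_coefficients(Lo, yo, lam, iters=50):
    # Gradient descent with step 1/Lipschitz, so the objective never increases
    step = 1.0 / (np.linalg.norm(Lo, 2) ** 2 / 4.0 + lam)
    q = np.zeros(Lo.shape[1])
    for _ in range(iters):
        grad = -Lo.T @ (yo * sigmoid(-yo * (Lo @ q))) + lam * q
        q -= step * grad
    return q

def online_logit_step(L, y, observed, lam=0.1, lam_L=1e-3, step_L=0.01):
    """One hypothetical online update for a binary column with misses.

    L        : (n, r) current basis of the log-odds subspace
    y        : (n,) labels in {-1, +1}; entries outside `observed` are ignored
    observed : (n,) boolean mask of observed entries
    """
    if not observed.any():
        return L, np.zeros(L.shape[1])
    Lo, yo = L[observed], y[observed]
    q = fit_coefficients(Lo, yo, lam)
    # Stochastic-gradient step on the basis: only observed rows get a data gradient,
    # and a small ridge term (assumed form) stands in for the rank regularizer on L
    grad_L = lam_L * L
    grad_L[observed] += np.outer(-yo * sigmoid(-yo * (Lo @ q)), q)
    return L - step_L * grad_L, q

# Synthetic stream: columns drawn from a rank-r log-odds model, ~30% entries missing
rng = np.random.default_rng(0)
n, r, T = 100, 3, 500
U_true = rng.standard_normal((n, r))   # ground-truth basis (synthetic)
L = 0.1 * rng.standard_normal((n, r))  # initial estimate
for t in range(T):
    logodds = U_true @ rng.standard_normal(r)
    y = np.where(rng.random(n) < sigmoid(logodds), 1.0, -1.0)
    observed = rng.random(n) < 0.7
    L, q = online_logit_step(L, y, observed)
```

Each column is processed once and then discarded, which is what keeps memory independent of the stream length. Missing entries simply drop out of both the coefficient fit and the basis gradient.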

Findings

01

Converges asymptotically to stationary points for infinite data streams.

02

Achieves sublinear regret bounds for finite data streams.

03

Demonstrates effectiveness on real-world applications such as movie recommendation and chess classification.

Abstract

With the scale of data growing every day, reducing the dimensionality (a.k.a. sketching) of high-dimensional data has emerged as a task of paramount importance. Relevant issues to address in this context include the sheer volume of data that may consist of categorical samples, the typically streaming format of acquisition, and the possibly missing entries. To cope with these challenges, the present paper develops a novel categorical subspace learning approach to unravel the latent structure for three prominent categorical (bilinear) models, namely, Probit, Tobit, and Logit. The deterministic Probit and Tobit models treat data as quantized values of an analog-valued process lying in a low-dimensional subspace, while the probabilistic Logit model relies on low dimensionality of the data log-likelihood ratios. Leveraging the low intrinsic dimensionality of the sought models, a rank…
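The three observation models named in the abstract can be made concrete by simulating data from them. The sketch below uses one common parameterization. Probit data are ordinal labels obtained by thresholding a noisy low-rank analog value, Tobit data are that analog value censored below a threshold, and Logit data are binary labels whose log-likelihood ratio is the low-rank entry. The Gaussian noise, the threshold values, and the helper names are assumptions for illustration, and the paper's exact definitions may differ.

```python
import numpy as np

def low_rank_matrix(n, t, r, rng):
    # Hypothetical helper: random rank-r matrix X = U V^T of size n x t
    U = rng.standard_normal((n, r))
    V = rng.standard_normal((t, r))
    return U @ V.T

def probit(X, thresholds, rng, sigma=1.0):
    # Ordinal label k when tau_{k-1} < x + e <= tau_k, with e ~ N(0, sigma^2)
    Z = X + sigma * rng.standard_normal(X.shape)
    return np.searchsorted(thresholds, Z)  # labels 0 .. len(thresholds)

def tobit(X, tau, rng, sigma=1.0):
    # Censored observation: keep x + e when it exceeds tau, report tau otherwise
    Z = X + sigma * rng.standard_normal(X.shape)
    return np.maximum(Z, tau)

def logit(X, rng):
    # Binary label with P(y = 1) = 1 / (1 + exp(-x)); x is the log-likelihood ratio
    P = 1.0 / (1.0 + np.exp(-X))
    return (rng.random(X.shape) < P).astype(int)

rng = np.random.default_rng(0)
X = low_rank_matrix(50, 200, 3, rng)
mask = rng.random(X.shape) < 0.7  # roughly 70% of entries observed
Y_probit = probit(X, np.array([-1.0, 0.0, 1.0]), rng)
Y_tobit = tobit(X, 0.0, rng)
Y_logit = logit(X, rng)
Y_tobit_obs = np.where(mask, Y_tobit, np.nan)  # misses marked as NaN
```

In every case the categorical matrix itself is not low rank. Only the latent analog matrix `X` is, and that structure is what a categorical subspace learner tries to recover from the partially observed labels.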
