Online Categorical Subspace Learning for Sketching Big Data with Misses
Yanning Shen, Morteza Mardani, Georgios B. Giannakis

TL;DR
This paper introduces an online subspace learning method for high-dimensional categorical data with missing entries, built on Probit, Tobit, and Logit models and on recursive algorithms that sketch and analyze the data in real time.
Contribution
The paper develops a rank-regularized maximum-likelihood estimator for categorical data that, in an online setting, jointly learns the quantization thresholds and refines the subspace.
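To make the idea concrete, here is a minimal sketch of one way an online, regularized logistic (Logit-style) subspace update with missing entries could look. It is an illustration under stated assumptions, not the authors' algorithm: the function name `online_logit_subspace`, the step size, the inner gradient loop, and the use of Frobenius-norm penalties on the factors (a common surrogate for a rank or nuclear-norm regularizer) are all choices made here for illustration, and threshold learning for the Probit and Tobit models is not covered.

```python
import numpy as np


def sigmoid(z):
    # Numerically stable logistic function.
    return 0.5 * (1.0 + np.tanh(0.5 * z))


def online_logit_subspace(Y, mask, rank=2, lam=0.1, step=0.05,
                          inner_iters=20, seed=0):
    """Illustrative online subspace tracker for binary data with misses.

    Y    : (T, P) array with entries in {-1, +1}; rows arrive one at a time.
    mask : (T, P) boolean array, True where an entry is observed.
    Returns the subspace factor L (P, rank) and per-sample coefficients Q (T, rank).
    All hyperparameters are hypothetical defaults, not values from the paper.
    """
    rng = np.random.default_rng(seed)
    T, P = Y.shape
    L = 0.1 * rng.standard_normal((P, rank))
    Q = np.zeros((T, rank))
    for t in range(T):
        obs = mask[t]
        y = Y[t, obs].astype(float)
        L_obs = L[obs]  # copy of the rows of L for observed entries
        # Step 1: fit the low-dimensional coefficients q with L held fixed,
        # by gradient descent on the regularized logistic loss.
        q = np.zeros(rank)
        for _ in range(inner_iters):
            w = y * sigmoid(-y * (L_obs @ q))
            q -= step * (-L_obs.T @ w + lam * q)
        Q[t] = q
        # Step 2: one stochastic gradient step on L using only observed rows,
        # followed by the shrinkage from the Frobenius penalty on L.
        w = y * sigmoid(-y * (L_obs @ q))
        L[obs] += step * np.outer(w, q)
        L -= step * (lam / T) * L
    return L, Q
```

Because each update touches only the observed entries of the incoming sample, the values stored at missing positions never influence the learned subspace.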
Findings
The online iterates converge asymptotically to stationary points for infinite data streams.
The online algorithm attains sublinear regret for finite data streams.
Demonstrates effectiveness in real-world applications like movie recommendation and chess classification.
Abstract
With the scale of data growing every day, reducing the dimensionality (a.k.a. sketching) of high-dimensional data has emerged as a task of paramount importance. Relevant issues to address in this context include the sheer volume of data that may consist of categorical samples, the typically streaming format of acquisition, and the possibly missing entries. To cope with these challenges, the present paper develops a novel categorical subspace learning approach to unravel the latent structure for three prominent categorical (bilinear) models, namely, Probit, Tobit, and Logit. The deterministic Probit and Tobit models treat data as quantized values of an analog-valued process lying in a low-dimensional subspace, while the probabilistic Logit model relies on low dimensionality of the data log-likelihood ratios. Leveraging the low intrinsic dimensionality of the sought models, a rank…
