Matrix factorization and prediction for high dimensional co-occurrence count data via shared parameter alternating zero inflated Gamma model
Taejoon Kim, Haiyan Wang

TL;DR
This paper introduces a shared parameter alternating zero-inflated Gamma (SA-ZIG) model for high-dimensional sparse co-occurrence count data. It predicts item or user relevance by treating the counts as zero-inflated Gamma random variables.
Contribution
It develops the SA-ZIG model with two link functions, proposes parameter updating schemes, and provides convergence analysis, advancing methods for high-dimensional sparse count data analysis.
Findings
SA-ZIG with Fisher scoring and learning rate adjustment performs well in simulations.
The model effectively captures zero-inflation and skewness in co-occurrence data.
Numerical studies validate the proposed estimation algorithms.
Abstract
High-dimensional sparse matrix data frequently arise in various applications. A notable example is the weighted word-word co-occurrence count data, which summarizes the weighted frequency of word pairs appearing within the same context window. This type of data typically contains highly skewed non-negative values with an abundance of zeros. Another example is the co-occurrence of item-item or user-item pairs in e-commerce, which also generates high-dimensional data. The objective is to utilize this data to predict the relevance between items or users. In this paper, we assume that items or users can be represented by unknown dense vectors. The model treats the co-occurrence counts as arising from zero-inflated Gamma random variables and employs cosine similarity between the unknown vectors to summarize item-item relevance. The unknown values are estimated using the shared parameter…
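To make the zero-inflated Gamma idea concrete, the sketch below computes the log-likelihood of a single co-occurrence value from the cosine similarity of two vectors. It is only an illustration under stated assumptions: the abstract does not spell out the link functions, so the logistic link for the zero probability, the log link for the Gamma mean, and the names `a`, `b`, `c`, `d`, `shape`, and `pair_log_likelihood` are all hypothetical and are not taken from the paper.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two non-zero vectors given as sequences."""
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(y * y for y in v))
    return dot / (norm_u * norm_v)


def zig_log_likelihood(y, p_zero, shape, scale):
    """Log-likelihood of one value under a zero-inflated Gamma.

    P(Y = 0) = p_zero, and Y | Y > 0 follows Gamma(shape, scale).
    """
    if y == 0:
        return math.log(p_zero)
    log_gamma_pdf = (
        (shape - 1.0) * math.log(y)
        - y / scale
        - math.lgamma(shape)
        - shape * math.log(scale)
    )
    return math.log1p(-p_zero) + log_gamma_pdf


def pair_log_likelihood(y, u, v, a=0.0, b=1.0, c=0.0, d=1.0, shape=2.0):
    """Tie both ZIG components to the similarity of u and v.

    ASSUMPTION: the two links below are illustrative choices only.
    """
    s = cosine_similarity(u, v)
    # Hypothetical logistic link: higher similarity lowers the zero probability.
    p_zero = 1.0 / (1.0 + math.exp(a + b * s))
    # Hypothetical log link: higher similarity raises the Gamma mean.
    mean = math.exp(c + d * s)
    return zig_log_likelihood(y, p_zero, shape, mean / shape)


if __name__ == "__main__":
    u, v = [1.0, 2.0, 0.5], [0.8, 1.9, 0.7]
    print(pair_log_likelihood(0.0, u, v))  # contribution of a zero cell
    print(pair_log_likelihood(3.2, u, v))  # contribution of a positive cell
```

Summing `pair_log_likelihood` over all observed cells gives an objective that an iterative scheme could maximize with respect to the vectors; the updating schemes the paper itself proposes are not reproduced here.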
Taxonomy
Topics: Statistical Methods and Inference
