TL;DR
This paper introduces a novel cumulative shrinkage prior for models with unknown dimensions, such as factor analysis, improving dimension recovery and model inference through theoretical and practical advantages.
Contribution
It proposes the cumulative shrinkage process, a new increasing shrinkage prior that enhances dimension inference in over-complete models like factor analysis.
Findings
Improved ability to recover the true model dimension.
Demonstrated advantages over existing methods in simulations.
Effective in real personality traits data analysis.
Table 1. Median and interquartile range (IQR) across simulations for the cumulative shrinkage process (cusp) and the multiplicative gamma process (mgp).

| scenario | method | MSE (median) | MSE (IQR) | no. of factors (median) | no. of factors (IQR) | averaged ESS (median) | runtime in s (median) |
|---|---|---|---|---|---|---|---|
| (20,5) | cusp | 0.75 | 0.29 | 5.00 | 0.00 | 655.04 | 310.76 |
| (20,5) | mgp | 0.75 | 0.32 | 19.69 | 0.21 | 547.23 | 616.61 |
| (50,10) | cusp | 2.25 | 0.33 | 10.00 | 0.00 | 273.55 | 716.23 |
| (50,10) | mgp | 2.26 | 0.28 | 28.64 | 1.94 | 251.35 | 1845.88 |
| (100,15) | cusp | 3.76 | 0.40 | 15.00 | 0.00 | 175.26 | 2284.87 |
| (100,15) | mgp | 3.97 | 0.45 | 34.38 | 2.92 | 116.10 | 5002.33 |
Table 2. Sensitivity of the cumulative shrinkage process to changes in its hyper-parameters.

| scenario | hyper-parameters | MSE (median) | MSE (IQR) | no. of factors (median) | no. of factors (IQR) | averaged ESS (median) | runtime in s (median) |
|---|---|---|---|---|---|---|---|
| (20,5) | (2.5,2,2,0.05) | 0.74 | 0.32 | 5.00 | 0.00 | 626.22 | 317.31 |
| (20,5) | (10,2,2,0.05) | 0.74 | 0.33 | 5.00 | 0.00 | 636.61 | 314.82 |
| (20,5) | (5,2,1,0.05) | 0.72 | 0.34 | 5.00 | 0.00 | 607.61 | 322.68 |
| (20,5) | (5,1,2,0.05) | 0.79 | 0.30 | 5.00 | 0.00 | 602.28 | 309.39 |
| (20,5) | (5,2,2,0.025) | 0.78 | 0.31 | 5.00 | 0.00 | 655.80 | 313.21 |
| (20,5) | (5,2,2,0.1) | 0.74 | 0.30 | 5.00 | 0.04 | 604.88 | 315.51 |
| (50,10) | (2.5,2,2,0.05) | 2.25 | 0.40 | 10.00 | 0.00 | 280.39 | 719.11 |
| (50,10) | (10,2,2,0.05) | 2.20 | 0.36 | 10.00 | 0.00 | 277.89 | 748.75 |
| (50,10) | (5,2,1,0.05) | 2.16 | 0.42 | 10.00 | 0.00 | 266.82 | 722.67 |
| (50,10) | (5,1,2,0.05) | 2.35 | 0.40 | 10.00 | 0.00 | 272.47 | 689.70 |
| (50,10) | (5,2,2,0.025) | 2.22 | 0.35 | 10.00 | 0.00 | 280.60 | 717.19 |
| (50,10) | (5,2,2,0.1) | 2.22 | 0.41 | 10.00 | 0.00 | 273.39 | 698.96 |
| (100,15) | (2.5,2,2,0.05) | 3.68 | 0.47 | 15.00 | 0.00 | 176.31 | 2247.44 |
| (100,15) | (10,2,2,0.05) | 3.74 | 0.40 | 15.00 | 0.00 | 172.02 | 2205.78 |
| (100,15) | (5,2,1,0.05) | 3.64 | 0.44 | 15.00 | 0.00 | 172.04 | 2287.32 |
| (100,15) | (5,1,2,0.05) | 3.96 | 0.52 | 15.00 | 0.00 | 174.74 | 2178.47 |
| (100,15) | (5,2,2,0.025) | 3.70 | 0.44 | 15.00 | 0.00 | 172.83 | 2200.20 |
| (100,15) | (5,2,2,0.1) | 3.77 | 0.44 | 15.00 | 0.00 | 174.76 | 2284.80 |
Bayesian cumulative shrinkage for infinite factorizations
Sirio Legramanti, Department of Decision Sciences, Bocconi University, 20136 Milan, Italy
Daniele Durante, Department of Decision Sciences, Bocconi University, 20136 Milan, Italy
David B. Dunson, Department of Statistical Science, Duke University, Durham, NC 27708, U.S.A.
Abstract
There is a wide variety of models in which the dimension of the parameter space is unknown. For example, in factor analysis the number of latent factors is typically not known and has to be inferred from the observed data. Although classical shrinkage priors are useful in these contexts, increasing shrinkage priors can provide a more effective option, which progressively penalizes expansions with growing complexity. In this article we propose a novel increasing shrinkage prior, named the cumulative shrinkage process, for the parameters controlling the dimension in over-complete formulations. Our construction has broad applicability, simple interpretation, and is based on a sequence of spike and slab distributions which assign increasing mass to the spike as model complexity grows. Using factor analysis as an illustrative example, we show that this formulation has theoretical and practical advantages over current competitors, including an improved ability to recover the model dimension. An adaptive Markov chain Monte Carlo algorithm is proposed, and the methods are evaluated in simulation studies and applied to personality traits data. Code is available at https://github.com/siriolegramanti/CUSP.
Some key words: Factor analysis; Increasing shrinkage; Multiplicative gamma process; Spike and slab; Stick-breaking
1 Introduction
There has been considerable interest in shrinkage priors for high dimensional parameters (e.g., Ishwaran and Rao, 2005; Carvalho et al., 2010) but most of the focus has been on regression, where there is no natural ordering in the coefficients. There are several settings, however, where an order is present and desirable. Indeed, in statistical models relying on low-rank factorizations or basis expansions, such as factor models and tensor factorizations, it is natural to expect that additional dimensions play a progressively less important role in characterizing the data or model structure, and hence the associated parameters should have a stochastically decreasing effect. Such a behavior can be induced through increasing shrinkage priors. For instance, in the context of Bayesian factor models an example of this approach can be found in the multiplicative gamma process developed by Bhattacharya and Dunson (2011) to penalize the effect of additional factor loadings via a cumulative product of gamma priors for their precision. Although this prior has been widely applied, there are practical disadvantages that motivate consideration of alternative solutions (Durante, 2017). In general, despite the importance of increasing shrinkage priors in many factorization models, the methods, theory and computational strategies for these priors remain under-developed.
Motivated by the above considerations, we propose a novel increasing shrinkage prior, the cumulative shrinkage process, which is broadly applicable, while having simple and parsimonious structure. The proposed prior induces increasing shrinkage via a sequence of spike and slab distributions assigning growing mass to the spike as model complexity grows. In Definition 1, we present this prior for the general case in which the effect of the $h$th dimension is controlled by a scalar parameter $\theta_h$, so that redundant terms can be essentially deleted by progressively shrinking the sequence $\theta = (\theta_1, \theta_2, \ldots)$ towards an appropriate value $\theta_\infty$. For example, in factor models $\theta_h$ may denote the variance of the loadings for the $h$th factor, and the goal is to define a prior on these terms which favors stochastically decreasing impact of the factors via increasing concentration of the loadings near zero as $h$ grows.
Definition 1
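Let $\theta = \{\theta_h \in \mathbb{R}: h = 1, 2, \ldots\}$ denote a countable sequence of parameters. We say that $\theta$ is distributed according to a cumulative shrinkage process with parameter $\alpha > 0$, starting slab distribution $P_0$ and target value $\theta_\infty$ if, conditionally on $\pi = \{\pi_h \in (0, 1): h = 1, 2, \ldots\}$, each $\theta_h$ is independent and has the following spike and slab distribution:
$$(\theta_h \mid \pi_h) \sim P_h = (1 - \pi_h) P_0 + \pi_h \delta_{\theta_\infty}, \qquad \pi_h = \sum_{l=1}^{h} \omega_l, \qquad \omega_l = v_l \prod_{m=1}^{l-1} (1 - v_m), \tag{1}$$
where $v_1, v_2, \ldots$ are independent $\mathrm{Beta}(1, \alpha)$ variables and $P_0$ is a diffuse continuous distribution.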
Equation (1) exploits the stick-breaking construction of the Dirichlet process (Ishwaran and James, 2001). This implies that the probability $\pi_h$ assigned to the spike increases with the model dimension $h$, and that $\lim_{h \to \infty} \pi_h = 1$ almost surely. Hence, as complexity grows, $P_h$ increasingly concentrates around $\theta_\infty$, which is specified to facilitate the deletion of redundant terms, while the slab $P_0$ corresponds to the prior on the active parameters. Definition 1 can be extended to sequences in $\mathbb{R}^p$, and $\delta_{\theta_\infty}$ can be replaced with a continuous distribution, without affecting the key properties of the prior, which are presented in § 2. As we will discuss in § 2 and in § 3.1, it is also possible to restrict Definition 1 to finitely many terms $\theta_1, \ldots, \theta_H$ by letting $v_H = 1$. In practical implementations, this truncated version typically ensures full flexibility if $H$ is set to a conservative upper bound, but this value can be extremely large in several high dimensional settings, thus motivating our initial focus on the infinite expansion and its theoretical properties.
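As an illustration of Definition 1, the following Python sketch draws from a truncated cumulative shrinkage process. It assumes Beta(1, alpha) sticks, as in the usual Dirichlet process stick-breaking, truncates by fixing the last stick at one, and uses an inverse-gamma slab purely as an example. The names `stick_breaking_pi` and `sample_cusp` are hypothetical and are not taken from the authors' code.

```python
import numpy as np


def stick_breaking_pi(alpha, H, rng):
    """Cumulative spike probabilities pi_1 <= ... <= pi_H (assumed Beta(1, alpha) sticks)."""
    v = rng.beta(1.0, alpha, size=H)
    v[-1] = 1.0  # truncation: the last stick takes all the remaining mass
    # omega_l = v_l * prod_{m < l} (1 - v_m)
    remaining_before = np.concatenate(([1.0], np.cumprod(1.0 - v[:-1])))
    omega = v * remaining_before
    pi = np.minimum(np.cumsum(omega), 1.0)
    pi[-1] = 1.0  # guard against floating-point round-off
    return pi


def sample_cusp(alpha, H, theta_inf, slab_sampler, rng):
    """Draw theta_1..theta_H: spike at theta_inf with probability pi_h, slab otherwise."""
    pi = stick_breaking_pi(alpha, H, rng)
    in_spike = rng.random(H) < pi
    slab_draws = slab_sampler(rng, H)
    theta = np.where(in_spike, theta_inf, slab_draws)
    return theta, pi, in_spike


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Illustrative slab (an assumption): inverse-gamma with shape 2 and rate 2.
    inv_gamma = lambda r, n: 1.0 / r.gamma(2.0, 1.0 / 2.0, size=n)
    theta, pi, in_spike = sample_cusp(5.0, 10, 0.05, inv_gamma, rng)
    print(np.round(pi, 3))
    print(np.round(theta, 3))
```

Because the spike probabilities are cumulative sums of stick-breaking weights, later indices are increasingly likely to sit at the spike value, which is the ordering the prior is designed to encode.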
2 General properties of the cumulative shrinkage process
We first motivate our cumulative stick-breaking construction for the sequence $\pi$ that controls the mass assigned to the spike in (1) as a function of model dimension. Indeed, one could alternatively consider pre-specified non-decreasing functions bounded between $0$ and $1$. However, we have found that such specifications are overly restrictive and have worse practical performance. The specification in (1) is purposely chosen to be effectively nonparametric, with Proposition 1 showing that the prior has large support on the space of non-decreasing sequences taking values in $(0, 1)$. See the Appendix for proofs.
Proposition 1
Let $\Pi$ be the probability measure induced on $\pi$ by (1), then $\Pi$ has large support on the whole space of non-decreasing sequences taking values in $(0, 1)$.
Besides being fully flexible, our construction for $P_h$ also has simple interpretation and allows control over shrinkage via an interpretable parameter $\alpha$, as stated in Proposition 2 and in the subsequent results.
Proposition 2
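Each $\pi_h$ in (1) coincides with the proportion of the total variation distance between the slab $P_0$ and the spike $\delta_{\theta_\infty}$ covered up to step $h$, in the sense that $\pi_h = d_{\mathrm{TV}}(P_0, P_h)/d_{\mathrm{TV}}(P_0, \delta_{\theta_\infty})$ for every $h$.
Using similar arguments, we can obtain analogous expressions for $\omega_h$ and $v_h$, which represent the proportions of the total $d_{\mathrm{TV}}(P_0, \delta_{\theta_\infty})$ and the remaining $d_{\mathrm{TV}}(P_{h-1}, \delta_{\theta_\infty})$, respectively, covered between steps $h-1$ and $h$. Specifically, $\omega_h = d_{\mathrm{TV}}(P_{h-1}, P_h)/d_{\mathrm{TV}}(P_0, \delta_{\theta_\infty})$ and $v_h = d_{\mathrm{TV}}(P_{h-1}, P_h)/d_{\mathrm{TV}}(P_{h-1}, \delta_{\theta_\infty})$ for every $h$. The expectations of these quantities are explicitly available as
$$E(\pi_h) = 1 - \{\alpha/(1+\alpha)\}^{h}, \qquad E(\omega_h) = \alpha^{h-1}/(1+\alpha)^{h}, \qquad E(v_h) = 1/(1+\alpha). \tag{2}$$
Moreover, combining (2) with Definition 1, the expectation of $\theta_h$ is
$$E(\theta_h) = \{1 - E(\pi_h)\}\theta_0 + E(\pi_h)\theta_\infty = \{\alpha/(1+\alpha)\}^{h}\theta_0 + [1 - \{\alpha/(1+\alpha)\}^{h}]\theta_\infty, \tag{3}$$
where $\theta_0$ defines the expected value under the slab $P_0$. Hence, as $h$ grows, the prior expectation of $\theta_h$ converges exponentially towards the spike location $\theta_\infty$. As stated in Lemma 1, a stronger notion of cumulative shrinkage in distribution, beyond simple concentration in expectation, also holds under (1).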
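The exponential convergence of the expected spike probability can be checked numerically. The sketch below assumes independent Beta(1, alpha) sticks, for which the mass left to the slab after h steps has expectation {alpha/(1+alpha)}^h, and compares this closed form with a Monte Carlo average. Function names are illustrative only.

```python
import numpy as np


def expected_pi_closed_form(alpha, H):
    """E(pi_h) = 1 - (alpha / (1 + alpha))^h for h = 1..H, under Beta(1, alpha) sticks."""
    h = np.arange(1, H + 1)
    return 1.0 - (alpha / (1.0 + alpha)) ** h


def expected_pi_monte_carlo(alpha, H, n_draws, rng):
    """Monte Carlo estimate of E(pi_h) from simulated stick-breaking sequences."""
    v = rng.beta(1.0, alpha, size=(n_draws, H))
    # 1 - pi_h = prod_{l <= h} (1 - v_l): the mass not yet assigned to the spike
    remaining = np.cumprod(1.0 - v, axis=1)
    return 1.0 - remaining.mean(axis=0)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    print(np.round(expected_pi_closed_form(5.0, 10), 3))
    print(np.round(expected_pi_monte_carlo(5.0, 10, 100_000, rng), 3))
```

The two printed vectors should agree up to Monte Carlo error, and both increase towards one as the index grows.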
Lemma 1
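Let $B_\epsilon(\theta_\infty) = \{\theta_h: |\theta_h - \theta_\infty| \le \epsilon\}$ denote an $\epsilon$-neighborhood around $\theta_\infty$ with radius $\epsilon > 0$, and define with $\bar{B}_\epsilon(\theta_\infty)$ the complement of $B_\epsilon(\theta_\infty)$. Then, for any $h$ and $\epsilon > 0$,
$$\mathrm{pr}(|\theta_h - \theta_\infty| > \epsilon) = P_0\{\bar{B}_\epsilon(\theta_\infty)\}\{1 - E(\pi_h)\} = P_0\{\bar{B}_\epsilon(\theta_\infty)\}\{\alpha/(1+\alpha)\}^{h}. \tag{4}$$
Therefore, $\mathrm{pr}(|\theta_{h+1} - \theta_\infty| \le \epsilon) > \mathrm{pr}(|\theta_h - \theta_\infty| \le \epsilon)$ for any $\alpha > 0$, $h = 1, 2, \ldots$ and $\epsilon > 0$.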
Equations (2)–(4) highlight how the rate of increasing shrinkage is controlled by $\alpha$. In particular, lower values of $\alpha$ induce faster concentration around $\theta_\infty$ and hence more rapid shrinkage of the redundant terms. This control over the rate of increasing shrinkage via $\alpha$ is separated from the specification of the slab $P_0$, thereby allowing flexible modelling of the active terms. As discussed in Durante (2017), such a separation does not hold, for example, in the multiplicative gamma process (Bhattacharya and Dunson, 2011) whose hyper-parameters control both the rate of shrinkage and the prior for the active factors. This creates a trade-off between the need to maintain diffuse priors for the active terms and the attempt to shrink the redundant ones. Moreover, under the multiplicative gamma process, increasing shrinkage holds only in expectation and for specific hyper-parameters.
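Instead, our prior ensures increasing shrinkage in distribution for any $\alpha$, and can model any prior expectation on the number of active terms. In fact, $\alpha$ is equal to the prior mean of the number of terms in $\theta$ modelled via the slab $P_0$. This result follows after noticing that $\theta_h$ in (1) can be alternatively obtained by marginalizing out the augmented indicator $z_h$, with $\mathrm{pr}(z_h = l \mid \omega_l) = \omega_l$ for $l = 1, 2, \ldots$, in $(\theta_h \mid z_h) \sim \{1 - 1(z_h \le h)\}P_0 + 1(z_h \le h)\delta_{\theta_\infty}$. According to this result, $H^{*} = \sum_{h=1}^{\infty} 1(z_h > h)$ counts the number of active elements in $\theta$, and its prior mean is
$$E(H^{*}) = \sum_{h=1}^{\infty} \mathrm{pr}(z_h > h) = \sum_{h=1}^{\infty} \{1 - E(\pi_h)\} = \sum_{h=1}^{\infty} \{\alpha/(1+\alpha)\}^{h} = \alpha.$$
Hence, $\alpha$ should be set to the expected number of active terms, while $P_0$ should be sufficiently diffuse to model active components, and $\theta_\infty$ should be chosen to facilitate the deletion of redundant ones.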
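The identification of the shrinkage parameter with the expected number of active terms reduces to a geometric series. As a sketch, assuming Beta(1, alpha) sticks so that the $h$th term is assigned to the slab with prior probability $\{\alpha/(1+\alpha)\}^{h}$:

```latex
\sum_{h=1}^{\infty}\left(\frac{\alpha}{1+\alpha}\right)^{h}
  = \frac{\alpha/(1+\alpha)}{1-\alpha/(1+\alpha)}
  = \frac{\alpha/(1+\alpha)}{1/(1+\alpha)}
  = \alpha .
```

For instance, under this assumption a value of five for the parameter corresponds to five terms expected a priori in the slab.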
Recalling Bhattacharya and Dunson (2011) and Rousseau and Mengersen (2011), it is useful to define models with more than enough components and then choose shrinkage priors which favor effective deletion of the unnecessary ones. This choice protects against over-fitting and allows estimation of model dimension, bypassing the need for reversible jump (Lopes and West, 2004) or other computationally intensive strategies. Our cumulative shrinkage process in (1) provides a useful prior for this purpose. As discussed in § 1, it is straightforward to modify Definition 1 to instead restrict to $H$ components, by letting $v_H = 1$, with $H$ a conservative upper bound. Theorem 1 provides theoretical support for such a truncated representation.
Theorem 1
If has prior (1) and denotes the sequence obtained by fixing in for every , then for any truncation index and ,
[TABLE]
where is the sup-norm distance and is the complement of .
Hence, the prior probability of being close to converges to one at a rate which is exponential in , thus justifying posterior inference under finite sequences based on a conservative . Although the above bound holds for , in general is set close to zero. Hence, Theorem 1 is valid also for small .
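A minimal sketch of how a truncation level could be chosen from an exponential bound of this kind. It assumes the bound has the geometric-tail form alpha * {alpha/(1+alpha)}^H multiplied by a slab-mass factor no larger than one. This form is suggested by the geometric tail sum that closes the proof in the Appendix, but it is an assumption here, and the function names are hypothetical.

```python
def truncation_bound(alpha, H, slab_mass_outside=1.0):
    """Assumed bound: slab_mass_outside * alpha * (alpha / (1 + alpha))^H."""
    return slab_mass_outside * alpha * (alpha / (1.0 + alpha)) ** H


def smallest_truncation(alpha, tol, slab_mass_outside=1.0):
    """Smallest H whose assumed bound does not exceed tol."""
    H = 1
    while truncation_bound(alpha, H, slab_mass_outside) > tol:
        H += 1
    return H


if __name__ == "__main__":
    for alpha in (2.5, 5.0, 10.0):
        print(alpha, smallest_truncation(alpha, tol=0.01))
```

Setting `slab_mass_outside` to one gives a conservative choice, since the slab mass outside any neighborhood cannot exceed one.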
3 Cumulative shrinkage process for Gaussian factor models
3.1 Model formulation and prior specification
Definition 1 provides a general prior which can be used in different models (e.g., Gopalan et al., 2014) under appropriate choices of $P_0$ and $\theta_\infty$. Here, we focus on Gaussian sparse factor models as an important special case to illustrate our approach. We will compare primarily to the multiplicative gamma process, which has been devised specifically for this class of models and was shown to have practical gains in this context relative to several competitors, including the use of lasso (Tibshirani, 1996), elastic-net (Zou and Hastie, 2005) and banding approaches (Bickel and Levina, 2008). Although there are other priors for sparse factor models (e.g., Carvalho et al., 2008; Knowles and Ghahramani, 2011), these choices have practical disadvantages relative to the multiplicative gamma process, so they will not be considered further here.
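The focus will be on performance in learning the structure of the $p \times p$ covariance matrix $\Omega = \Lambda\Lambda^{T} + \Sigma$ for the data $y_i \in \mathbb{R}^{p}$ $(i = 1, \ldots, n)$ generated from the Gaussian factor model $y_i = \Lambda\eta_i + \epsilon_i$, with $\Lambda = [\lambda_{jh}] \in \mathbb{R}^{p \times H}$, $\eta_i \sim N_H(0, I_H)$, and $\epsilon_i \sim N_p(0, \Sigma)$ with $\Sigma = \mathrm{diag}(\sigma_1^2, \ldots, \sigma_p^2)$. To perform Bayesian inference for this model, Bhattacharya and Dunson (2011) assumed $\lambda_{jh} \sim N(0, \phi_{jh}^{-1}\theta_h)$, and $\sigma_j^{-2} \sim \mathrm{Ga}(a_\sigma, b_\sigma)$ with scales $\phi_{jh}$ from independent $\mathrm{Ga}(\nu/2, \nu/2)$ priors and global precisions $\theta_h^{-1}$ having multiplicative gamma process prior
$$\theta_h^{-1} = \prod_{l=1}^{h} \vartheta_l, \qquad \vartheta_1 \sim \mathrm{Ga}(a_1, 1), \qquad \vartheta_l \sim \mathrm{Ga}(a_2, 1) \quad (l \ge 2). \tag{5}$$
Specific choices of $(a_1, a_2)$ in (5) ensure that $E(\theta_h)$ decreases with $h$, thus allowing increasing shrinkage of the loadings as $h$ grows. Instead, we keep $\sigma_j^{-2} \sim \mathrm{Ga}(a_\sigma, b_\sigma)$, but let $\lambda_{jh} \sim N(0, \theta_h)$ and place our cumulative shrinkage process prior on $\theta_h$ by assuming
$$(\theta_h \mid \pi_h) \sim (1 - \pi_h)\,\mathrm{InvGa}(a_\theta, b_\theta) + \pi_h \delta_{\theta_\infty}, \qquad \pi_h = \sum_{l=1}^{h} \omega_l, \qquad \omega_l = v_l \prod_{m=1}^{l-1} (1 - v_m), \tag{6}$$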
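where $v_1, v_2, \ldots$ are independent $\mathrm{Beta}(1, \alpha)$. Integrating out $\theta_h$, each loading $\lambda_{jh}$ has the marginal prior $(1 - \pi_h)\,t_{2a_\theta}(0, b_\theta/a_\theta) + \pi_h N(0, \theta_\infty)$, where $t_{2a_\theta}(0, b_\theta/a_\theta)$ denotes the Student-$t$ distribution with $2a_\theta$ degrees of freedom, location $0$ and scale $b_\theta/a_\theta$. Hence, $\theta_\infty$ should be set close to zero to allow effective shrinkage of redundant factors, while $(a_\theta, b_\theta)$ should be specified so as to induce a moderately diffuse prior with scale $b_\theta/a_\theta$ for the active loadings. Although the choice $\theta_\infty = 0$ is possible, we follow Ishwaran and Rao (2005) by suggesting $\theta_\infty > 0$ to induce a continuous shrinkage prior on every $\lambda_{jh}$ which improves mixing and identification of the inactive factors. Exploiting the marginals for $\lambda_{jh}$, it also follows that, if $b_\theta/a_\theta > \theta_\infty$ then $\mathrm{pr}(|\lambda_{j,h+1}| \le \epsilon) \ge \mathrm{pr}(|\lambda_{jh}| \le \epsilon)$ for each $j$, $h$ and $\epsilon > 0$. This allows cumulative shrinkage in distribution also for the loadings, and provides guidelines on $(a_\theta, b_\theta)$ and $\theta_\infty$. Additional discussion on prior elicitation and empirical studies on sensitivity can be found in § 4.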
To implement the analysis, we require a truncation on the number of factors needed to characterize , as discussed in § 2. Theorem 2 states that our shrinkage process truncated at terms induces a well-defined prior for with full-support, under the sufficient conditions that is greater than the true , and . These conditions are met when considering up to active factors, with and .
Theorem 2
Let be any covariance matrix and define with the prior probability measure on covariance matrices induced by a Bayesian factor model having prior (6) on , truncated at with . If , then . In addition, if there exists a decomposition , such that and , then for any , where is an -neighborhood of under the sup-norm.
Recalling Theorem 2 in Bhattacharya and Dunson (2011), this result is also sufficient to ensure that the posterior of the covariance matrix is weakly consistent (Schwartz, 1965).
3.2 Posterior computation via Gibbs sampling
Posterior inference for the factor model in § 3.1 with cumulative shrinkage process (6) truncated at terms for the loadings, proceeds via a Gibbs sampler cycling across the steps in Algorithm 1. This sampler relies on a data augmentation which exploits the fact that prior (6) can be obtained by marginalizing out the independent indicators with probabilities in
[TABLE]
where $1(z_h \le h) = 1$ if $z_h \le h$ and $0$ otherwise. As is clear from Algorithm 1, conditioned on the indicators $z_1, \ldots, z_H$, it is possible to sample from conjugate full-conditionals, whereas the updating of the augmented data relies on the full-conditional distribution
[TABLE]
where and are the densities of multivariate Gaussian and Student-$t$ distributions, respectively, evaluated at . Equations (10) are obtained by marginalizing out , distributed as in (7), from the joint . These calculations are straightforward in a variety of Bayesian models based on conditionally conjugate constructions, thus making (1) a general prior which can be easily incorporated, for instance, in Poisson factorizations (Gopalan et al., 2014).
3.3 Tuning the truncation index via adaptive Gibbs sampling
Recalling § 3.1, it is reasonable to perform Bayesian inference with at most factors. Under our cumulative shrinkage process truncated at terms this translates into , since there are at most active factors, with the th one modelled with the spike by construction. However, this choice is too conservative, since we expect substantially fewer active factors than , especially when is very large. Hence, running Algorithm 1 with would be computationally inefficient, since most of the columns in would be modelled by the spike, thus providing a negligible contribution to the factorization of .
Bhattacharya and Dunson (2011) addressed this issue via an adaptive Gibbs sampler which tunes $H$ as the sampler proceeds. To satisfy the diminishing adaptation condition in Roberts and Rosenthal (2007), they adapt $H$ at the iteration $t$ with probability $p(t) = \exp(\alpha_0 + \alpha_1 t)$, where $\alpha_0 \le 0$ and $\alpha_1 < 0$. This adaptation consists in dropping the inactive columns of $\Lambda$, if any, together with the corresponding parameters. If instead all columns are active, an extra factor is added, sampling the associated parameters from the prior.
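A minimal sketch of a diminishing adaptation schedule, assuming the adaptation probability decays exponentially in the iteration index as exp(a0 + a1 * t) with a0 <= 0 and a1 < 0. The default values below are illustrative assumptions, not settings confirmed by this text, and the function names are hypothetical.

```python
import math
import random


def adaptation_probability(t, a0=-1.0, a1=-5e-4):
    """Assumed schedule: probability of adapting at iteration t, decaying in t."""
    return math.exp(a0 + a1 * t)


def should_adapt(t, rng, a0=-1.0, a1=-5e-4):
    """Bernoulli draw deciding whether the truncation level is adapted at iteration t."""
    return rng.random() < adaptation_probability(t, a0, a1)


if __name__ == "__main__":
    rng = random.Random(0)
    n_adapt = sum(should_adapt(t, rng) for t in range(1, 10_001))
    print(adaptation_probability(1), adaptation_probability(10_000), n_adapt)
```

Because the probability vanishes as the chain proceeds, adaptations become increasingly rare, which is the property the diminishing adaptation condition requires.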
This idea can also be implemented for the cumulative shrinkage process, as illustrated in Algorithm 2. Under our prior, the inactive columns are naturally identified as those modelled by the spike and, hence, have index $h$ such that the augmented indicator satisfies $z_h \le h$. Under the multiplicative gamma process, instead, a column is flagged as inactive if all its entries are within distance $\epsilon$ from zero. This plays a similar role as our spike location $\theta_\infty$. Indeed, lower values of $\epsilon$ and $\theta_\infty$ make it harder to discard inactive columns, thus affecting running time. Hence, although fixing $\theta_\infty$ close to zero is key to enforcing shrinkage, excessively low values should be avoided. Since under a truncated cumulative shrinkage process the number of active factors $H^{*}$ is at most $H - 1$, we increase $H$ by one when $H^{*} = H - 1$, and we decrease $H$ to $H^{*} + 1$ when $H^{*} < H - 1$.
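The adaptation step can be sketched as follows. This is an illustration under stated assumptions, not the authors' Algorithm 2: the last column is always a spike column, a column is added when every column that can be active is active, and otherwise the inactive columns are dropped while one spike column is kept. Drawing the new column from a zero-mean Gaussian with the spike variance is also an assumption, and `adapt_columns` is a hypothetical name.

```python
import numpy as np


def adapt_columns(Lambda, active, theta_inf, rng):
    """Return a loadings matrix with an adapted number of columns.

    Lambda: (p, H) loadings; active: boolean vector of length H flagging
    columns modelled by the slab. The last column must be inactive.
    """
    p, H = Lambda.shape
    if active.shape != (H,) or active[-1]:
        raise ValueError("active must have length H with the last column inactive")
    n_active = int(active.sum())
    # New final column drawn from the spike (assumed N(0, theta_inf) entries).
    spike_col = rng.normal(0.0, np.sqrt(theta_inf), size=(p, 1))
    if n_active < H - 1:
        # Drop the inactive columns, keep one spike column: H becomes n_active + 1.
        return np.hstack([Lambda[:, active], spike_col])
    # n_active == H - 1: all columns that can be active are active, so add one.
    return np.hstack([Lambda, spike_col])


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    Lam = rng.normal(size=(6, 4))
    print(adapt_columns(Lam, np.array([True, False, True, False]), 0.05, rng).shape)
    print(adapt_columns(Lam, np.array([True, True, True, False]), 0.05, rng).shape)
```

In a full sampler the parameters attached to dropped columns, such as latent factors and indicators, would need to be removed in the same step.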
In our implementation no adaptation is allowed before a fixed number of iterations to let the chain stabilize, while and are initialized to and , which is the maximum possible rank for . Further guidance for the choice of can be obtained by monitoring how close is to 1, via (2).
4 Performance assessments of Gaussian factor models in simulations
We consider illustrative simulations to assess performance in learning the structure of the true covariance matrix for the data from a Gaussian factor model, with and the entries in drawn from independent . To study performance at varying dimensions, we consider three different combinations of data dimension and number of factors: $(20, 5)$, $(50, 10)$ and $(100, 15)$. For every pair, we sample 25 datasets of observations from and, for each of the 25 replicates, we perform posterior inference on via the Gaussian factor model in § 3.1 under both prior (5) and (6), exploiting the adaptive Gibbs sampler in Bhattacharya and Dunson (2011) and Algorithm 2, respectively.
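A simulation of this kind can be sketched in a few lines. The distributional details are not recoverable from this text, so the standard Gaussian loadings, the identity noise covariance and the sample size used below are assumptions, and `simulate_factor_data` is a hypothetical helper.

```python
import numpy as np


def simulate_factor_data(n, p, k, rng, noise_var=1.0):
    """Simulate n observations from a Gaussian factor model with k factors.

    Assumptions: loadings are independent N(0, 1) and the noise covariance
    is noise_var times the identity.
    """
    Lambda0 = rng.normal(size=(p, k))
    Sigma0 = noise_var * np.eye(p)
    Omega0 = Lambda0 @ Lambda0.T + Sigma0  # implied covariance of the data
    eta = rng.normal(size=(n, k))          # latent factors
    eps = rng.normal(scale=np.sqrt(noise_var), size=(n, p))
    y = eta @ Lambda0.T + eps
    return y, Omega0


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    for p, k in [(20, 5), (50, 10), (100, 15)]:
        y, Omega0 = simulate_factor_data(100, p, k, rng)
        print(p, k, y.shape, Omega0.shape)
```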
For our cumulative shrinkage process, we set , and , whereas for the multiplicative gamma process, we follow Durante (2017) by considering , and set as done by Bhattacharya and Dunson (2011) in their simulations. For both models, are fixed at as in Bhattacharya and Dunson (2011). The truncation is initialized at for the multiplicative gamma process and at for the cumulative shrinkage process, both corresponding to at most active factors. For the two methods, adaptation is allowed only after iterations and, following Bhattacharya and Dunson (2011), the parameters are set to , while the adaptation threshold in the multiplicative gamma process is . Both algorithms are run for iterations after a burn-in of and, by thinning every 5, we obtain a final sample of draws from the posterior of . For each of the simulations in every scenario, we compute a Monte Carlo estimate of and . Since , the posterior averaged mean square error accounts for both bias and variance in the posterior of .
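The statement that a posterior averaged mean square error accounts for both bias and variance rests on the usual decomposition of an expected squared deviation into squared bias plus variance. The sketch below computes such an error from posterior draws of a covariance matrix and verifies the decomposition numerically. Averaging over the upper triangle including the diagonal is an assumption, and `posterior_mse` is a hypothetical name.

```python
import numpy as np


def posterior_mse(omega_draws, omega0):
    """Posterior averaged squared error and its bias/variance parts.

    omega_draws: (S, p, p) posterior draws; omega0: (p, p) true covariance.
    Entries are averaged over the upper triangle, diagonal included (assumption).
    """
    p = omega0.shape[0]
    rows, cols = np.triu_indices(p)
    draws = omega_draws[:, rows, cols]  # shape (S, p * (p + 1) / 2)
    truth = omega0[rows, cols]
    mse = np.mean((draws - truth) ** 2)
    bias2 = np.mean((draws.mean(axis=0) - truth) ** 2)
    var = np.mean(draws.var(axis=0))    # Monte Carlo posterior variance per entry
    return mse, bias2, var


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    omega0 = np.eye(3)
    draws = omega0 + 0.1 * rng.normal(size=(200, 3, 3))
    mse, bias2, var = posterior_mse(draws, omega0)
    print(mse, bias2 + var)
```

The two printed numbers coincide up to floating-point error, since the decomposition holds exactly for the Monte Carlo averages.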
Table 1 shows, for each scenario and model, the median and the interquartile range of the above quantities computed from the measures produced by the different simulations, together with the medians of the averaged effective sample sizes, out of samples, and of the running times. Such quantities rely on an R implementation run on an Intel Core i7-3632QM CPU laptop with GB of RAM. The two methods have comparable mean square errors, but these measures and the performance gains of prior (6) over (5) increase with . Our approach also provides some improvements in mixing and reduced running times. The latter is arguably due to the fact that the multiplicative gamma process overestimates , hence keeping more parameters to update than necessary. Instead, our cumulative shrinkage process recovers the true dimension in all settings, thus efficiently tuning the truncation level . Such an improved learning of the true underlying dimension is confirmed by the credible intervals highly concentrated around in all the scenarios considered. The multiplicative gamma process leads instead to wider credible intervals for , with none of them including . As shown in Table 2, results are robust to moderate and reasonable changes in the hyper-parameters of the cumulative shrinkage process. We also tried to modify in Bhattacharya and Dunson (2011) so as to delete columns with values on the same scale of our spike. This setting provided lower estimates for and, hence, a computational time more similar to our cumulative shrinkage process, but led to worse mean square errors and still some difficulties in learning .
5 Application of Gaussian factor models to personality data
We conclude with an application to a subset of the personality data available in the dataset bfi from the R package psych. Here, we focus on the association structure among personality self-report items collected on a 6 point response scale for individuals older than years. These variables represent answers to questions organized into five personality traits known as agreeableness, conscientiousness, extraversion, neuroticism, and openness. Recalling common implementations of factor models, we center the 25 items, and then replace variables and with their negative version as suggested in the R documentation of the bfi dataset to have coherent answers within each personality trait. Posterior inference under priors (5)–(6) is performed with the same hyper-parameters and Gibbs settings as in § 4.
Figure 1 shows posterior means and credible intervals for the absolute value of the entries in the correlation matrix , under our model. Samples from are obtained computing for every sample of , with denoting the element-wise Hadamard product. Figure 1 highlights associations within each block of five answers measuring a main personality trait, while showing also interesting across-blocks correlations among agreeableness and extraversion as well as conscientiousness and neuroticism. Openness has less evident within-block and across-block associations. These results suggest three main factors as confirmed by the posterior mean and by the credible intervals for under the cumulative shrinkage process, which are and , respectively. Such posterior summaries are and under the multiplicative gamma process, but the higher does not lead to improved learning of . In fact, when considering the Monte Carlo estimate of the mean squared deviations from the sample correlation matrix , we obtain under both (6) and (5), suggesting that the multiplicative gamma process might overestimate in this application. This leads to more redundant parameters to be updated in the adaptive Gibbs sampler, thus increasing the computational time from to seconds. Our approach also increases the averaged effective sample size from to .
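Turning each posterior draw of a covariance matrix into a correlation matrix amounts to rescaling rows and columns by the inverse standard deviations. A minimal sketch, with `cov_to_corr` as a hypothetical helper name:

```python
import numpy as np


def cov_to_corr(omega):
    """Rescale a covariance matrix to a correlation matrix."""
    inv_sd = 1.0 / np.sqrt(np.diag(omega))
    # Element-wise product with the outer product of inverse standard deviations
    return omega * np.outer(inv_sd, inv_sd)


if __name__ == "__main__":
    omega = np.array([[4.0, 2.0], [2.0, 9.0]])
    print(cov_to_corr(omega))
```

Applying the function to every posterior draw and then averaging the absolute entries would give posterior means of the kind summarized in Figure 1.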
Acknowledgement
The authors are grateful to the Editor, the Associate Editor and the referees for the useful suggestions, and acknowledge the support from MIUR (PRIN 2017 grant) as well as the United States Office of Naval Research and National Institutes of Health in the preparation of the final version of this article.
Appendix
Proof of Proposition 1. Since the mapping from the sequence $\pi$ to $\omega$ is one-to-one, it is sufficient to ensure that the stick-breaking prior for $\omega$ has full support on the infinite dimensional simplex. This result is proved by Bissiri and Ongaro (2014) in § 3.2.
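Proof of Proposition 2. The proof of Proposition 2 adapts the one of Theorem 1 in Canale et al. (2018). In fact, under the prior in Definition 1, the distance $d_{\mathrm{TV}}(P_0, P_h)$ on the Borel $\sigma$-algebra $\mathcal{B}$ in $\mathbb{R}$ is equal to
$$\sup_{A \in \mathcal{B}} |P_0(A) - (1 - \pi_h)P_0(A) - \pi_h \delta_{\theta_\infty}(A)| = \pi_h \sup_{A \in \mathcal{B}} |P_0(A) - \delta_{\theta_\infty}(A)| = \pi_h\, d_{\mathrm{TV}}(P_0, \delta_{\theta_\infty}).$$
Hence $\pi_h = d_{\mathrm{TV}}(P_0, P_h)/d_{\mathrm{TV}}(P_0, \delta_{\theta_\infty})$, completing the proof.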
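Proof of Lemma 1. Notice that, for each $h$, $\mathrm{pr}(|\theta_h - \theta_\infty| > \epsilon)$ can be equivalently expressed as
$$E[(1 - \pi_h)P_0\{\bar{B}_\epsilon(\theta_\infty)\} + \pi_h \delta_{\theta_\infty}\{\bar{B}_\epsilon(\theta_\infty)\}] = P_0\{\bar{B}_\epsilon(\theta_\infty)\}\{1 - E(\pi_h)\}.$$
Therefore, replacing $E(\pi_h)$ with its expression in equation (2) leads to (4). To prove that $\mathrm{pr}(|\theta_{h+1} - \theta_\infty| \le \epsilon) > \mathrm{pr}(|\theta_h - \theta_\infty| \le \epsilon)$ it is sufficient to note that $\alpha/(1+\alpha) < 1$.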
Proof of Theorem 1. The proof follows after noting that , and that for any . Hence, adapting the proof of Lemma 1, we obtain
[TABLE]
To conclude the proof, notice that .
Proof of Theorem 2. Let us first prove that for the Gaussian factor model in § 3.1, with prior (6) truncated at terms, we have . Since, by construction, is diagonal with almost surely finite and non-negative entries, and is trivially positive semi-definite, we only need to ensure that each entry in is almost surely finite. By the Cauchy–Schwarz inequality we obtain
[TABLE]
Under the factor model in § 3.1 having prior (6) truncated at terms, we have that
[TABLE]
for every , including the index of the maximum, thus ensuring that each entry in is almost surely finite under the sufficient condition that . This holds when and .
Let us now prove the full support for . Since , there always exists a and a positive diagonal matrix such that . For instance, one can let and . Hence, it suffices to prove full support for the priors induced on and by the truncated version of our cumulative shrinkage process. Such a property easily holds for , whose diagonal elements have independent inverse-gamma priors. Moreover, adapting the proof of Proposition 2 in Bhattacharya and Dunson (2011), full support can be proved also for the prior induced on . Indeed, recalling § , we have that with
[TABLE]
In fact, conditioned on , each has independent distribution.
References
- Bhattacharya, A. and Dunson, D. B. (2011). Sparse Bayesian infinite factor models. Biometrika, 98:291–306.
- Bickel, P. J. and Levina, E. (2008). Regularized estimation of large covariance matrices. Ann. Statist., 36(1):199–227.
- Bissiri, P. G. and Ongaro, A. (2014). On the topological support of species sampling priors. Electron. J. Statist., 8(1):861–882.
- Canale, A., Durante, D., and Dunson, D. B. (2018). Convex mixture regression for quantitative risk assessment. Biometrics, 74:1331–1340.
- Carvalho, C. M., Chang, J., Lucas, J. E., Nevins, J. R., Wang, Q., and West, M. (2008). High-dimensional sparse factor modeling: applications in gene expression genomics. J. Am. Statist. Assoc., 103:1438–1456.
- Carvalho, C. M., Polson, N. G., and Scott, J. G. (2010). The horseshoe estimator for sparse signals. Biometrika, 97:465–480.
- Durante, D. (2017). A note on the multiplicative gamma process. Statist. Probabil. Lett., 122:198–204.
- Gopalan, P., Ruiz, F. J., Ranganath, R., and Blei, D. (2014). Bayesian nonparametric Poisson factorization for recommendation systems. J. Mach. Learn. Res. W&CP, 33:275–283.
