Sampler for Composition Ratio by Markov Chain Monte Carlo
Yachiko Obara, Tetsuro Morimura, Hiroki Yanagisawa

TL;DR
This paper introduces a novel MCMC algorithm tailored for generating composition ratios with fixed sum and sparsity constraints, facilitating creative combination of knowledge in tasks like cocktail creation.
Contribution
It proposes a new MCMC method that effectively samples composition ratios with specific constraints, addressing limitations of existing algorithms.
Findings
Successfully generated composition ratios for cocktail creation.
Combined MCMC with supervised learning for creative problem solving.
Demonstrated effectiveness in a practical creative task.
Table 1: Number of combinations for each number of nonzero elements.

| $\lVert\mathbf{x}\rVert_0$ | The number of combinations |
|---|---|
| 1 | 50 |
| 2 | |
| 3 | |
| 4 | |
| 5 | |
Table 2: Accepted rates [%] of proposed MCMC samples (small ingredient set and rough division).

| Target distribution | Naive MCMC | Gibbs sampling | Algorithm 2 |
|---|---|---|---|
| (a) Uniform | 70.40 ± 0.430 | 100.00 ± 0.000 | 95.58 ± 0.008 |
| (b) Unimodal | 85.91 ± 0.437 | 100.00 ± 0.000 | 95.85 ± 0.001 |
Table 3: Updated rates [%], i.e., how often the new sample differed from the current sample (small ingredient set and rough division).

| Target distribution | Naive MCMC | Gibbs sampling | Algorithm 2 |
|---|---|---|---|
| (a) Uniform | 70.32 ± 0.418 | 8.79 ± 0.001 | 53.53 ± 0.005 |
| (b) Unimodal | 85.90 ± 0.431 | 4.96 ± 0.002 | 57.37 ± 0.001 |
Table 4: Accepted rates [%] of proposed MCMC samples (large ingredient set and fine division).

| Target distribution | Naive MCMC | Gibbs sampling | Algorithm 2 |
|---|---|---|---|
| (a) Uniform | 0.00 ± 0.000 | 100.00 ± 0.000 | 99.60 ± 0.060 |
| (b) Unimodal | 0.00 ± 0.000 | 100.00 ± 0.000 | 99.59 ± 0.012 |
| (c) Bimodal | 0.00 ± 0.000 | 100.00 ± 0.000 | 99.59 ± 0.010 |
| (d) Exponential | 0.00 ± 0.000 | 100.00 ± 0.000 | 99.97 ± 0.0002 |
Table 5: Updated rates [%] (large ingredient set and fine division).

| Target distribution | Naive MCMC | Gibbs sampling | Algorithm 2 |
|---|---|---|---|
| (a) Uniform | 0.00 ± 0.000 | 0.43 ± 0.016 | 50.31 ± 0.051 |
| (b) Unimodal | 0.00 ± 0.000 | 0.10 ± 0.070 | 50.32 ± 0.081 |
| (c) Bimodal | 0.00 ± 0.000 | 0.14 ± 0.089 | 50.25 ± 0.055 |
| (d) Exponential | 0.00 ± 0.000 | 0.03 ± 0.003 | 50.07 ± 0.010 |
Yachiko Obara
Tetsuro Morimura
Hiroki Yanagisawa IBM Research - Tokyo
{tetsuro, yanagis}@jp.ibm.com
Abstract
Invention involves combination, or more precisely, ratios of composition. Thomas Edison's remark that “Genius is one percent inspiration and 99 percent perspiration” is itself an example. In many situations, researchers and inventors already have a variety of data and manage to create something new by using it, but the key problem is how to select and combine knowledge. In this paper, we propose a new Markov chain Monte Carlo (MCMC) algorithm to generate composition ratios, nonnegative-integer-valued vectors with two properties: (i) the sum of the elements of each vector is constant, and (ii) only a small number of elements are nonzero. These constraints make it difficult for existing MCMC algorithms to sample composition ratios. The key points of our approach are (1) designing an appropriate target distribution by using a condition on the number of nonzero elements, and (2) changing values only between a certain pair of elements in each iteration. Through an experiment on creating a new cocktail, we show that the combination of the proposed method with supervised learning can solve a creative problem.
1 Introduction
The cocktail kir royal is a mixture of 90% champagne and 10% crème de cassis. Similarly, many elements of daily life can be regarded as mixtures in which the amount of each component can be expressed as a ratio relative to the whole: schedules, household expenses, investment portfolios, foods, drinks, cocktails, medicines, toiletries, cosmetics, fragrances, and documents (if word order is ignored). In this paper, a composition ratio means the set of such ratios for all components of a mixture. We denote a composition ratio as a nonnegative vector $\mathbf{x} = (x_1, \dots, x_N)$, where $N$ is the number of available components. The cocktail kir royal, for example, would be denoted as $\mathbf{x} = (0.9, 0.1, 0, \dots, 0)$, where the first element of $\mathbf{x}$ is champagne, the second is crème de cassis, and the other elements represent other available ingredients. Composition ratios have two key characteristics. First, they are often sparse, containing a small number of components chosen from a wide range of candidates. Second, they are expected to have various desired properties. A fragrance, for example, is created by selecting dozens of ingredients from thousands of candidates. The proportion of each selected ingredient is set under the condition that the total mass is fixed; a fragrance might, for instance, consist of a given mass of “ingredient A” and a given mass of “ingredient B”. A fragrance can have desired properties related to aromatics (e.g., the type of smell), popularity (e.g., frequent patterns of ingredient combinations, or combinations that should be avoided), and appropriateness for certain use cases (e.g., combinations for perfumes, shampoos, or hand soaps). Perfumers who create new fragrances seek to develop various fragrances with desired properties. They may also be willing to accept fragrances lacking some desired properties, because such fragrances can still be a source of inspiration. This motivates approaches that generate many candidate fragrances, each appearing with a frequency that reflects how well it satisfies the desired conditions.
To solve such composition ratio problems, we propose a new Markov chain Monte Carlo (MCMC) algorithm and use the MCMC samples themselves as solutions. MCMC is a strategy for generating samples $\mathbf{x}^{(t)}$ that behave like samples drawn from a target distribution $p(\mathbf{x})$. It works by constructing a Markov chain that spends more time in states $\mathbf{x}$ with higher probability $p(\mathbf{x})$. MCMC techniques play fundamental roles in machine learning, physics, statistics, econometrics, and decision analysis Andrieu et al. (2003), and are often applied to integration problems in high-dimensional spaces, such as normalization, marginalization, and computing expectations. MCMC samples can also be used to find the maximum of an objective function, but this is less efficient than dedicated methods such as simulated annealing and gradient descent. MCMC samples are therefore rarely used for optimization; we use this approach deliberately because our problem calls for many diverse solutions, each generated in proportion to how well it satisfies the desired conditions, rather than a single optimum.
Generating composition ratios that have several desired properties can be seen as an optimization problem. The Markowitz standard model is a well-known model for optimizing investment portfolios Markowitz (1952). Selecting sparse portfolios (i.e., portfolios with only a few active positions) is important because sparsity helps account for transaction costs Brodie et al. (2009). Many approaches extend the Markowitz standard model to handle more complex conditions and a wider range of choices, such as a neural-network-based model Fernández and Gómez (2007) and a particle swarm optimization model Cura (2009). Our MCMC method does not replace these methods; rather, it can be combined with them by designing the target distribution appropriately. In the same manner, our method can be combined with generative models, such as latent Dirichlet allocation (LDA), to handle documents as composition ratios Blei et al. (2003).
In our research, we focus on two aspects of the problem: creativity and resource allocation. For example, fragrance development emphasizes creativity, while investment portfolio selection emphasizes resource allocation. In this paper we address the former, creative problem. Through an experiment on creating a new cocktail, we show that our method can solve such creative problems when combined with supervised learning.
The contributions of this paper are the following:
- We propose an efficient MCMC algorithm to generate composition ratios with a small number of components chosen from a wide range of choices.
- We report empirical evidence that the combination of our method with supervised learning can solve a creative problem.
2 Problem Formulation
In this section we formulate our problem. First, we define the composition ratio. In this paper, we consider discretized composition ratios. Specifically, a composition ratio is an $N$-dimensional nonnegative integer vector $\mathbf{x} = (x_1, \dots, x_N)$ whose elements sum to a given integer $S$. Thus, the set of composition ratios is

$$\mathcal{X} = \left\{ \mathbf{x} \in \mathbb{Z}_{\geq 0}^{N} \;\middle|\; \sum_{i=1}^{N} x_i = S \right\}. \tag{1}$$
A vector $\mathbf{x} \in \mathcal{X}$ can be modeled as $S$ balls placed in $N$ bins. The number of bins containing at least one ball is denoted as $\|\mathbf{x}\|_0$. In the case of a cocktail, for example, $\mathbf{x}$ denotes the composition ratio of the cocktail, $N$ the number of all available ingredients, $x_i$ the amount of the $i$-th ingredient, and $\|\mathbf{x}\|_0$ the number of ingredients used in the cocktail. When $S$ equals 100, each $x_i$ can be regarded as the percentage of the $i$-th ingredient in relation to the whole.
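The balls-in-bins view can be made concrete for small cases. The following is a minimal Python sketch, not the authors' code, with hypothetical helper names `compositions`, `size_of_X`, and `l0`: it enumerates $\mathcal{X}$ by stars and bars (choosing $N - 1$ divider positions among $S + N - 1$ slots), which is only practical for small $N$ and $S$.

```python
# Illustrative sketch; function names are hypothetical, not from the paper.
from itertools import combinations
from math import comb


def compositions(N, S):
    """Yield every x in X: each length-N tuple of nonnegative integers summing to S.

    Stars and bars: choose N - 1 divider positions among S + N - 1 slots; the
    numbers of non-divider slots between consecutive dividers are the x_i.
    """
    for dividers in combinations(range(S + N - 1), N - 1):
        x, prev = [], -1
        for d in dividers:
            x.append(d - prev - 1)  # balls between the previous divider and this one
            prev = d
        x.append(S + N - 2 - prev)  # balls after the last divider
        yield tuple(x)


def size_of_X(N, S):
    """|X| = C(S + N - 1, N - 1), the number of ways to put S balls in N bins."""
    return comb(S + N - 1, N - 1)


def l0(x):
    """||x||_0: the number of bins containing at least one ball."""
    return sum(1 for xi in x if xi > 0)
```

For instance, with $S = 100$ and four hypothetical ingredients, the kir royal corresponds to `(90, 10, 0, 0)`, for which `l0` counts two ingredients.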
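We consider the problem of generating random samples $\{\mathbf{x}^{(1)}, \dots, \mathbf{x}^{(M)}\}$ from a given probability distribution $p(\mathbf{x} \mid C)$, where $\mathbf{x} \in \mathcal{X}$ and $C$ is a set of conditions given by the user according to the application. We define

$$p(\mathbf{x} \mid C) = \frac{\exp\left(-E(\mathbf{x} \mid C)\right)}{\sum_{\mathbf{x}' \in \mathcal{X}} \exp\left(-E(\mathbf{x}' \mid C)\right)}, \tag{2}$$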
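where $E(\mathbf{x} \mid C)$ is a scalar-valued function called the energy function. Equation (2) is a common way to define a valid probability distribution, because every value is positive and the values sum to one LeCun et al. (2006). We can therefore define $p(\mathbf{x} \mid C)$ indirectly by defining the energy function $E(\mathbf{x} \mid C)$. When $C$ consists of several conditions, the energy is the sum of the energies of the individual conditions, $E(\mathbf{x} \mid C) = \sum_{c \in C} E(\mathbf{x} \mid c)$, so that $p(\mathbf{x} \mid C) \propto \prod_{c \in C} \exp\left(-E(\mathbf{x} \mid c)\right)$.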
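Equation (2) can be checked by brute force when $\mathcal{X}$ is tiny. This Python sketch is illustrative only; the function names are hypothetical, and the energy used here (one unit per nonzero element) is a hypothetical choice that favors sparse vectors. The exhaustive normalization is exactly what becomes infeasible for realistic $N$ and $S$.

```python
# Illustrative sketch; names and the example energy are hypothetical.
import math
from itertools import product


def enumerate_X(N, S):
    """Brute-force list of X = {x in Z_{>=0}^N : sum(x) == S}; tiny N and S only."""
    return [x for x in product(range(S + 1), repeat=N) if sum(x) == S]


def boltzmann(energy, N, S):
    """Return {x: p(x)} with p(x) = exp(-E(x)) / sum_{x'} exp(-E(x')), as in equation (2)."""
    X = enumerate_X(N, S)
    weights = [math.exp(-energy(x)) for x in X]
    Z = sum(weights)  # normalizing constant over all of X
    return {x: w / Z for x, w in zip(X, weights)}


# Hypothetical energy: one unit of energy per nonzero element, favoring sparse x.
p = boltzmann(lambda x: sum(1 for xi in x if xi > 0), N=3, S=2)
```

With this energy, each vector with one nonzero element is $e$ times as probable as each vector with two.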
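In this paper, we handle two types of conditions, $c_s \in C$ and $c_k \in C$ ($k = 1, 2, \dots$). The condition $c_s$ is the sparsity condition; the conditions $c_k$ describe the other desired properties. For readability, we hereafter denote $E(\mathbf{x} \mid c_s)$ as $E_s(\mathbf{x})$ and $E(\mathbf{x} \mid c_k)$ as $E_k(\mathbf{x})$. First, we define $c_k$ and its energy function. Let $f_k(\mathbf{x})$ be a nonnegative-valued function of $\mathbf{x}$ that outputs the goodness of fit of $\mathbf{x}$ with respect to the targeted property. The corresponding energy function is naturally defined as

$$E_k(\mathbf{x}) = -\lambda_k \log f_k(\mathbf{x}), \tag{3}$$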
where $\lambda_k$ is a hyperparameter controlling the priority of the condition $c_k$. In the case of a cocktail, for example, a condition $c_k$ might demand that the taste of $\mathbf{x}$ be “Fresh”; the closer the taste of $\mathbf{x}$ is to “Fresh”, the higher the value that $f_k(\mathbf{x})$ outputs.
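A sketch of how per-condition energies could be built from goodness-of-fit functions, assuming the form $E_k(\mathbf{x}) = -\lambda_k \log f_k(\mathbf{x})$ and assuming that the energies of several conditions are summed. The helper names and the score `fresh` are hypothetical stand-ins, not the classifier used in the paper.

```python
# Illustrative sketch; helper names and the "fresh" score are hypothetical.
import math


def condition_energy(f, lam):
    """Energy E_k(x) = -lam * log f(x) for a goodness-of-fit function f >= 0.

    exp(-E_k(x)) == f(x) ** lam, so a larger f(x) means a more probable x;
    f(x) == 0 gives infinite energy, i.e. zero probability.
    """
    def energy(x):
        value = f(x)
        return math.inf if value <= 0 else -lam * math.log(value)
    return energy


def total_energy(*energies):
    """Energy for a set of conditions as the sum of per-condition energies."""
    return lambda x: sum(e(x) for e in energies)


# Hypothetical "freshness" score: the share of ingredient 0 in x (illustration only).
fresh = condition_energy(lambda x: x[0] / sum(x), lam=2.0)
```

With this form, $\exp(-E_k(\mathbf{x})) = f_k(\mathbf{x})^{\lambda_k}$, so $\lambda_k$ acts as an exponent that sharpens ($\lambda_k > 1$) or flattens ($\lambda_k < 1$) the preference expressed by $f_k$.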
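As the second type of condition, we consider conditions on the sparsity of the desired samples. We define the sparsity condition $c_s$ as the parameter $\boldsymbol{\pi} = (\pi_1, \dots, \pi_N)$ of a categorical distribution over the number of nonzero elements. The sparsity condition requires that $p(\mathbf{x} \mid c_s)$ satisfy

$$\sum_{\mathbf{x} \in \mathcal{X}} p(\mathbf{x} \mid c_s)\, \mathbb{1}\left[\|\mathbf{x}\|_0 = n\right] = \pi_n \quad (n = 1, \dots, N), \tag{4}$$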
where $\mathbb{1}[\cdot]$ is an indicator function, and $\|\mathbf{x}\|_0$ denotes the $\ell_0$-norm of $\mathbf{x}$ (i.e., the number of nonzero elements of $\mathbf{x}$). We derive the energy function $E_s(\mathbf{x})$ by solving equation (4) in Section 4.1.
Before proceeding, we give examples of the sparsity condition $\boldsymbol{\pi}$. In the case of creating cocktails, for example, $\pi_n$ indicates how likely a cocktail is to use $n$ ingredients. There are several possible candidates for $\boldsymbol{\pi}$. It can be a unimodal distribution if there is a rough desirable number of ingredients per cocktail. It can be a bimodal distribution when we aim to generate both simple cocktails (fewer ingredients) and complex cocktails (more ingredients) at the same time. When a smaller number of ingredients is preferred, we can use a distribution whose probability decays exponentially as the number of ingredients increases.
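The four shapes of $\boldsymbol{\pi}$ mentioned above (uniform, unimodal, bimodal, exponential) can be generated as follows. This is a hypothetical Python sketch: the Gaussian-shaped bumps, widths, and decay rate are illustrative assumptions rather than the paper's settings, and `pi[n - 1]` holds $\pi_n$.

```python
# Illustrative sketch; shapes and parameters are hypothetical choices.
import math


def normalize(weights):
    total = sum(weights)
    return [w / total for w in weights]


def uniform_pi(n_max):
    """pi_n equal for n = 1, ..., n_max."""
    return normalize([1.0] * n_max)


def unimodal_pi(n_max, mode, width):
    """Discretized Gaussian bump around `mode`: a rough desirable number of ingredients."""
    return normalize([math.exp(-0.5 * ((n - mode) / width) ** 2)
                      for n in range(1, n_max + 1)])


def bimodal_pi(n_max, mode_simple, mode_complex, width):
    """Mixture of two bumps: simple and complex mixtures at the same time."""
    a = unimodal_pi(n_max, mode_simple, width)
    b = unimodal_pi(n_max, mode_complex, width)
    return normalize([ai + bi for ai, bi in zip(a, b)])


def exponential_pi(n_max, rate):
    """pi_n proportional to exp(-rate * n): fewer ingredients are preferred."""
    return normalize([math.exp(-rate * n) for n in range(1, n_max + 1)])
```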
3 MCMC Algorithms
An MCMC sampler is a standard algorithm that uses a Markov chain to generate samples from a target probability distribution $p(\mathbf{x})$. In Section 3.1 we start by reviewing Markov chains and MCMC methods. Section 3.2 explains the Metropolis-Hastings algorithm Hastings (1970); Metropolis et al. (1953), a notable MCMC algorithm.
3.1 Markov Chain and MCMC
We consider a Markov chain in which the next state $\mathbf{x}^{(t+1)}$ depends only on the current state $\mathbf{x}^{(t)}$, an $N$-dimensional real-valued vector. Let $T(\mathbf{x}' \mid \mathbf{x})$ be the transition probability of the Markov chain from $\mathbf{x}$ to $\mathbf{x}'$. It is known that, from any initial state, an ergodic Markov chain converges to an invariant distribution, and that this invariant distribution is $p(\mathbf{x})$ if the transition probability satisfies the detailed balance condition with respect to $p(\mathbf{x})$, defined as follows.
Definition.
The detailed balance condition is a sufficient condition for a distribution $p(\mathbf{x})$ to be the invariant distribution to which an ergodic Markov chain converges. The condition is met when every pair of states $\mathbf{x}$ and $\mathbf{x}'$ satisfies

$$p(\mathbf{x})\, T(\mathbf{x}' \mid \mathbf{x}) = p(\mathbf{x}')\, T(\mathbf{x} \mid \mathbf{x}').$$
Let $p(\mathbf{x})$ be a target distribution. An MCMC sampler is then a Markov chain whose invariant distribution equals the target distribution $p(\mathbf{x})$. MCMC algorithms commonly design the transition probability so that the detailed balance condition is satisfied. Each sample generated by an MCMC sampler can be seen as the state visited at one time step of the Markov chain, so we use the words “sample” and “state” interchangeably throughout this paper. We use an MCMC algorithm to solve the problem formulated in the previous section, namely, generating random samples $\{\mathbf{x}^{(1)}, \dots, \mathbf{x}^{(M)}\}$ from a given probability distribution $p(\mathbf{x} \mid C)$. Instead of sampling each $\mathbf{x}^{(t)}$ from scratch, we sample $\mathbf{x}^{(t+1)}$ on the basis of the previous sample $\mathbf{x}^{(t)}$. An MCMC algorithm assumes that the initial sample $\mathbf{x}^{(1)}$ is given. We sample $\mathbf{x}^{(t+1)}$ from $T(\cdot \mid \mathbf{x}^{(t)})$ and repeat this process until we obtain $M$ random samples.
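The definition can be checked numerically on a tiny example. The Python sketch below (hypothetical helper names) builds a three-state chain with the Metropolis acceptance rule of Section 3.2 and a uniform proposal, then verifies detailed balance and invariance. It also shows a deterministic cycle whose invariant distribution is uniform even though detailed balance fails, illustrating that the condition is sufficient but not necessary.

```python
# Illustrative sketch; the example chains and helper names are hypothetical.
def metropolis_transition_matrix(p):
    """T[a][b] = T(b | a) for a Metropolis chain on states 0..K-1 whose proposal
    is uniform over the other K - 1 states."""
    K = len(p)
    T = [[0.0] * K for _ in range(K)]
    for a in range(K):
        for b in range(K):
            if a != b:
                T[a][b] = min(1.0, p[b] / p[a]) / (K - 1)
        T[a][a] = 1.0 - sum(T[a])  # rejected proposals keep the chain at a
    return T


def satisfies_detailed_balance(p, T, tol=1e-12):
    """Check p(a) T(b | a) == p(b) T(a | b) for every pair of states."""
    K = len(p)
    return all(abs(p[a] * T[a][b] - p[b] * T[b][a]) <= tol
               for a in range(K) for b in range(K))


def is_invariant(p, T, tol=1e-12):
    """Check sum_a p(a) T(b | a) == p(b), i.e. p is an invariant distribution of T."""
    K = len(p)
    return all(abs(sum(p[a] * T[a][b] for a in range(K)) - p[b]) <= tol
               for b in range(K))
```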
3.2 Metropolis-Hastings Algorithm
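The Metropolis-Hastings algorithm is one of the most popular MCMC algorithms, and Algorithm 1 lists its pseudocode. We denote the continuous uniform distribution on $[0, 1]$ as $\mathcal{U}(0, 1)$. The Metropolis-Hastings algorithm requires a proposal distribution $Q(\mathbf{x}' \mid \mathbf{x})$ that the user designs in advance. In each iteration, a candidate $\mathbf{x}'$ for the next sample is drawn from the proposal distribution $Q(\mathbf{x}' \mid \mathbf{x})$, a probability distribution conditioned on the current sample $\mathbf{x}$. The candidate is accepted and used as the next sample with probability given by the acceptance rate

$$A(\mathbf{x}, \mathbf{x}') = \min\left(1, \frac{p(\mathbf{x}')\, Q(\mathbf{x} \mid \mathbf{x}')}{p(\mathbf{x})\, Q(\mathbf{x}' \mid \mathbf{x})}\right), \tag{5}$$

that is, the candidate is accepted if $u < A(\mathbf{x}, \mathbf{x}')$ for $u \sim \mathcal{U}(0, 1)$.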
When the candidate is rejected, the current sample is used as the next sample. In other words, the Markov chain moves from the current state $\mathbf{x}$ to the next state $\mathbf{x}'$ with probability $A(\mathbf{x}, \mathbf{x}')$ and otherwise remains at $\mathbf{x}$. The transition probability is therefore $T(\mathbf{x}' \mid \mathbf{x}) = Q(\mathbf{x}' \mid \mathbf{x})\, A(\mathbf{x}, \mathbf{x}')$ for $\mathbf{x}' \neq \mathbf{x}$. The detailed balance condition is known to be satisfied when the acceptance rate in equation (5) is used Chib and Greenberg (1995). One drawback of the Metropolis-Hastings algorithm is that, if the acceptance rate is much smaller than one, it rejects many candidates, the chain takes a long time to mix (i.e., to approach the invariant distribution), and the computation time can become unacceptable. It is therefore important to design a proposal distribution for which the acceptance rate is close to one.
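The procedure above can be sketched generically in Python. This is not the paper's Algorithm 1 verbatim; it is a minimal version, with hypothetical names, assuming the user supplies the log target (unnormalized is fine, since only ratios enter equation (5)), a proposal sampler, and its log density. Working in log space avoids underflow for very small probabilities.

```python
# Illustrative sketch; the function name and argument layout are hypothetical.
import math
import random


def metropolis_hastings(log_p, propose, log_q, x0, n_samples, seed=0):
    """Metropolis-Hastings sketch.

    log_p(x): log of the (possibly unnormalized) target p(x); must be finite at x0.
    propose(x, rng): draws a candidate x' ~ Q(. | x).
    log_q(x_to, x_from): log Q(x_to | x_from).
    Returns the visited states and the fraction of accepted candidates.
    """
    rng = random.Random(seed)
    x, samples, accepted = x0, [], 0
    for _ in range(n_samples):
        x_new = propose(x, rng)
        log_ratio = (log_p(x_new) + log_q(x, x_new)) - (log_p(x) + log_q(x_new, x))
        # Accept with probability A(x, x') = min(1, ratio), using u ~ U(0, 1).
        if rng.random() < math.exp(min(0.0, log_ratio)):
            x, accepted = x_new, accepted + 1
        samples.append(x)  # a rejected candidate repeats the current state
    return samples, accepted / n_samples
```

For example, with a target on four states proportional to 1, 2, 3, 4 and a uniform proposal over the states, the visit frequencies approach 0.1, 0.2, 0.3, and 0.4.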
4 MCMC Algorithm for Composition Ratios
In this section we propose our MCMC algorithm for composition ratios. We start in Section 4.1 by describing how to handle the sparsity condition $c_s$. Section 4.2 shows that a naive MCMC algorithm based on the Metropolis-Hastings algorithm has a small acceptance rate, and Section 4.3 proposes a new algorithm that improves the acceptance rate. In this section, for simplicity, we sometimes omit $C$ from the target distribution, writing $p(\mathbf{x})$ for $p(\mathbf{x} \mid C)$.
4.1 Energy Function of Sparsity Condition
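To handle the sparsity condition $c_s$, we derive the energy function $E_s(\mathbf{x})$ by solving equation (4). Because the sparsity condition relates only to the number of nonzero elements $\|\mathbf{x}\|_0$, all $\mathbf{x}$ with the same number of nonzero elements have the same probability $p(\mathbf{x} \mid c_s)$, so equation (4) can be rewritten as

$$p(\mathbf{x} \mid c_s) = \frac{\pi_{\|\mathbf{x}\|_0}}{\left|\left\{\mathbf{x}' \in \mathcal{X} : \|\mathbf{x}'\|_0 = \|\mathbf{x}\|_0\right\}\right|}, \tag{6}$$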
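where the denominator is the number of $\mathbf{x}' \in \mathcal{X}$ that satisfy $\|\mathbf{x}'\|_0 = \|\mathbf{x}\|_0$. From equation (6) and the definition of $p$ (cf. equation (2)), the energy function can be derived as

$$E_s(\mathbf{x}) = -\lambda_s \log \frac{\pi_{\|\mathbf{x}\|_0}}{\left|\left\{\mathbf{x}' \in \mathcal{X} : \|\mathbf{x}'\|_0 = \|\mathbf{x}\|_0\right\}\right|}, \tag{7}$$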
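where $\lambda_s$ is the hyperparameter controlling the priority of the condition $c_s$; with $\lambda_s = 1$, equation (7) reproduces equation (6) exactly when $c_s$ is the only condition. The denominator of equation (7) can be computed as

$$\left|\left\{\mathbf{x} \in \mathcal{X} : \|\mathbf{x}\|_0 = n\right\}\right| = \binom{N}{n} \binom{S - 1}{n - 1}. \tag{8}$$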
The first factor on the right-hand side of equation (8) is the number of ways to choose $n$ bins out of $N$ bins. The second factor is the number of ways to put one ball in each of the $n$ chosen bins and then distribute the remaining $S - n$ balls among those $n$ bins. Because there are far more ways to spread the balls over many bins, $\|\mathbf{x}\|_0$ tends to be large when the balls are allocated at random, as listed in Table 1. As described in Section 4.3, our algorithm avoids directly calculating the number of combinations in equation (8).
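Equation (8) can be cross-checked against direct enumeration on small cases. This Python sketch uses hypothetical helper names; it also makes visible how quickly the counts grow with $n$, which is why random allocation favors non-sparse vectors.

```python
# Illustrative sketch; helper names are hypothetical.
from itertools import product
from math import comb


def count_with_l0(N, S, n):
    """|{x in X : ||x||_0 = n}| = C(N, n) * C(S - 1, n - 1), as in equation (8).

    C(N, n): choose the n nonzero bins; C(S - 1, n - 1): put one ball in each and
    distribute the remaining S - n balls among those n bins (stars and bars).
    """
    if n < 1 or n > min(N, S):
        return 0
    return comb(N, n) * comb(S - 1, n - 1)


def brute_force_count(N, S, n):
    """Direct enumeration for cross-checking equation (8) on tiny N and S."""
    return sum(1 for x in product(range(S + 1), repeat=N)
               if sum(x) == S and sum(1 for xi in x if xi > 0) == n)
```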
4.2 Naive MCMC Algorithm
As a naive MCMC algorithm, we first describe the case of using the Metropolis-Hastings algorithm with candidates drawn uniformly at random from the set of composition ratios $\mathcal{X}$. This is achieved by setting the proposal distribution to

$$Q(\mathbf{x}' \mid \mathbf{x}) = \frac{1}{|\mathcal{X}|}, \tag{9}$$
where $|\mathcal{X}|$ is the number of elements of $\mathcal{X}$. Because this proposal is symmetric, equation (5) gives the acceptance rate $A(\mathbf{x}, \mathbf{x}') = \min\left(1, p(\mathbf{x}') / p(\mathbf{x})\right)$. For simplicity, we assume that $C = \{c_s\}$. In this case the acceptance rate tends to be small, because non-sparse candidates, which have low probability, are drawn far more often than sparse candidates, which have high probability (see Table 1). Designing a proposal distribution that achieves a high acceptance rate under the sparsity condition is a nontrivial problem, and the next subsection describes how we improve the acceptance rate.
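The following minimal Python sketch of the naive sampler assumes $C = \{c_s\}$ and the probability of equation (6); helper names are hypothetical. Candidates are drawn uniformly from $\mathcal{X}$ by stars and bars. With illustrative values such as $N = 50$, $S = 100$, and $\boldsymbol{\pi}$ supported on $n \leq 5$ (not the paper's settings), almost every candidate has far more than five nonzero elements and is rejected.

```python
# Illustrative sketch; helper names and the example settings are hypothetical.
import math
import random
from math import comb


def uniform_composition(N, S, rng):
    """Draw x uniformly from X (the proposal of equation (9)) via stars and bars."""
    bars = sorted(rng.sample(range(S + N - 1), N - 1))
    x, prev = [], -1
    for b in bars:
        x.append(b - prev - 1)  # balls between consecutive bars
        prev = b
    x.append(S + N - 2 - prev)  # balls after the last bar
    return tuple(x)


def log_p_sparsity(x, pi):
    """log p(x | c_s) = log pi_n - log |{x' : ||x'||_0 = n}| (equation (6)); pi[n-1] = pi_n."""
    n = sum(1 for xi in x if xi > 0)
    N, S = len(x), sum(x)
    if n > len(pi) or pi[n - 1] == 0.0:
        return -math.inf
    return math.log(pi[n - 1]) - math.log(comb(N, n) * comb(S - 1, n - 1))


def naive_mcmc(x0, pi, n_iter, seed=0):
    """Metropolis-Hastings with the uniform proposal; returns (last x, accepted rate)."""
    rng = random.Random(seed)
    x, accepted = x0, 0
    N, S = len(x0), sum(x0)
    for _ in range(n_iter):
        x_new = uniform_composition(N, S, rng)
        # Symmetric proposal: A = min(1, p(x') / p(x)).
        log_ratio = log_p_sparsity(x_new, pi) - log_p_sparsity(x, pi)
        if rng.random() < math.exp(min(0.0, log_ratio)):
            x, accepted = x_new, accepted + 1
    return x, accepted / n_iter
```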
4.3 Accelerated MCMC Algorithm for Composition Ratios
To increase the acceptance rate, we propose a new MCMC algorithm, whose pseudocode is listed in Algorithm 2. We denote the discrete uniform distribution on the integers $1, \dots, m$ as $\mathcal{U}\{1, m\}$. Instead of drawing each candidate $\mathbf{x}'$ independently of the current sample from the proposal distribution in equation (9), we generate $\mathbf{x}'$ by modifying the previous sample $\mathbf{x}$. In each iteration we change the values of only two elements, $x_i$ and $x_j$, of $\mathbf{x}$ while keeping $x_i + x_j$ fixed, so that $\mathbf{x}'$ still satisfies equation (1). We constrain the probability of choosing the pair of elements to be changed so that at least one of the two elements is nonzero. This constraint avoids the case in which both $x_i$ and $x_j$ are zero, which would force $\mathbf{x}'$ to remain $\mathbf{x}$; such cases are frequent because $\mathbf{x}$ is sparse. Denoting the probability of choosing the pair $(i, j)$ as $q(i, j \mid \mathbf{x})$, the constraint sets $q(i, j \mid \mathbf{x}) = 0$ for every pair with $x_i = x_j = 0$, which, for sparse $\mathbf{x}$, is most pairs. Furthermore, instead of changing the values of $x_i$ and $x_j$ uniformly at random, we resample them from the conditional distribution of the target distribution. Hence, we define the proposal distribution as

$$Q(\mathbf{x}' \mid \mathbf{x}) = q(i, j \mid \mathbf{x})\, p(x'_i, x'_j \mid \mathbf{x}_{-ij}), \tag{10}$$
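where $\mathbf{x}_{-ij}$ denotes $\mathbf{x}$ with $x_i$ and $x_j$ omitted, and $\mathbf{x}'_{-ij} = \mathbf{x}_{-ij}$. Using the conditional distributions of the target distribution as the proposal distribution is the idea behind Gibbs sampling Geman and Geman (1984). In Gibbs sampling, $q$ is a constant (it does not depend on $\mathbf{x}$), so the acceptance rate is one. Because our constraint makes $q(i, j \mid \mathbf{x})$ depend on $\mathbf{x}$, we offset its effect by adjusting the acceptance rate in equation (5). Thus, we use the following adjusted acceptance rate:

$$A(\mathbf{x}, \mathbf{x}') = \min\left(1, \frac{q(i, j \mid \mathbf{x}')}{q(i, j \mid \mathbf{x})}\right). \tag{11}$$

This is derived by substituting equation (10) into equation (5) and using $p(x_i, x_j \mid \mathbf{x}_{-ij}) = p(\mathbf{x}) / p(\mathbf{x}_{-ij})$, where $p(\mathbf{x}_{-ij})$ is the marginal distribution of $p(\mathbf{x})$ with $x_i$ and $x_j$ marginalized out; because $\mathbf{x}'_{-ij} = \mathbf{x}_{-ij}$, all factors of $p$ cancel. The proposed algorithm satisfies the detailed balance condition, which can be shown in the same manner as for the Metropolis-Hastings algorithm.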
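One iteration of the pair-update scheme can be sketched as below. This is an illustrative Python sketch, not the authors' Algorithm 2: it assumes one pair-selection rule that satisfies the constraint, namely drawing $i$ uniformly from the nonzero elements of $\mathbf{x}$ and $j$ uniformly from the other $N - 1$ elements, so that $q(i, j \mid \mathbf{x}) = (\mathbb{1}[x_i > 0] + \mathbb{1}[x_j > 0]) / (\|\mathbf{x}\|_0 (N - 1))$. The pair is resampled from its conditional (only $x_i + x_j + 1$ candidates, with weights proportional to $p(\mathbf{x}')$), and the move is accepted with equation (11). All function names are hypothetical.

```python
# Illustrative sketch; the pair-selection rule and all names are assumptions.
import math
import random


def l0(x):
    return sum(1 for v in x if v > 0)


def q_pair(i, j, x):
    """Probability of selecting the unordered pair {i, j} when i is drawn uniformly
    from the nonzero elements of x and j uniformly from the other N - 1 elements."""
    return ((x[i] > 0) + (x[j] > 0)) / (l0(x) * (len(x) - 1))


def pair_update_step(x, log_p, rng):
    """One iteration: pick (i, j), Gibbs-resample (x_i, x_j) with x_i + x_j fixed,
    then accept with min(1, q(i, j | x') / q(i, j | x)) as in equation (11)."""
    nonzero = [k for k, v in enumerate(x) if v > 0]
    i = rng.choice(nonzero)
    j = rng.choice([k for k in range(len(x)) if k != i])
    m = x[i] + x[j]
    # All candidates share x_{-ij}, so p(x'_i, x'_j | x_{-ij}) is proportional to p(x').
    candidates = []
    for a in range(m + 1):
        y = list(x)
        y[i], y[j] = a, m - a
        candidates.append(tuple(y))
    logs = [log_p(y) for y in candidates]
    top = max(logs)  # finite because x itself is a candidate
    x_new = rng.choices(candidates, weights=[math.exp(v - top) for v in logs])[0]
    if rng.random() < min(1.0, q_pair(i, j, x_new) / q_pair(i, j, x)):
        return x_new
    return x


def run_chain(x0, log_p, n_steps, seed=0):
    """Run n_steps pair updates from x0 and return the visited states."""
    rng = random.Random(seed)
    x, states = x0, []
    for _ in range(n_steps):
        x = pair_update_step(x, log_p, rng)
        states.append(x)
    return states
```

Running the chain on a tiny problem and tallying $\|\mathbf{x}\|_0$ over the visited states is a quick way to check that the visit frequencies approach $\boldsymbol{\pi}$.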
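Here, we examine equation (11) in detail to see how small the acceptance rate can be. In Algorithm 2, $i$ is drawn uniformly from the $\|\mathbf{x}\|_0$ nonzero elements of $\mathbf{x}$ and $j$ uniformly from the remaining $N - 1$ elements, so the probability of choosing the pair is

$$q(i, j \mid \mathbf{x}) = \frac{\mathbb{1}[x_i > 0] + \mathbb{1}[x_j > 0]}{\|\mathbf{x}\|_0\, (N - 1)}. \tag{12}$$

In each iteration, the $x_i + x_j$ balls are reallocated between bins $i$ and $j$. Therefore, the number of nonzero elements $\|\mathbf{x}'\|_0$ of the candidate $\mathbf{x}'$ satisfies $\|\mathbf{x}'\|_0 \in \{n - 1, n, n + 1\}$, where $n = \|\mathbf{x}\|_0$ is the number of nonzero elements of the previous sample $\mathbf{x}$, and

$$\frac{q(i, j \mid \mathbf{x}')}{q(i, j \mid \mathbf{x})} = \begin{cases} \dfrac{n}{2(n - 1)} & \text{if } \|\mathbf{x}'\|_0 = n - 1, \\[4pt] 1 & \text{if } \|\mathbf{x}'\|_0 = n, \\[4pt] \dfrac{2n}{n + 1} & \text{if } \|\mathbf{x}'\|_0 = n + 1. \end{cases} \tag{13}$$

Therefore, the acceptance rate falls below one only when the number of nonzero elements decreases. Since $n / (2(n - 1)) > 1/2$ for every $n \geq 2$, we also confirm that the acceptance rate is always greater than 1/2.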
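The case analysis can be verified exhaustively on small vectors. Assuming the pair-selection rule in which $i$ is uniform over the nonzero elements and $j$ uniform over the remaining $N - 1$ elements, this Python sketch enumerates every possible pair move from $\mathbf{x}$ and reports the ratio $q(i, j \mid \mathbf{x}') / q(i, j \mid \mathbf{x})$ that enters equation (11); the helper names are hypothetical.

```python
# Illustrative sketch; the pair-selection rule and helper names are assumptions.
from itertools import combinations


def l0(x):
    return sum(1 for v in x if v > 0)


def q_pair(i, j, x):
    """Assumed pair-selection probability: i uniform over nonzero elements,
    j uniform over the other N - 1 elements (unordered pair)."""
    return ((x[i] > 0) + (x[j] > 0)) / (l0(x) * (len(x) - 1))


def acceptance_ratios(x):
    """Yield (n, n', q(i, j | x') / q(i, j | x)) for every reachable pair move."""
    n = l0(x)
    for i, j in combinations(range(len(x)), 2):
        if x[i] == 0 and x[j] == 0:
            continue  # q(i, j | x) = 0: such pairs are never selected
        m = x[i] + x[j]
        for a in range(m + 1):
            y = list(x)
            y[i], y[j] = a, m - a
            yield n, l0(y), q_pair(i, j, y) / q_pair(i, j, x)
```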
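Next, we explain how the algorithm avoids calculating the number of combinations in equation (8). Let $\mathcal{X}_n = \{\mathbf{x} \in \mathcal{X} : \|\mathbf{x}\|_0 = n\}$, and write $n = \|\mathbf{x}\|_0$ and $n' = \|\mathbf{x}'\|_0$. Instead of calculating $p(\mathbf{x}' \mid c_s)$ and $p(\mathbf{x} \mid c_s)$ directly, we calculate their ratio, i.e., the right-hand side of the following equation:

$$\frac{p(\mathbf{x}' \mid c_s)}{p(\mathbf{x} \mid c_s)} = \exp\left(E_s(\mathbf{x}) - E_s(\mathbf{x}')\right) = \left(\frac{\pi_{n'}}{\pi_n} \cdot \frac{|\mathcal{X}_n|}{|\mathcal{X}_{n'}|}\right)^{\lambda_s}. \tag{14}$$

Notice that, by equation (8),

$$\frac{|\mathcal{X}_{n+1}|}{|\mathcal{X}_n|} = \frac{(N - n)(S - n)}{n(n + 1)}. \tag{15}$$

Because a single iteration changes the number of nonzero elements by at most one, equation (15) is all that is needed, and we avoid calculating the large numbers of combinations themselves.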
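In code, only the ratio of neighboring counts is needed. This Python sketch (hypothetical helper names, with `pi[k - 1]` holding $\pi_k$ and $\lambda_s$ defaulting to 1 as an assumption) evaluates the log of $p(\mathbf{x}' \mid c_s) / p(\mathbf{x} \mid c_s)$ from $n$, $n'$, $N$, and $S$ alone, using $|\mathcal{X}_{n+1}| / |\mathcal{X}_n| = (N - n)(S - n) / (n(n + 1))$, which follows from $\binom{N}{n}\binom{S-1}{n-1}$.

```python
# Illustrative sketch; helper names and the default lam_s = 1 are assumptions.
import math


def log_count_ratio(n_new, n, N, S):
    """log(|X_{n_new}| / |X_n|) for n_new in {n - 1, n, n + 1}, using
    |X_{n+1}| / |X_n| = (N - n)(S - n) / (n (n + 1)) instead of large binomials."""
    if n < 1 or n_new < 1:
        raise ValueError("composition ratios have at least one nonzero element")
    if n_new == n:
        return 0.0
    if n_new == n + 1:
        return math.log((N - n) * (S - n) / (n * (n + 1)))
    if n_new == n - 1:
        return -math.log((N - n + 1) * (S - n + 1) / ((n - 1) * n))
    raise ValueError("one pair update changes ||x||_0 by at most one")


def log_sparsity_ratio(n_new, n, N, S, pi, lam_s=1.0):
    """log[p(x' | c_s) / p(x | c_s)] following equation (14); pi[k - 1] = pi_k."""
    if not 1 <= n_new <= min(N, S, len(pi)) or pi[n_new - 1] == 0.0:
        return -math.inf  # impossible or forbidden number of nonzero elements
    return lam_s * (math.log(pi[n_new - 1]) - math.log(pi[n - 1])
                    - log_count_ratio(n_new, n, N, S))
```

The same ratio appears in every candidate weight of the Gibbs step, so the sampler never needs binomial coefficients with hundreds of digits.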
5 Experimental Results
This section describes the results of two experiments. In the first experiment (Section 5.1), we verified that samples drawn by the proposed MCMC algorithm satisfied a sparsity condition and converged to a target distribution. In the second experiment (Section 5.2), we applied the algorithm to a creative task, namely creating a new cocktail. We discarded a number of initial iterations of each MCMC run as burn-in, a common practice to reduce the influence of the arbitrary starting point.
5.1 Control of Sparsity
We evaluated the proposed algorithm on the task of satisfying a sparsity condition $c_s$. We conducted two experiments: one with small $N$ and small $S$ (a small ingredient set and a rough division), and another with large $N$ and large $S$ (a large ingredient set and a fine division). For comparison, we also evaluated the naive MCMC algorithm and Gibbs sampling. Here, Gibbs sampling is the same as Algorithm 2 except that $q(i, j \mid \mathbf{x})$ is constant (does not depend on $\mathbf{x}$); like Algorithm 2, it changes only one pair of elements per iteration and thus keeps the sum of the elements of each sample constant.
5.1.1 Small Ingredients Set and Rough Division
We used a small $N$ and a small $S$ and conducted experiments on two target distributions for $\boldsymbol{\pi}$: (a) a uniform distribution and (b) a unimodal distribution. We ran a fixed number of iterations. For each target distribution, we sampled 10 times, each time with a different initial sample generated at random from $\mathcal{X}$.
Neither the samples of the naive MCMC algorithm nor those of Gibbs sampling matched the target distributions, whereas those of Algorithm 2 converged to them. Table 2 lists the accepted rates of proposed MCMC samples with the naive MCMC algorithm, Gibbs sampling, and Algorithm 2. We also calculated how often the new sample differed from the current sample and list these values as updated rates in Table 3. Although the accepted rate of Gibbs sampling is 100%, both $x_i$ and $x_j$ are often zero, so the new sample $\mathbf{x}'$ remains the previous sample $\mathbf{x}$; such cases are frequent because $\mathbf{x}$ is sparse. The naive MCMC algorithm produced varied samples, with both its accepted rate and its updated rate above 70%, but those samples did not match the target distributions.
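To make the distinction between the two rates explicit, here is a minimal Python sketch assuming each iteration is logged as (previous sample, candidate, accepted flag); the function name and trace format are hypothetical. Under Gibbs sampling, a pair with $x_i = x_j = 0$ yields an accepted candidate identical to the previous sample, which raises the accepted rate but not the updated rate.

```python
# Illustrative sketch; the trace format and function name are hypothetical.
def accepted_and_updated_rates(trace):
    """Accepted and updated rates, in percent, from a trace of MCMC iterations.

    Each trace entry is (x_prev, x_candidate, accepted). The accepted rate counts
    accepted candidates; the updated rate counts iterations whose next sample
    differs from x_prev, so an accepted candidate equal to x_prev is not an update.
    """
    n = len(trace)
    accepted = sum(1 for _, _, acc in trace if acc)
    updated = sum(1 for prev, cand, acc in trace if acc and cand != prev)
    return 100.0 * accepted / n, 100.0 * updated / n
```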
5.1.2 Large Ingredients Set and Fine Division
We used a large $N$ and a large $S$ and conducted experiments on four target distributions for $\boldsymbol{\pi}$: (a) a uniform distribution, (b) a unimodal distribution, (c) a bimodal distribution, and (d) an exponential distribution. We ran a fixed number of iterations. For each target distribution, we sampled 10 times, each time with a different initial sample generated at random from $\mathcal{X}$.
Figure 1 shows histograms of the number of nonzero elements $\|\mathbf{x}\|_0$ of the MCMC samples, averaged over 10 sequences. The histograms approximate the target distributions well. Tables 4 and 5 list the accepted rates and the updated rates, respectively, of proposed MCMC samples with the naive MCMC algorithm, Gibbs sampling, and Algorithm 2. Because uniformly drawn candidates had far more nonzero elements than the sparsity condition expects, the naive MCMC algorithm could not propose appropriate candidates, and its accepted rates were zero. We thus confirmed that the accepted rates with Algorithm 2 were much larger than those with the naive MCMC algorithm. As in the previous experiment, Gibbs sampling had small updated rates and did not match the target distributions.
5.2 Creation of New Cocktail
We also evaluated the proposed algorithm on the task of creating new cocktails. We used a cocktail dataset Andjelkovic (2018) containing 69 cocktails, with seven taste labels (“Bittersweet,” “Boozy,” “Fresh,” “Salty,” “Sour,” “Sweet,” and “Unknown”) and four timing labels (“After dinner,” “All day,” “Long drink,” and “Pre-dinner”). We aimed to generate cocktails (samples) labeled as “All day” and “Fresh.” $N$ was the total number of ingredients in the dataset. We converted all ingredient amounts, whatever their units (e.g., centiliter, bar spoon, dash, splash), into permille values, giving $S = 1000$. In the dataset, the number of ingredients used in one cocktail ranged from two to five, and we set $\boldsymbol{\pi}$ accordingly, assigning nonzero probability only to $n = 2, \dots, 5$. We also added the conditions $c_1$ and $c_2$, demanding that the taste be “Fresh” and the timing be “All day,” respectively. We trained two multi-class classification models using random forests, one for the taste labels and the other for the timing labels, and used their outputs for the corresponding labels as $f_1(\mathbf{x})$ and $f_2(\mathbf{x})$. Note that the output of a random forest can be regarded as a probability Bostrom (2007). We set the hyperparameter to one and ran the proposed MCMC algorithm for a fixed number of iterations. The initial sample was generated at random subject to the sparsity condition.
Figure 2 shows the recipe (i.e., composition ratio) of a new cocktail and the most similar cocktails in the dataset. To calculate similarities, we considered only which ingredients were combined and used the Szymkiewicz-Simpson coefficient Simpson (1960). All the most similar cocktails were labeled “All day” and “Fresh,” but each had only one ingredient in common with the new cocktail. This result suggests that the classification models should be made to take ingredient co-occurrence into account to improve the outcomes. The combination of champagne and orange juice used in the new cocktail did not appear in the dataset, but it is actually a popular cocktail called a mimosa. Our algorithm thus reinvented the combination of ingredients of a popular cocktail. Figure 3 shows the composition ratios of the new cocktail and the similar cocktails in the dataset. The composition ratio of champagne is smaller in the new cocktail than in the others. We attribute this to the constraints imposed on the composition ratios, but this requires further investigation.
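The similarity measure can be sketched as follows. The Szymkiewicz-Simpson (overlap) coefficient divides the number of shared ingredients by the size of the smaller ingredient set. In this Python sketch the function names, ingredient names, and permille amounts are hypothetical and do not reproduce the cocktails in Figure 2.

```python
# Illustrative sketch; function names and example ingredients are hypothetical.
def ingredient_set(x, names):
    """Ingredients actually used by composition ratio x (its nonzero elements)."""
    return {name for name, amount in zip(names, x) if amount > 0}


def szymkiewicz_simpson(a, b):
    """Overlap coefficient |A ∩ B| / min(|A|, |B|); 1.0 when one set contains the other."""
    a, b = set(a), set(b)
    if not a or not b:
        raise ValueError("both ingredient sets must be nonempty")
    return len(a & b) / min(len(a), len(b))
```

Because the denominator is the smaller set, a two-ingredient cocktail sharing one ingredient with a larger recipe scores 0.5 regardless of how many other ingredients the larger recipe contains.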
Figure 4 shows how frequently each value of the target-distribution measure occurred among samples generated at random subject to the sparsity condition and among samples generated by the proposed MCMC algorithm. Our MCMC algorithm sampled cocktails with high values of this measure more efficiently than the random method did; in other words, the MCMC samples were more likely to have the “Fresh” and “All day” labels. Figure 4 also shows that the MCMC samples had greater variety in terms of the measure of the target distribution.
6 Conclusion
We have proposed an MCMC algorithm that generates composition ratio samples satisfying both a sparsity condition and other desired conditions. Our empirical results show that the proposed method converges to the target distribution under sparsity conditions and that its combination with supervised learning can solve a creative problem.
Acknowledgments
This research was supported by CREST, JST Grant Number JPMJCR1304, Japan.
References
- Stevan Andjelkovic. Cocktails dataset. https://github.com/stevana/cocktails, 2018.
- Christophe Andrieu, Nando de Freitas, Arnaud Doucet, and Michael I. Jordan. An introduction to MCMC for machine learning. Machine Learning, 50(1-2):5–43, 2003.
- David M. Blei, Andrew Y. Ng, and Michael I. Jordan. Latent Dirichlet allocation. Journal of Machine Learning Research, 3(Jan):993–1022, 2003.
- Henrik Bostrom. Estimating class probabilities in random forests. In Sixth International Conference on Machine Learning and Applications (ICMLA 2007), pages 211–216. IEEE, 2007.
- Joshua Brodie, Ingrid Daubechies, Christine De Mol, Domenico Giannone, and Ignace Loris. Sparse and stable Markowitz portfolios. Proceedings of the National Academy of Sciences, 106(30):12267–12272, 2009.
- Siddhartha Chib and Edward Greenberg. Understanding the Metropolis-Hastings algorithm. The American Statistician, 49(4):327–335, 1995.
- Tunchan Cura. Particle swarm optimization approach to portfolio optimization. Nonlinear Analysis: Real World Applications, 10(4):2396–2406, 2009.
- Alberto Fernández and Sergio Gómez. Portfolio selection using neural networks. Computers & Operations Research, 34(4):1177–1191, 2007.
