Sampler for Composition Ratio by Markov Chain Monte Carlo

Yachiko Obara; Tetsuro Morimura; Hiroki Yanagisawa

arXiv:1906.06663·stat.ML·July 1, 2019

TL;DR

This paper introduces a novel MCMC algorithm tailored for generating composition ratios with fixed sum and sparsity constraints, facilitating creative combination of knowledge in tasks like cocktail creation.

Contribution

It proposes a new MCMC method that effectively samples composition ratios with specific constraints, addressing limitations of existing algorithms.

Findings

1. Successfully generated composition ratios for cocktail creation.
2. Combined MCMC with supervised learning for creative problem solving.
3. Demonstrated effectiveness in a practical creative task.

Tables

Table 1: The number of vectors $\boldsymbol{x}\in\mathcal{X}$ with $n$ nonzero elements when there are $M=5$ balls and $N=50$ bins.

$n$	The number of combinations
1	50
2	$4 900$
3	$117 600$
4	$921 200$
5	$2 118 760$

Table 2: Acceptance rates of candidate samples proposed by the naive MCMC algorithm, Gibbs sampling, and Algorithm 2 ($N=10$, $M=5$; Section 5.1.1).

Target distribution	Naive MCMC [%]	Gibbs sampling [%]	Algorithm 2 [%]
(a) Uniform	70.40 $\pm$ 0.430	100.00 $\pm$ 0.000	95.58 $\pm$ 0.008
(b) Unimodal	85.91 $\pm$ 0.437	100.00 $\pm$ 0.000	95.85 $\pm$ 0.001

Table 3: Update rates, i.e., the percentage of iterations in which the new sample $\boldsymbol{x}^{(t)}$ differs from the previous sample $\boldsymbol{x}^{(t-1)}$, for the naive MCMC algorithm, Gibbs sampling, and Algorithm 2 ($N=10$, $M=5$; Section 5.1.1).

Target distribution	Naive MCMC [%]	Gibbs sampling [%]	Algorithm 2 [%]
(a) Uniform	70.32 $\pm$ 0.418	8.79 $\pm$ 0.001	53.53 $\pm$ 0.005
(b) Unimodal	85.90 $\pm$ 0.431	4.96 $\pm$ 0.002	57.37 $\pm$ 0.001

Table 4: Acceptance rates of candidate samples proposed by the naive MCMC algorithm, Gibbs sampling, and Algorithm 2 ($N=2000$, $M=100$; Section 5.1.2).

Target distribution	Naive MCMC [%]	Gibbs sampling [%]	Algorithm 2 [%]
(a) Uniform	0.00 $\pm$ 0.000	100.00 $\pm$ 0.000	99.60 $\pm$ 0.060
(b) Unimodal	0.00 $\pm$ 0.000	100.00 $\pm$ 0.000	99.59 $\pm$ 0.012
(c) Bimodal	0.00 $\pm$ 0.000	100.00 $\pm$ 0.000	99.59 $\pm$ 0.010
(d) Exponential	0.00 $\pm$ 0.000	100.00 $\pm$ 0.000	99.97 $\pm$ 0.0002

Table 5: Update rates, i.e., the percentage of iterations in which the new sample $\boldsymbol{x}^{(t)}$ differs from the previous sample $\boldsymbol{x}^{(t-1)}$, for the naive MCMC algorithm, Gibbs sampling, and Algorithm 2 ($N=2000$, $M=100$; Section 5.1.2).

Target distribution	Naive MCMC [%]	Gibbs sampling [%]	Algorithm 2 [%]
(a) Uniform	0.00 $\pm$ 0.000	0.43 $\pm$ 0.016	50.31 $\pm$ 0.051
(b) Unimodal	0.00 $\pm$ 0.000	0.10 $\pm$ 0.070	50.32 $\pm$ 0.081
(c) Bimodal	0.00 $\pm$ 0.000	0.14 $\pm$ 0.089	50.25 $\pm$ 0.055
(d) Exponential	0.00 $\pm$ 0.000	0.03 $\pm$ 0.003	50.07 $\pm$ 0.010

Equations

$\mathcal{X}\triangleq\{\boldsymbol{x}\in\mathbb{N}_{0}^{N}\mid\sum_{i}x_{i}=M\}$.

$P(\boldsymbol{x}\mid\mathcal{Y})\triangleq\dfrac{\exp\{-\sum_{k=1}^{K}E(\boldsymbol{x}\mid y_{k})\}}{\sum_{\boldsymbol{x}}\exp\{-\sum_{k=1}^{K}E(\boldsymbol{x}\mid y_{k})\}}$,

$E(\boldsymbol{x}\mid y_{\text{property}})\triangleq-c_{\text{property}}\log y_{\text{property}}(\boldsymbol{x})$,

$y_{\text{sparse}}(n)=\sum_{\boldsymbol{x}}P(\boldsymbol{x}\mid y_{\text{sparse}})\,\mathbb{I}(\|\boldsymbol{x}\|_{0}=n)$,

$P(\boldsymbol{x})\,\pi(\boldsymbol{x},\boldsymbol{x}^{\prime})=P(\boldsymbol{x}^{\prime})\,\pi(\boldsymbol{x}^{\prime},\boldsymbol{x})$,

$a(\boldsymbol{x},\boldsymbol{x}^{\prime})=\min\left\{1,\dfrac{P(\boldsymbol{x}^{\prime})\,Q(\boldsymbol{x}^{\prime},\boldsymbol{x})}{P(\boldsymbol{x})\,Q(\boldsymbol{x},\boldsymbol{x}^{\prime})}\right\}$.

$P(\boldsymbol{x}\mid y_{\text{sparse}})=\dfrac{y_{\text{sparse}}(n)}{\sum_{\boldsymbol{x}^{\prime}\in\mathcal{X}}\mathbb{I}(\|\boldsymbol{x}^{\prime}\|_{0}=n)}$,

$E(\boldsymbol{x}\mid y_{\text{sparse}})$

$\sum_{\boldsymbol{x}^{\prime}\in\mathcal{X}}\mathbb{I}(\|\boldsymbol{x}^{\prime}\|_{0}=n)=\binom{N}{n}\binom{M-1}{M-n}$,

$Q(\boldsymbol{x},\boldsymbol{x}^{\prime})=\begin{cases}1/|\mathcal{X}| & \text{if } \boldsymbol{x}^{\prime}\in\mathcal{X},\\ 0 & \text{otherwise},\end{cases}$

$Q(\boldsymbol{x},\boldsymbol{x}^{\prime})=\begin{cases}\alpha_{ij}(\boldsymbol{x})\,P(x_{i}^{\prime},x_{j}^{\prime}\mid\boldsymbol{x}_{/ij}) & \text{if } \boldsymbol{x}_{/ij}=\boldsymbol{x}^{\prime}_{/ij},\\ 0 & \text{otherwise},\end{cases}$

$a(\boldsymbol{x},\boldsymbol{x}^{\prime})=\min\left\{1,\dfrac{\alpha_{ij}(\boldsymbol{x}^{\prime})}{\alpha_{ij}(\boldsymbol{x})}\right\}$.

$\alpha_{ij}(\boldsymbol{x})=\begin{cases}\dfrac{2}{n(N-1)} & \text{if } x_{i}>0,\ x_{j}>0,\ i>j,\\ \dfrac{1}{n(N-1)} & \text{if } x_{i}>0,\ x_{j}=0,\\ 0 & \text{otherwise}.\end{cases}$

$\dfrac{\alpha_{ij}(\boldsymbol{x}^{\prime})}{\alpha_{ij}(\boldsymbol{x})}=\begin{cases}\dfrac{2}{1+1/n} & \text{if } n^{\prime}=n+1,\\ \dfrac{1}{2(1-1/n)} & \text{if } n^{\prime}=n-1,\\ 1 & \text{if } n^{\prime}=n.\end{cases}$

$P(x_{i}^{\prime},x_{j}^{\prime}\mid\boldsymbol{x}_{/ij},y_{\text{sparse}})\propto\dfrac{y_{\text{sparse}}(n^{\prime})\big/\left[\binom{N}{n^{\prime}}\binom{M-1}{M-n^{\prime}}\right]}{y_{\text{sparse}}(n)\big/\left[\binom{N}{n}\binom{M-1}{M-n}\right]}$.

$\dfrac{1\big/\left[\binom{N}{n^{\prime}}\binom{M-1}{M-n^{\prime}}\right]}{1\big/\left[\binom{N}{n}\binom{M-1}{M-n}\right]}=\begin{cases}\dfrac{n(n+1)}{(N-n)(M-n)} & \text{if } n^{\prime}=n+1,\\ \dfrac{(N-n+1)(M-n+1)}{n(n-1)} & \text{if } n^{\prime}=n-1,\\ 1 & \text{if } n^{\prime}=n.\end{cases}$

Taxonomy

Topics: Markov Chains and Monte Carlo Methods · Bayesian Methods and Mixture Models · Statistical Methods and Inference

Full text

Sampler for Composition Ratio by Markov Chain Monte Carlo

Yachiko Obara

Tetsuro Morimura

Hiroki Yanagisawa

IBM Research - Tokyo

{tetsuro, yanagis}@jp.ibm.com

Abstract

Invention involves combination, or more precisely, ratios of composition. Thomas Edison's remark that “Genius is one percent inspiration and 99 percent perspiration” is itself an example of such a ratio. In many situations, researchers and inventors already have a variety of data and manage to create something new by using it, but the key problem is how to select and combine knowledge. In this paper, we propose a new Markov chain Monte Carlo (MCMC) algorithm to generate composition ratios, that is, nonnegative-integer-valued vectors with two properties: (i) the sum of the elements of each vector is constant, and (ii) only a small number of elements are nonzero. These constraints make it difficult for existing MCMC algorithms to sample composition ratios. The key points of our approach are (1) designing an appropriate target distribution by using a condition on the number of nonzero elements, and (2) changing values only between a selected pair of elements in each iteration. Through an experiment on creating a new cocktail, we show that the combination of the proposed method with supervised learning can solve a creative problem.

1 Introduction

The cocktail kir royal is a mixture of 90% champagne and 10% crème de cassis. Similarly, if we ignore the order of components, many elements of daily life can be regarded as mixtures in which the amount of each component is expressed as a ratio of the whole: schedules, household expenses, investment portfolios, foods, drinks, cocktails, medicines, toiletries, cosmetics, fragrances, and documents. In this paper, a composition ratio means the set of such ratios for all components of a mixture. We denote a composition ratio as a nonnegative vector $\boldsymbol{x}$. The kir royal, for example, would be denoted as $\boldsymbol{x}=(90,10,0,\ldots,0)$, where the first element of $\boldsymbol{x}$ is champagne, the second is crème de cassis, and the other elements represent the remaining available ingredients. Composition ratios have two key characteristics. First, they are often sparse: they contain a small number of components chosen from a wide range of candidates. Second, they are expected to have various desired properties. A fragrance, for example, is created by selecting dozens of ingredients from thousands of candidates. The proportion of each selected ingredient is set so that the total mass is fixed, say at $1000$ g; for example, a fragrance might consist of $700$ g of “ingredient A” and $300$ g of “ingredient B”. A fragrance can have desired properties related to aromatics (e.g., the type of smell), popularity (e.g., frequent patterns of ingredient combinations, or combinations that should be avoided), and appropriateness for certain use cases (e.g., combinations for perfumes, shampoos, or hand soaps). Perfumers who create new fragrances seek to develop various fragrances with the desired properties. They may also accept fragrances that lack some desired properties, because such fragrances can still inspire them. Thus, it is useful to generate many fragrances, each produced with a frequency proportional to how well it satisfies the desired conditions.

To solve such composition ratio problems, we propose a new Markov chain Monte Carlo (MCMC) algorithm and use the MCMC samples themselves as solutions. MCMC is a strategy for generating samples $\boldsymbol{x}$ distributed according to a target distribution $P(\boldsymbol{x})$. It works by constructing a Markov chain that spends more time in states $\boldsymbol{x}$ with higher probability $P(\boldsymbol{x})$. MCMC techniques play fundamental roles in machine learning, physics, statistics, econometrics, and decision analysis Andrieu et al. (2003) and are often applied to integration problems in high-dimensional spaces, such as normalization, marginalization, and computing expectations. MCMC samples can also be used to find the maximum of an objective function, but this is inefficient compared with methods such as simulated annealing and gradient descent. MCMC samples are therefore rarely used for optimization, but we deliberately take this approach because of the nature of our problem: we want many diverse solutions, each generated in proportion to how well it satisfies the desired conditions, rather than a single optimum.

Generating composition ratios with several desired properties can generally be seen as an optimization problem. The Markowitz standard model is a well-known model for optimizing investment portfolios Markowitz (1952). Selecting sparse portfolios (i.e., portfolios with only a few active positions) is important because it makes it possible to account for transaction costs Brodie et al. (2009). Many approaches extend the Markowitz standard model to handle more complex conditions and a wider range of choices, such as a neural-network-based model Fernández and Gómez (2007) and a particle swarm optimization model Cura (2009). Our MCMC method does not replace those methods; rather, it can be combined with them by designing the target distribution appropriately. In the same manner, our method can be combined with generative models, such as latent Dirichlet allocation (LDA), which can handle documents as composition ratios Blei et al. (2003).

In our research, we focus on two aspects of the problem: creativity and resource allocation. For example, fragrance development emphasizes creativity, while investment portfolio selection emphasizes resource allocation. In this paper we address the former, creativity-oriented problem. Through an experiment on creating a new cocktail, we show that our method can solve such creative problems when combined with supervised learning.

The contributions of this paper are the following:

• We propose an efficient MCMC algorithm to generate composition ratios with a small number of components chosen from a wide range of choices.

• We report empirical evidence that the combination of our method with supervised learning can solve a creative problem.

2 Problem Formulation

In this section we formulate our problem. First, we define the composition ratio. In this paper, we consider discretized composition ratios. Specifically, a composition ratio is an $N$-dimensional nonnegative integer vector $\boldsymbol{x}$ whose elements sum to a fixed integer $M$. Thus, the set of composition ratios is

$$\mathcal{X}\triangleq\Big\{\boldsymbol{x}\in\mathbb{N}_{0}^{N}\;\Big|\;\sum_{i}x_{i}=M\Big\}. \tag{1}$$

A vector $\boldsymbol{x}\in\mathcal{X}$ can be modeled as having $M$ balls in $N$ bins. The number of bins containing at least one ball is denoted as $n$ . In the case of a cocktail, for example, $\boldsymbol{x}$ denotes the composition ratio of the cocktail, $N$ denotes the number of all available ingredients, $x_{i}$ denotes the amount of the $i$ -th ingredient, and $n$ denotes the number of ingredients used in the cocktail. When $M$ equals 100, each $x_{i}$ can be regarded as the percentage of the $i$ -th ingredient in relation to the whole.
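For small $N$ and $M$, the set $\mathcal{X}$ can be enumerated directly, which makes it easy to check the formulas in this paper. The following Python sketch is illustrative only; the helper names `enumerate_compositions` and `l0_norm` are not from the paper. It lists every $\boldsymbol{x}\in\mathcal{X}$ recursively and compares the count with the standard stars-and-bars formula $\binom{M+N-1}{N-1}$.

```python
from math import comb

def enumerate_compositions(N, M):
    """Yield every nonnegative integer vector of length N whose elements sum to M."""
    if N == 1:
        yield (M,)
        return
    for first in range(M + 1):
        for rest in enumerate_compositions(N - 1, M - first):
            yield (first,) + rest

def l0_norm(x):
    """Number of nonzero elements n, i.e. the number of bins holding at least one ball."""
    return sum(1 for v in x if v > 0)

N, M = 4, 3
X = list(enumerate_compositions(N, M))
# Stars and bars: M balls in N bins give C(M + N - 1, N - 1) distinct vectors.
assert len(X) == comb(M + N - 1, N - 1)
print(len(X))  # 20
```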

We consider the problem of generating $T$ random samples $\{\boldsymbol{x}^{(1)},\boldsymbol{x}^{(2)},\ldots,\boldsymbol{x}^{(t)},\ldots,\boldsymbol{x}^{(T)}\}$ from a given probability distribution $P(\boldsymbol{x}|\mathcal{Y})$, where $\boldsymbol{x}^{(t)}\in\mathcal{X}$ and $\mathcal{Y}\triangleq\{y_{1},y_{2},\ldots,y_{K}\}$ is a set of conditions specified by the user according to the application. We define

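$$P(\boldsymbol{x}\mid\mathcal{Y})\triangleq\frac{\exp\{-\sum_{k=1}^{K}E(\boldsymbol{x}\mid y_{k})\}}{\sum_{\boldsymbol{x}}\exp\{-\sum_{k=1}^{K}E(\boldsymbol{x}\mid y_{k})\}}, \tag{2}$$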

where $\mathop{E}$ is a scalar-valued function called the energy function. Equation (2) is a common way to define a valid probability distribution, because its values are always positive and sum to one LeCun et al. (2006). We can therefore define $P(\boldsymbol{x}|\mathcal{Y})$ indirectly by defining the energy functions $E(\boldsymbol{x}|y_{k})$. For a single condition, we also write $P(\boldsymbol{x}|y_{k})\propto\exp\{-\mathop{E}(\boldsymbol{x}|y_{k})\}$. Note that $P(\boldsymbol{x}|\mathcal{Y})\propto\prod_{k=1}^{K}P(\boldsymbol{x}|y_{k})$.

In this paper, we handle two types of conditions: $y_{1}$ and $y_{k}$ $(k>1)$. The condition $y_{1}$ describes the sparsity condition, and the other conditions $y_{k}$ $(k>1)$ describe the other desired properties. For readability, we denote $y_{1}$ as $y_{\text{sparse}}$ and $y_{k}$ $(k>1)$ as $y_{\text{property}}$ from here on. First, we define $y_{\text{property}}$ and its energy function. Let $y_{\text{property}}$ be a nonnegative-valued function of $\boldsymbol{x}$ that outputs the goodness of fit of $\boldsymbol{x}$ with respect to the targeted property. The corresponding energy function is naturally defined as

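$$E(\boldsymbol{x}\mid y_{\text{property}})\triangleq-c_{\text{property}}\log y_{\text{property}}(\boldsymbol{x}), \tag{3}$$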

where $c_{\text{property}}\in\mathbb{R}_{\geq 0}$ is a hyperparameter controlling the priority of the condition $y_{\text{property}}$. In the case of a cocktail, for example, a condition $y_{\text{property}}$ might demand that the taste of $\boldsymbol{x}$ be “Fresh”. The closer the taste of $\boldsymbol{x}$ is to “Fresh”, the higher the value of $y_{\text{property}}(\boldsymbol{x})$.
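The normalizing constant in equation (2) cancels in MCMC acceptance ratios, so a sampler only needs the unnormalized log-probability $-\sum_k E(\boldsymbol{x}\mid y_k)$. The sketch below implements the energy of equation (3). It assumes each property condition is supplied as a Python callable `y(x)` paired with a weight `c`; these names are hypothetical. The sparsity condition uses a different energy and is not covered here.

```python
import math

def energy_property(x, y_property, c_property):
    """Energy of equation (3): E(x | y) = -c * log y(x); infinite when y(x) == 0."""
    value = y_property(x)
    if value <= 0.0:
        return math.inf
    return -c_property * math.log(value)

def log_unnormalized_target(x, conditions):
    """Return -sum_k E(x | y_k), the log of the numerator of equation (2).

    `conditions` is a list of (y_k, c_k) pairs, where y_k is a callable scoring x.
    """
    total_energy = sum(energy_property(x, y, c) for y, c in conditions)
    return -total_energy
```

With this definition, $\exp\{-E\}=y_{\text{property}}(\boldsymbol{x})^{c_{\text{property}}}$. A larger $c_{\text{property}}$ therefore sharpens the preference for high-scoring $\boldsymbol{x}$.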

As the second type of condition, we consider conditions on the sparsity of the desired samples. We define the sparsity condition $y_{\text{sparse}}$ as the parameter vector of a categorical distribution over the number of nonzero elements, $y_{\text{sparse}}\in\mathbb{R}^{N}_{\geq 0}$. The sparsity condition $y_{\text{sparse}}$ requires that $P(\boldsymbol{x}|y_{\text{sparse}})$ satisfy

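$$y_{\text{sparse}}(n)=\sum_{\boldsymbol{x}}P(\boldsymbol{x}\mid y_{\text{sparse}})\,\mathbb{I}(\|\boldsymbol{x}\|_{0}=n), \tag{4}$$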

where $\mathbb{I}$ is an indicator function, and $\|\boldsymbol{x}\|_{0}$ denotes $\ell_{0}$ -norm of $\boldsymbol{x}$ (i.e., the number of nonzero elements of $\boldsymbol{x}$ ). We derive the energy function $\mathop{E}(\boldsymbol{x}|y_{\text{sparse}})$ by solving equation (4) in Section 4.1.

Before proceeding, we show examples of a sparsity condition $y_{\text{sparse}}$ . In the case of creating cocktails, for example, $y_{\text{sparse}}$ indicates how many ingredients are likely to be used in a cocktail. There are several possible candidates for $y_{\text{sparse}}$ . It can be a unimodal distribution if there is a rough desirable number of ingredients used in a cocktail. In another case, it can be a bimodal distribution when we aim to simultaneously generate simple cocktails (smaller number of ingredients) and complex cocktails (higher number of ingredients). Also, when a smaller number of ingredients is acceptable, we can use a distribution in which the probability exponentially decays as the number of ingredients increases.

3 MCMC Algorithms

An MCMC sampler is a standard algorithm that uses a Markov chain to generate samples from a target probability distribution $P(\boldsymbol{x})$ . In Section 3.1 we start by reviewing Markov chains and MCMC methods. Section 3.2 explains the Metropolis-Hastings algorithm Hastings (1970) Metropolis et al. (1953), a notable MCMC algorithm.

3.1 Markov Chain and MCMC

We consider a Markov chain in which the next state $\boldsymbol{x}^{\prime}$ depends only on the current state $\boldsymbol{x}$, a real-valued vector having $N$ dimensions. Let $\pi(\boldsymbol{x},\boldsymbol{x}^{\prime})$ be the transition probability of the Markov chain from $\boldsymbol{x}$ to $\boldsymbol{x}^{\prime}$. It is known that an ergodic Markov chain converges to its invariant distribution from any initial state, and that a distribution is invariant if the transition probability $\pi(\boldsymbol{x},\boldsymbol{x}^{\prime})$ satisfies the detailed balance condition with respect to it, defined as follows.

Definition.

The detailed balance condition is a sufficient condition for a distribution $P(\boldsymbol{x})$ to be an invariant distribution of a Markov chain; if the chain is also ergodic, it converges to $P(\boldsymbol{x})$. The condition is met when, for all $\boldsymbol{x}$ and $\boldsymbol{x}^{\prime}$,

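$$P(\boldsymbol{x})\,\pi(\boldsymbol{x},\boldsymbol{x}^{\prime})=P(\boldsymbol{x}^{\prime})\,\pi(\boldsymbol{x}^{\prime},\boldsymbol{x}).$$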

Let $P(\boldsymbol{x})$ be a target distribution. An MCMC sampler is then a Markov chain whose invariant distribution equals the target distribution $P(\boldsymbol{x})$. MCMC algorithms commonly design the transition probability $\pi(\boldsymbol{x},\boldsymbol{x}^{\prime})$ so that the detailed balance condition is satisfied. Each sample generated by an MCMC sampler can be seen as the state visited at one time step of the Markov chain, so we use the words “sample” and “state” interchangeably throughout this paper. We use an MCMC algorithm to solve the problem formulated in the previous section, namely, generating $T$ random samples $\{\boldsymbol{x}^{(1)},\boldsymbol{x}^{(2)},\ldots,\boldsymbol{x}^{(t)},\ldots,\boldsymbol{x}^{(T)}\}$ from a given probability distribution $P(\boldsymbol{x}|\mathcal{Y})$. Instead of sampling each $\boldsymbol{x}$ from scratch, we sample $\boldsymbol{x}^{\prime}$ based on the previous sample $\boldsymbol{x}$. An MCMC algorithm assumes that an initial sample $\boldsymbol{x}^{(0)}$ is given. We sample $\boldsymbol{x}^{(1)}$ from $\boldsymbol{x}^{(0)}$ according to the transition probability $\pi(\boldsymbol{x}^{(0)},\boldsymbol{x}^{(1)})$ and repeat this process until we obtain $T$ random samples.
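A toy numerical check makes the detailed balance condition concrete. The Python sketch below is illustrative and not from the paper. It builds the transition matrix of a three-state chain that proposes one of the other states uniformly and accepts it with probability $\min\{1,P(\boldsymbol{x}')/P(\boldsymbol{x})\}$ (the Metropolis rule reviewed in Section 3.2). It then checks both detailed balance and the invariance of $P$. Every state is assumed to have positive probability.

```python
def metropolis_matrix(p):
    """Transition matrix pi[a][b] for a uniform proposal over the other states
    and acceptance probability min(1, p[b] / p[a]); rejected mass stays on the diagonal."""
    k = len(p)
    pi = [[0.0] * k for _ in range(k)]
    for a in range(k):
        for b in range(k):
            if a != b:
                pi[a][b] = (1.0 / (k - 1)) * min(1.0, p[b] / p[a])
        pi[a][a] = 1.0 - sum(pi[a][b] for b in range(k) if b != a)
    return pi

def satisfies_detailed_balance(p, pi, tol=1e-12):
    """Check P(x) pi(x, x') == P(x') pi(x', x) for every pair of states."""
    k = len(p)
    return all(abs(p[a] * pi[a][b] - p[b] * pi[b][a]) <= tol
               for a in range(k) for b in range(k))

def is_invariant(p, pi, tol=1e-12):
    """Check sum_x P(x) pi(x, x') == P(x') for every state x'."""
    k = len(p)
    return all(abs(sum(p[a] * pi[a][b] for a in range(k)) - p[b]) <= tol
               for b in range(k))

p = [0.2, 0.3, 0.5]
pi = metropolis_matrix(p)
print(satisfies_detailed_balance(p, pi), is_invariant(p, pi))  # True True
```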

3.2 Metropolis-Hastings Algorithm

The Metropolis-Hastings algorithm is one of the most popular MCMC algorithms; Algorithm 1 lists its pseudocode. We denote the continuous uniform distribution on the interval $[0,1]$ by $\mathcal{U}(0,1)$. The Metropolis-Hastings algorithm requires a proposal distribution $Q(\boldsymbol{x},\boldsymbol{x}^{\prime})$ that the user designs in advance. In each iteration, a candidate $\boldsymbol{x}^{\prime}$ for the next sample is drawn from the proposal distribution $Q(\boldsymbol{x},\boldsymbol{x}^{\prime})$, which is a probability distribution conditioned on the current sample $\boldsymbol{x}$. The candidate is accepted and used as the next sample with probability $a(\boldsymbol{x},\boldsymbol{x}^{\prime})$, called the acceptance rate and defined as

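$$a(\boldsymbol{x},\boldsymbol{x}^{\prime})=\min\left\{1,\frac{P(\boldsymbol{x}^{\prime})\,Q(\boldsymbol{x}^{\prime},\boldsymbol{x})}{P(\boldsymbol{x})\,Q(\boldsymbol{x},\boldsymbol{x}^{\prime})}\right\}. \tag{5}$$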

When the candidate is rejected, the current sample is used as the next sample. In other words, the Markov chain moves from the current state $\boldsymbol{x}$ to the candidate $\boldsymbol{x}^{\prime}$ with probability $a(\boldsymbol{x},\boldsymbol{x}^{\prime})$ and otherwise remains at $\boldsymbol{x}$. For $\boldsymbol{x}^{\prime}\neq\boldsymbol{x}$, the transition probability is therefore $\pi(\boldsymbol{x},\boldsymbol{x}^{\prime})=Q(\boldsymbol{x},\boldsymbol{x}^{\prime})\,a(\boldsymbol{x},\boldsymbol{x}^{\prime})$. With this acceptance rate, the detailed balance condition is known to be satisfied Chib and Greenberg (1995). One drawback of the Metropolis-Hastings algorithm is that, if the acceptance rate $a(\boldsymbol{x},\boldsymbol{x}^{\prime})$ is much smaller than one, the algorithm rejects many candidates and the chain takes a long time to mix toward the invariant distribution. The computation time can then become unacceptable. It is therefore important to design a proposal distribution that keeps the acceptance rate close to one.
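Algorithm 1 itself is not reproduced in this text, so the following is only a minimal Python sketch of a generic Metropolis-Hastings loop. The caller supplies an unnormalized log target `log_p`, a candidate generator `propose`, and the log proposal density `log_q`; all three names are hypothetical. The initial state must have positive probability. Working in log space avoids underflow when probabilities are tiny.

```python
import math
import random

def metropolis_hastings(x0, log_p, propose, log_q, T, rng=None):
    """Generic Metropolis-Hastings sampler.

    log_p(x): unnormalized log target probability (may be -inf).
    propose(x, rng): draws a candidate x' from Q(x, .).
    log_q(x, x_new): log Q(x, x_new).
    Returns the T samples and the fraction of accepted candidates.
    """
    rng = rng or random.Random()
    x = x0
    samples, accepted = [], 0
    for _ in range(T):
        x_new = propose(x, rng)
        log_ratio = (log_p(x_new) + log_q(x_new, x)) - (log_p(x) + log_q(x, x_new))
        # Accept with probability a(x, x') = min{1, exp(log_ratio)}, using u ~ U(0, 1).
        if log_ratio >= 0 or rng.random() < math.exp(log_ratio):
            x = x_new
            accepted += 1
        samples.append(x)  # a rejected candidate repeats the current state
    return samples, accepted / T
```

For a symmetric proposal, `log_q` can return a constant. This recovers the simpler rule $\min\{1,P(\boldsymbol{x}')/P(\boldsymbol{x})\}$.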

4 MCMC Algorithm for Composition Ratios

In this section we propose our MCMC algorithm for composition ratios. We start in Section 4.1 by describing how to handle the sparsity condition $y_{\text{sparse}}$ . Section 4.2 shows that a naive MCMC algorithm based on the Metropolis-Hastings algorithm has a small acceptance rate, and we thus propose a new algorithm to improve the acceptance rate in Section 4.3. In this section, for simplicity, we sometimes omit $\mathcal{Y}$ from the target distribution $P(\boldsymbol{x}|\mathcal{Y})$ . Specifically, $P(\boldsymbol{x}|\mathcal{Y})$ is sometimes denoted as $P(\boldsymbol{x})$ .

4.1 Energy Function of Sparsity Condition

To handle the sparsity condition $y_{\text{sparse}}$, we derive the energy function $E(\boldsymbol{x}|y_{\text{sparse}})$ by solving equation (4). The sparsity condition $y_{\text{sparse}}$ depends only on the number of nonzero elements $n$. All vectors $\boldsymbol{x}$ with the same $n$ therefore have the same probability $P(\boldsymbol{x}|y_{\text{sparse}})$, and the mass $y_{\text{sparse}}(n)$ is split equally among them. Equation (4) can thus be rewritten as follows:

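$$P(\boldsymbol{x}\mid y_{\text{sparse}})=\frac{y_{\text{sparse}}(n)}{\sum_{\boldsymbol{x}^{\prime}\in\mathcal{X}}\mathbb{I}(\|\boldsymbol{x}^{\prime}\|_{0}=n)}, \tag{6}$$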

where the denominator is the number of $\boldsymbol{x}\in\mathcal{X}$ that satisfy $\|\boldsymbol{x}\|_{0}=n$ . From equation (6) and the definition of $P(\boldsymbol{x}|y_{\text{sparse}})$ $\propto$ $\exp\{-\mathop{c}_{\text{sparse}}\mathop{E}(\boldsymbol{x}|y_{\text{sparse}})\}$ (cf. equation (2)), the energy function $E(\boldsymbol{x}|y_{\text{sparse}})$ can be derived as

[TABLE]

where $c_{\text{sparse}}\in\mathbb{R}_{\geq 0}$ is the hyperparameter controlling the priority of the condition $y_{\text{sparse}}$ . In this paper we set $c_{\text{sparse}}=1$ . The denominator of equation (7) can be computed as

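$$\sum_{\boldsymbol{x}^{\prime}\in\mathcal{X}}\mathbb{I}(\|\boldsymbol{x}^{\prime}\|_{0}=n)=\binom{N}{n}\binom{M-1}{M-n}. \tag{8}$$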

The first factor on the right-hand side of equation (8) is the number of ways to choose the $n$ nonzero bins from $N$ bins. The second factor is the number of ways to distribute the $M$ balls among the chosen $n$ bins so that each bin receives at least one. After one ball is placed in each chosen bin, the remaining $M-n$ balls are distributed freely among the $n$ bins. There are more combinations when the balls are spread over many bins. If a vector is drawn uniformly from $\mathcal{X}$, $n$ therefore tends to be large, as Table 1 shows. As described in Section 4.3, our algorithm avoids directly calculating the number of combinations in equation (8).
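Equations (6) and (8) can be checked numerically. The Python sketch below uses illustrative function names. It computes the count in equation (8) with `math.comb` and evaluates $P(\boldsymbol{x}\mid y_{\text{sparse}})$ from equation (6). It then prints the counts for $N=50$ and $M=5$, which should reproduce Table 1. Here `y_sparse` is assumed to be a dictionary mapping $n$ to its probability.

```python
from math import comb

def num_with_n_nonzero(N, M, n):
    """Equation (8): number of x in X with exactly n nonzero elements.

    comb(N, n) chooses the n occupied bins; comb(M - 1, M - n) counts the ways
    to spread the remaining M - n balls after one ball is put in each chosen bin.
    """
    if n < 1 or n > min(N, M):
        return 0
    return comb(N, n) * comb(M - 1, M - n)

def p_sparse(x, y_sparse, N, M):
    """Equation (6): P(x | y_sparse) = y_sparse(n) / #{x' in X : ||x'||_0 = n}."""
    n = sum(1 for v in x if v > 0)
    return y_sparse.get(n, 0.0) / num_with_n_nonzero(N, M, n)

# Reproduce Table 1 (M = 5 balls, N = 50 bins).
for n in range(1, 6):
    print(n, num_with_n_nonzero(50, 5, n))
```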

4.2 Naive MCMC Algorithm

As a naive MCMC algorithm, we first describe the case of using the Metropolis-Hastings algorithm to choose a candidate for the next sample $\boldsymbol{x}^{\prime}$ from a set of composition ratios $\mathcal{X}$ uniformly at random. This is achieved by setting the proposal distribution to the following:

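$$Q(\boldsymbol{x},\boldsymbol{x}^{\prime})=\begin{cases}\dfrac{1}{|\mathcal{X}|} & \text{if } \boldsymbol{x}^{\prime}\in\mathcal{X},\\[4pt] 0 & \text{otherwise},\end{cases} \tag{9}$$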

where $|\mathcal{X}|$ is the number of elements of $\mathcal{X}$. Because this proposal distribution is symmetric, equation (5) reduces the acceptance rate to $a(\boldsymbol{x},\boldsymbol{x}^{\prime})=\min\{1,P(\boldsymbol{x}^{\prime})/P(\boldsymbol{x})\}$. For simplicity, we assume that $P(\boldsymbol{x}|\mathcal{Y})=P(\boldsymbol{x}|y_{\text{sparse}})$. Here, the acceptance rate tends to be small. The uniform proposal draws non-sparse candidates, which have low probability $P(\boldsymbol{x})$, far more often than sparse candidates, which have high probability $P(\boldsymbol{x})$ (see Table 1). Designing a proposal distribution that achieves a high acceptance rate under the sparsity condition is a nontrivial problem; the next subsection describes how we improve the acceptance rate.
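The naive sampler can be sketched as follows, assuming the target is $P(\boldsymbol{x}\mid y_{\text{sparse}})$ alone. Drawing uniformly from $\mathcal{X}$ uses the stars-and-bars bijection: choosing $N-1$ bar positions among $M+N-1$ slots uniformly at random yields a uniformly distributed $\boldsymbol{x}$. This is one way to realize equation (9), not necessarily the authors' implementation, and all function names are hypothetical.

```python
import random
from math import comb

def count_with_n_nonzero(N, M, n):
    """Equation (8): number of x in X with exactly n nonzero elements."""
    if n < 1 or n > min(N, M):
        return 0
    return comb(N, n) * comb(M - 1, M - n)

def target_sparse(x, y_sparse, N, M):
    """Equation (6): P(x | y_sparse) = y_sparse(n) / count(n)."""
    n = sum(1 for v in x if v > 0)
    count = count_with_n_nonzero(N, M, n)
    return y_sparse.get(n, 0.0) / count if count else 0.0

def uniform_composition(N, M, rng):
    """Draw x uniformly from X by placing N - 1 bars among M + N - 1 slots."""
    bars = sorted(rng.sample(range(M + N - 1), N - 1))
    x, prev = [], -1
    for b in bars:
        x.append(b - prev - 1)   # stars between consecutive bars
        prev = b
    x.append(M + N - 2 - prev)   # stars after the last bar
    return tuple(x)

def naive_mcmc(x0, y_sparse, T, rng):
    """Metropolis-Hastings with the uniform proposal of equation (9).

    Q is symmetric, so the acceptance rate is min{1, P(x') / P(x)}.
    x0 must have positive probability under the target.
    """
    x = tuple(x0)
    N, M = len(x), sum(x)
    p_x = target_sparse(x, y_sparse, N, M)
    samples, accepted = [], 0
    for _ in range(T):
        cand = uniform_composition(N, M, rng)
        p_cand = target_sparse(cand, y_sparse, N, M)
        # u < p_cand / p_x, written without division; p_cand == 0 is never accepted.
        if rng.random() * p_x < p_cand:
            x, p_x = cand, p_cand
            accepted += 1
        samples.append(x)
    return samples, accepted / T
```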

4.3 Accelerated MCMC Algorithm for Composition Ratios

To increase the acceptance rate, we propose a new MCMC algorithm, whose pseudocode is listed in Algorithm 2. We denote the discrete uniform distribution on the integers $1,\ldots,N$ by $\mathcal{U}\{1,N\}$. Instead of drawing each candidate $\boldsymbol{x}^{\prime}$ independently of the current sample from the proposal distribution in equation (9), we construct $\boldsymbol{x}^{\prime}$ from the previous sample $\boldsymbol{x}$. In each iteration we change the values of only two elements of $\boldsymbol{x}$, $x_{i}$ and $x_{j}$. Their sum $x_{i}+x_{j}$ is kept fixed, so $\boldsymbol{x}^{\prime}$ still satisfies equation (1). We also constrain how the pair of elements is chosen: at least one of the two elements must be nonzero. Without this constraint, the algorithm would often pick a pair in which both $x_{i}$ and $x_{j}$ are zero, leaving $\boldsymbol{x}^{\prime}$ equal to $\boldsymbol{x}$; such pairs are common because $\boldsymbol{x}$ is sparse. We denote the probability of choosing the pair $(i,j)$ as $\alpha_{ij}(\boldsymbol{x})$. Under the constraint, $\alpha_{ij}(\boldsymbol{x})=0$ for most pairs of $i$ and $j$, in particular for every pair in which both elements are zero. Furthermore, we do not change the values of $x_{i}$ and $x_{j}$ uniformly at random. Instead, we draw them from the conditional distribution of the target distribution given the other elements. Hence, we define the proposal distribution as

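$$Q(\boldsymbol{x},\boldsymbol{x}^{\prime})=\begin{cases}\alpha_{ij}(\boldsymbol{x})\,P(x_{i}^{\prime},x_{j}^{\prime}\mid\boldsymbol{x}_{/ij}) & \text{if } \boldsymbol{x}_{/ij}=\boldsymbol{x}^{\prime}_{/ij},\\ 0 & \text{otherwise},\end{cases} \tag{10}$$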

where $\boldsymbol{x}_{/ij}$ denotes $x_{1},\ldots,x_{N}$ with $x_{i}$ and $x_{j}$ omitted. Using conditional distributions of the target distribution as the proposal distribution is the idea behind Gibbs sampling Geman and Geman (1984). In Gibbs sampling, $\alpha_{ij}(\boldsymbol{x})$ is a constant (it does not depend on $\boldsymbol{x}$), so the acceptance rate is one. In our algorithm, $\alpha_{ij}(\boldsymbol{x})$ depends on $\boldsymbol{x}$. We offset the effect of this dependence by adjusting the MCMC acceptance rate $a(\boldsymbol{x},\boldsymbol{x}^{\prime})$ in equation (5). Thus, we use the following adjusted acceptance rate:

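$$a(\boldsymbol{x},\boldsymbol{x}^{\prime})=\min\left\{1,\frac{\alpha_{ij}(\boldsymbol{x}^{\prime})}{\alpha_{ij}(\boldsymbol{x})}\right\}. \tag{11}$$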

This is derived by using $P(x_{i},x_{j}|\boldsymbol{x}_{/ij})P(\boldsymbol{x}_{/ij})=P(\boldsymbol{x})$ , where $P(\boldsymbol{x}_{/ij})$ is the marginal distribution of $P(\boldsymbol{x})$ marginalized with respect to $x_{i}$ and $x_{j}$ . The proposed algorithm satisfies the detailed balance condition, which can be shown in the same manner as for the Metropolis-Hastings algorithm.
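The derivation substitutes the pairwise proposal into equation (5). Because $\boldsymbol{x}^{\prime}_{/ij}=\boldsymbol{x}_{/ij}$, the reverse proposal is $Q(\boldsymbol{x}^{\prime},\boldsymbol{x})=\alpha_{ij}(\boldsymbol{x}^{\prime})P(x_{i},x_{j}\mid\boldsymbol{x}_{/ij})$, and the conditional and marginal terms cancel:

$$
\frac{P(\boldsymbol{x}^{\prime})\,Q(\boldsymbol{x}^{\prime},\boldsymbol{x})}{P(\boldsymbol{x})\,Q(\boldsymbol{x},\boldsymbol{x}^{\prime})}
=\frac{P(x_{i}^{\prime},x_{j}^{\prime}\mid\boldsymbol{x}_{/ij})\,P(\boldsymbol{x}_{/ij})\;\alpha_{ij}(\boldsymbol{x}^{\prime})\,P(x_{i},x_{j}\mid\boldsymbol{x}_{/ij})}{P(x_{i},x_{j}\mid\boldsymbol{x}_{/ij})\,P(\boldsymbol{x}_{/ij})\;\alpha_{ij}(\boldsymbol{x})\,P(x_{i}^{\prime},x_{j}^{\prime}\mid\boldsymbol{x}_{/ij})}
=\frac{\alpha_{ij}(\boldsymbol{x}^{\prime})}{\alpha_{ij}(\boldsymbol{x})}.
$$

Here $\alpha_{ij}(\boldsymbol{x}^{\prime})$ is read as the probability of selecting the same unordered pair $\{i,j\}$ from $\boldsymbol{x}^{\prime}$.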

Here, we examine equation (11) in detail to show that the acceptance rate cannot become small. In Algorithm 2 we set $\alpha_{ij}(\boldsymbol{x})$ as follows:

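$$\alpha_{ij}(\boldsymbol{x})=\begin{cases}\dfrac{2}{n(N-1)} & \text{if } x_{i}>0,\ x_{j}>0,\ i>j,\\[4pt] \dfrac{1}{n(N-1)} & \text{if } x_{i}>0,\ x_{j}=0,\\[4pt] 0 & \text{otherwise}.\end{cases}$$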

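This choice is equivalent to drawing $i$ uniformly from the $n$ nonzero elements and $j$ uniformly from the other $N-1$ elements. A pair of two nonzero elements can be drawn in either order, which gives the factor of 2. In each iteration, the $x_{i}+x_{j}$ balls in positions $i$ and $j$ are reallocated into $x_{i}^{\prime}$ and $x_{j}^{\prime}$. Therefore, the number of nonzero elements $n^{\prime}$ of the candidate $\boldsymbol{x}^{\prime}$ satisfies $n-1\leq n^{\prime}\leq n+1$, where $n$ is the number of nonzero elements of the previous sample $\boldsymbol{x}$. The ratio in equation (11) then becomes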

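$$\frac{\alpha_{ij}(\boldsymbol{x}^{\prime})}{\alpha_{ij}(\boldsymbol{x})}=\begin{cases}\dfrac{2}{1+1/n} & \text{if } n^{\prime}=n+1,\\[4pt] \dfrac{1}{2(1-1/n)} & \text{if } n^{\prime}=n-1,\\[4pt] 1 & \text{if } n^{\prime}=n.\end{cases}$$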

Therefore, the acceptance rate falls below one only when the number of nonzero elements decreases ($n^{\prime}=n-1$). Even then it equals $n/\{2(n-1)\}$, which is always greater than 1/2.
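The case analysis for $\alpha_{ij}$ can be verified with exact rational arithmetic. The Python sketch below uses illustrative helper names. It implements $\alpha_{ij}(\boldsymbol{x})$ and checks that it sums to one over all index pairs. It then compares the ratio for a concrete move that reduces $n$ from 3 to 2 with the closed form $n/\{2(n-1)\}$. The reverse move selects the same unordered pair $\{i,j\}$, possibly with the roles of $i$ and $j$ swapped, so the helper `pair_probability` adds both orderings.

```python
from fractions import Fraction

def alpha(x, i, j):
    """alpha_ij(x): probability of picking the ordered index pair (i, j)."""
    N = len(x)
    n = sum(1 for v in x if v > 0)
    if x[i] > 0 and x[j] > 0 and i > j:
        return Fraction(2, n * (N - 1))
    if x[i] > 0 and x[j] == 0:
        return Fraction(1, n * (N - 1))
    return Fraction(0)

def pair_probability(x, a, b):
    """Probability of picking the unordered pair {a, b}; at most one ordering is nonzero."""
    return alpha(x, a, b) + alpha(x, b, a)

def alpha_ratio(n, n_new):
    """Closed form of alpha_ij(x') / alpha_ij(x), which depends only on n and n'."""
    if n_new == n + 1:
        return Fraction(2 * n, n + 1)
    if n_new == n - 1:
        return Fraction(n, 2 * (n - 1))
    return Fraction(1)

x = (3, 1, 1, 0, 0)        # N = 5, n = 3
x_new = (4, 0, 1, 0, 0)    # the ball at index 1 moves to index 0, so n' = 2
total = sum(alpha(x, i, j) for i in range(5) for j in range(5))
ratio = pair_probability(x_new, 0, 1) / pair_probability(x, 0, 1)
print(total, ratio, alpha_ratio(3, 2))  # 1 3/4 3/4
```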

Next, we explain how the algorithm avoids calculating the number of combinations in equation (8). Instead of calculating $P(x^{\prime}_{i},x^{\prime}_{j}|\boldsymbol{x}_{/ij},y_{\text{sparse}})\propto P(x^{\prime}_{i},x^{\prime}_{j}|\boldsymbol{x}_{/ij},\mathcal{Y})$ directly, we calculate the ratio of $P(x^{\prime}_{i},x^{\prime}_{j}|\boldsymbol{x}_{/ij},y_{\text{sparse}})$ and $P(x_{i},x_{j}|\boldsymbol{x}_{/ij},y_{\text{sparse}})$ , i.e., the right-hand side of the following equation:

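$$P(x_{i}^{\prime},x_{j}^{\prime}\mid\boldsymbol{x}_{/ij},y_{\text{sparse}})\propto\frac{y_{\text{sparse}}(n^{\prime})\Big/\left[\binom{N}{n^{\prime}}\binom{M-1}{M-n^{\prime}}\right]}{y_{\text{sparse}}(n)\Big/\left[\binom{N}{n}\binom{M-1}{M-n}\right]}.$$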

Notice that we have

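$$\frac{1\Big/\left[\binom{N}{n^{\prime}}\binom{M-1}{M-n^{\prime}}\right]}{1\Big/\left[\binom{N}{n}\binom{M-1}{M-n}\right]}=\begin{cases}\dfrac{n(n+1)}{(N-n)(M-n)} & \text{if } n^{\prime}=n+1,\\[4pt] \dfrac{(N-n+1)(M-n+1)}{n(n-1)} & \text{if } n^{\prime}=n-1,\\[4pt] 1 & \text{if } n^{\prime}=n.\end{cases}$$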

Thus, the algorithm only needs these closed-form ratios and never evaluates the large binomial coefficients in equation (8) themselves.
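Putting the pieces of this section together, the following Python sketch implements the pairwise sampler for the target $P(\boldsymbol{x}\mid y_{\text{sparse}})$ alone. Algorithm 2's pseudocode is not reproduced in this text, so this is a reconstruction from the description above rather than the authors' code. Function names are hypothetical. The sketch assumes $N\geq 2$ and an initial sample with $y_{\text{sparse}}(n)>0$.

The sampler works in three steps:
1. Draw the pair according to $\alpha_{ij}(\boldsymbol{x})$.
2. Sample $x_{i}^{\prime}$ from the conditional distribution, using only the closed-form count ratios.
3. Apply the adjusted acceptance rate.

It also reports the acceptance and update rates used in Section 5.

```python
import random

def count_ratio(N, M, n, n_new):
    """count(n) / count(n') in closed form, where count(n) = C(N, n) C(M - 1, M - n)."""
    if n_new == n + 1:
        return n * (n + 1) / ((N - n) * (M - n))
    if n_new == n - 1:
        return (N - n + 1) * (M - n + 1) / (n * (n - 1))
    return 1.0

def alpha_ratio(n, n_new):
    """alpha_ij(x') / alpha_ij(x) from the case analysis above."""
    if n_new == n + 1:
        return 2 * n / (n + 1)
    if n_new == n - 1:
        return n / (2 * (n - 1))
    return 1.0

def pairwise_sampler(x0, y_sparse, T, rng):
    """Sketch of the pairwise sampler for P(x | y_sparse).

    y_sparse maps n to its probability. Returns samples, acceptance rate, update rate.
    """
    x = list(x0)
    N, M = len(x), sum(x)
    n = sum(1 for v in x if v > 0)
    samples, accepted, updated = [], 0, 0
    for _ in range(T):
        # Choose (i, j) with probability alpha_ij(x): i uniform over nonzero
        # elements, then j uniform over the other N - 1 indices.
        i = rng.choice([k for k in range(N) if x[k] > 0])
        j = rng.randrange(N - 1)
        if j >= i:
            j += 1
        s = x[i] + x[j]
        n_rest = n - (x[i] > 0) - (x[j] > 0)
        # Weights for x_i' = k, x_j' = s - k, proportional to P(x') / P(x).
        weights = []
        for k in range(s + 1):
            n_new = n_rest + (k > 0) + (s - k > 0)
            weights.append(y_sparse.get(n_new, 0.0) / y_sparse[n]
                           * count_ratio(N, M, n, n_new))
        k = rng.choices(range(s + 1), weights=weights)[0]
        n_new = n_rest + (k > 0) + (s - k > 0)
        # Adjusted acceptance rate min{1, alpha_ij(x') / alpha_ij(x)}.
        if rng.random() < alpha_ratio(n, n_new):
            accepted += 1
            updated += k != x[i]
            x[i], x[j] = k, s - k
            n = n_new
        samples.append(tuple(x))
    return samples, accepted / T, updated / T
```

Because the weight of the current value $k=x_{i}$ is always one, the conditional draw is well defined even when neighboring values of $n$ have zero probability.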

5 Experimental Results

This section describes the results of two experiments. In the first experiment (Section 5.1), we verified that samples drawn by the proposed MCMC algorithm satisfied a sparsity condition and converged to a target distribution. In the second experiment (Section 5.2), we used the algorithm for a creative task: creating a new cocktail. We discarded the first $10\,000$ iterations of each MCMC run as burn-in, a common practice to reduce the influence of the starting point on the samples.

5.1 Control of Sparsity

We evaluated the proposed algorithm on the task of satisfying a sparsity condition $y_{\text{sparse}}$. We conducted two experiments. One used small $N$ and small $M$ (a small ingredient set and a rough division), and the other used large $N$ and large $M$ (a large ingredient set and a fine division). We also evaluated the naive MCMC algorithm and Gibbs sampling. Here, Gibbs sampling is the same as Algorithm 2 except that $\alpha_{ij}(\boldsymbol{x})$ is constant (it does not depend on $\boldsymbol{x}$). Because it also changes only a pair of elements at a time, it keeps the sum of the elements of each sample constant.

5.1.1 Small Ingredients Set and Rough Division

We set $P(\boldsymbol{x}|\mathcal{Y})=P(\boldsymbol{x}|y_{\text{sparse}})$, $N=10$, and $M=5$. We conducted experiments on two target distributions: (a) a uniform distribution $y_{\text{sparse}}(n)=\mathcal{U}\{2,5\}$ and (b) a unimodal distribution $y_{\text{sparse}}(n)\propto\exp(-0.25(n-3)^{2})$. We ran $T=50\,000$ iterations. For each target distribution, we ran each sampler 10 times with different initial samples, each generated randomly from $P(\boldsymbol{x}|\mathcal{Y})$.

The samples of the naive MCMC algorithm and of Gibbs sampling did not match the target distributions, whereas those of Algorithm 2 converged to them. Table 2 lists the acceptance rates of the candidate samples proposed by the naive MCMC algorithm, Gibbs sampling, and Algorithm 2. We also calculated how often the new sample $\boldsymbol{x}^{(t)}$ differed from the previous sample $\boldsymbol{x}^{(t-1)}$ and list these frequencies as update rates in Table 3. Although the acceptance rate of Gibbs sampling is 100%, both $x_{i}$ and $x_{j}$ are often zero, so the new sample $\boldsymbol{x}^{(t)}$ remains equal to the previous sample $\boldsymbol{x}^{(t-1)}$; this happens frequently because $\boldsymbol{x}$ is sparse. The naive MCMC algorithm produced varied samples, with both acceptance and update rates above 70%. However, those samples did not match the target distributions.
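One factor behind the low update rates of Gibbs sampling is easy to quantify. Assume, for illustration, that the constant $\alpha_{ij}$ picks each unordered pair uniformly among all $\binom{N}{2}$ pairs. The chance that both chosen elements are zero, so that the sample cannot change, is then $\binom{N-n}{2}/\binom{N}{2}$. The Python sketch below evaluates this for the two settings of Section 5.1, taking $n=3$ and $n=20$ as representative values near the modes of the target distributions. The update rate is further reduced whenever the conditional draw returns the current values.

```python
from fractions import Fraction
from math import comb

def prob_both_zero(N, n):
    """Chance that a uniformly chosen unordered pair (i, j) has x_i = x_j = 0
    when the sample has n nonzero elements out of N."""
    return Fraction(comb(N - n, 2), comb(N, 2))

print(prob_both_zero(10, 3))            # 7/15, small setting of Section 5.1.1
print(float(prob_both_zero(2000, 20)))  # about 0.98, large setting of Section 5.1.2
```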

5.1.2 Large Ingredients Set and Fine Division

We set $P(\boldsymbol{x}|\mathcal{Y})=P(\boldsymbol{x}|y_{\text{sparse}})$, $N=2000$, and $M=100$. We conducted experiments on four target distributions: (a) a uniform distribution $y_{\text{sparse}}(n)=\mathcal{U}\{17,24\}$, (b) a unimodal distribution $y_{\text{sparse}}(n)\propto\exp(-0.25(n-20)^{2})$, (c) a bimodal distribution $y_{\text{sparse}}(n)\propto\exp(-0.5(n-15)^{2})+2\exp(-0.5(n-20)^{2})$, and (d) an exponential distribution $y_{\text{sparse}}(n)\propto 0.5\exp(-0.5n)$. We ran $T=500\,000$ iterations. As before, for each target distribution we generated 10 sequences of MCMC samples, each starting from a different initial sample drawn randomly from $P(\boldsymbol{x}|\mathcal{Y})$.
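
The four targets of this experiment, together with one simple way to score how well a histogram of $n$ matches a target, can be sketched as follows. The support $n\in\{1,\dots,100\}$ and the use of total variation distance are our assumptions for illustration; this section does not report a numerical discrepancy measure.

```python
import math
import random
from collections import Counter


def normalize(weights):
    """Turn {n: unnormalized weight} into a probability mass function."""
    total = sum(weights.values())
    return {n: w / total for n, w in weights.items()}


N, M = 2000, 100
support = range(1, min(N, M) + 1)  # assumed support: 1 <= n <= min(N, M)

targets = {
    "uniform": normalize({n: 1.0 if 17 <= n <= 24 else 0.0 for n in support}),
    "unimodal": normalize({n: math.exp(-0.25 * (n - 20) ** 2) for n in support}),
    "bimodal": normalize({n: math.exp(-0.5 * (n - 15) ** 2)
                          + 2 * math.exp(-0.5 * (n - 20) ** 2) for n in support}),
    "exponential": normalize({n: 0.5 * math.exp(-0.5 * n) for n in support}),
}


def total_variation(n_samples, pmf):
    """Total variation distance between the empirical distribution of n and pmf.

    Illustrative choice of discrepancy measure, not one taken from the paper.
    """
    counts = Counter(n_samples)
    size = len(n_samples)
    keys = set(pmf) | set(counts)
    return 0.5 * sum(abs(counts.get(n, 0) / size - pmf.get(n, 0.0)) for n in keys)


# Sanity check: exact draws from a target should give a small distance.
rng = random.Random(1)
pmf = targets["unimodal"]
draws = rng.choices(list(pmf), weights=list(pmf.values()), k=20000)
print(f"TV distance for 20000 exact draws: {total_variation(draws, pmf):.3f}")
```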

Figure 1 shows histograms, averaged over the 10 sequences, of the number of nonzero elements $n$ in the MCMC samples. The histograms approximate the target distributions well. Tables 4 and 5 list, respectively, the acceptance rates and the update rates of the proposed MCMC samples for the naive MCMC algorithm and Algorithm 2. Because $N$ was much larger than the expected $n$, the naive MCMC algorithm could not propose appropriate candidates, and its acceptance rates were zero. The acceptance rates of Algorithm 2 were thus much larger than those of the naive MCMC algorithm. As in the previous experiment, Gibbs sampling had small update rates, and its samples did not match the target distributions.
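
Why uninformed proposals fail when $N$ is large can be seen by counting. A vector of length $N$ summing to $M$ with exactly $n$ nonzero elements is fixed by choosing the $n$ positions and splitting $M$ into $n$ positive parts, which gives $\binom{N}{n}\binom{M-1}{n-1}$ vectors; this formula reproduces Table 1. The sketch below assumes, purely as a stand-in (the naive algorithm's proposal is not specified in this section), that candidates are drawn uniformly from all $\binom{N+M-1}{M}$ compositions, and computes how rarely such a draw has $n\le 24$, the range where all four targets put almost all of their mass.

```python
from fractions import Fraction
from math import comb


def count_with_n_nonzero(N, M, n):
    """Length-N nonnegative integer vectors summing to M with exactly n nonzero
    entries: choose the n positions, then split M into n positive parts."""
    return comb(N, n) * comb(M - 1, n - 1)


# Reproduces Table 1 (M = 5 balls, N = 50 bins).
print([count_with_n_nonzero(50, 5, n) for n in range(1, 6)])
# -> [50, 4900, 117600, 921200, 2118760]

# Large setting of Section 5.1.2.
N, M = 2000, 100
total = comb(N + M - 1, M)  # all compositions of M into N nonnegative parts
assert total == sum(count_with_n_nonzero(N, M, n) for n in range(1, M + 1))

# Assumption: a stand-in "uninformed" proposal drawing uniformly from all compositions.
expected_n = Fraction(N * M, N + M - 1)  # E[n] = N * P(x_i > 0) = N * M / (N + M - 1)
p_sparse = Fraction(sum(count_with_n_nonzero(N, M, n) for n in range(1, 25)), total)
print(f"E[n] = {float(expected_n):.1f}, P(n <= 24) = {float(p_sparse):.1e}")
```

Under this assumption a random candidate has about 95 nonzero elements on average, and the chance of landing at $n\le 24$ is astronomically small.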

5.2 Creation of New Cocktail

We also evaluated the proposed algorithm on the task of creating new cocktails. We used a cocktail dataset (Andjelkovic, 2018) containing 69 cocktails with seven taste labels $\{$“Bittersweet,” “Boozy,” “Fresh,” “Salty,” “Sour,” “Sweet,” “Unknown”$\}$ and four timing labels $\{$“After dinner,” “All day,” “Long drink,” “Pre-dinner”$\}$. We aimed to generate $100\,000$ cocktails (samples) labeled as “All day” and “Fresh.” The total number of ingredients, $N$, was $65$. We converted all ingredient amounts, whatever their unit (e.g., centiliter, bar spoon, dash, splash), into permille values, giving $M=1000$. In the dataset, the number of ingredients used in one cocktail ranged from two to five, and we set $y_{\text{sparse}}$ accordingly: $y_{\text{sparse}}(2)=13/69$, $y_{\text{sparse}}(3)=29/69$, $y_{\text{sparse}}(4)=24/69$, and $y_{\text{sparse}}(5)=3/69$. We also added the conditions $y_{\text{taste}}$ and $y_{\text{timing}}$, requiring the taste to be “Fresh” and the timing to be “All day,” respectively. We trained two multi-class classification models using random forests, one for the taste labels and the other for the timing labels, and used their outputs for “Fresh” and “All day” as $y_{\text{taste}}(\boldsymbol{x})$ and $y_{\text{timing}}(\boldsymbol{x})$. Note that the output of a random forest can be regarded as a probability (Bostrom, 2007). We set the hyperparameter $c$ to one and ran the proposed MCMC algorithm for $T=110\,000$ iterations. The initial sample was generated randomly under the condition $y_{\text{sparse}}$.
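
Two preprocessing steps in this paragraph are easy to make concrete: estimating $y_{\text{sparse}}$ from the number of ingredients per cocktail, and turning a recipe into integers that sum to $M=1000$. The sketch assumes amounts have already been converted to one common unit (the conversion factors for bar spoons, dashes and splashes are not given here) and uses largest-remainder rounding, which is our choice; how the permille values were rounded is not stated.

```python
import math
from collections import Counter
from fractions import Fraction


def empirical_y_sparse(ingredient_counts):
    """Relative frequency of the number of ingredients per cocktail."""
    freq = Counter(ingredient_counts)
    total = len(ingredient_counts)
    return {n: Fraction(c, total) for n, c in sorted(freq.items())}


def to_permille(amounts, M=1000):
    """Convert nonnegative amounts (one common unit) into integers summing to M.

    Assumption: largest-remainder rounding; the paper's rounding rule is not given.
    """
    total = sum(amounts)
    raw = [a * M / total for a in amounts]
    parts = [math.floor(r) for r in raw]
    shortfall = M - sum(parts)  # smaller than len(amounts)
    by_remainder = sorted(range(len(raw)), key=lambda k: raw[k] - parts[k], reverse=True)
    for k in by_remainder[:shortfall]:
        parts[k] += 1
    return parts


# Ingredient counts for the 69 cocktails: 13 with two, 29 with three,
# 24 with four and 3 with five ingredients.
counts = [2] * 13 + [3] * 29 + [4] * 24 + [5] * 3
y_sparse = empirical_y_sparse(counts)
print(y_sparse)

# Hypothetical recipe, amounts already converted to centiliters.
print(to_permille([4.0, 2.0, 1.0]))  # -> [571, 286, 143]
```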

Figure 2 shows the recipe (i.e., composition ratio) of a new cocktail and the most similar cocktails in the dataset. To calculate similarities, we considered only which ingredients are combined and used the Szymkiewicz-Simpson coefficient (Simpson, 1960). All the most similar cocktails were labeled “All day” and “Fresh,” but they had only one ingredient in common with the new cocktail. This result suggests that making the classification models take ingredient co-occurrence into account could improve the outcomes. The combination of champagne and orange juice used in the new cocktail did not appear in the dataset, but it is in fact a popular cocktail called a mimosa; our algorithm thus reinvented the combination of ingredients of a popular cocktail. Figure 3 shows the composition ratios of the new cocktail and the similar cocktails in the dataset. The composition ratio of champagne is smaller in the new cocktail than in the others. We attribute this to the constraint imposed by composition ratios with $M=1000$, but this requires further investigation.
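
The Szymkiewicz-Simpson (overlap) coefficient of two ingredient sets $A$ and $B$ is $|A\cap B|/\min(|A|,|B|)$. A minimal sketch of the similarity search on ingredient sets follows; the data are made up, and only champagne and orange juice come from the text, while `ingredient X` and the others are placeholders.

```python
def szymkiewicz_simpson(a, b):
    """Overlap coefficient |A ∩ B| / min(|A|, |B|) of two ingredient sets."""
    a, b = set(a), set(b)
    if not a or not b:
        raise ValueError("ingredient sets must be non-empty")
    return len(a & b) / min(len(a), len(b))


def most_similar(new_recipe, dataset):
    """Return the dataset cocktails with the highest overlap, and that overlap."""
    scores = {name: szymkiewicz_simpson(new_recipe, ingredients)
              for name, ingredients in dataset.items()}
    best = max(scores.values())
    return sorted(name for name, s in scores.items() if s == best), best


# Made-up example: only champagne and orange juice come from the text.
new = {"champagne", "orange juice", "ingredient X"}
dataset = {
    "cocktail A": {"champagne", "ingredient Y", "ingredient Z"},
    "cocktail B": {"ingredient Y", "ingredient W"},
}
names, score = most_similar(new, dataset)
print(names, round(score, 3))  # -> ['cocktail A'] 0.333
```

Because the coefficient ignores amounts, two recipes with the same ingredients but very different composition ratios score 1.0, which matches the focus on combinations of ingredients described above.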

Figure 4 shows the frequency distributions of $y_{\text{taste}}(\boldsymbol{x})\,y_{\text{timing}}(\boldsymbol{x})$ for samples generated randomly under the condition $y_{\text{sparse}}$ and for samples generated by the proposed MCMC algorithm. Our MCMC algorithm sampled cocktails with large $y_{\text{taste}}(\boldsymbol{x})\,y_{\text{timing}}(\boldsymbol{x})$ more efficiently than the random method did; in other words, the MCMC samples were more likely to have the “Fresh” and “All day” labels. Figure 4 also shows that the MCMC samples have greater variety in terms of the measure of the target distribution.
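
For reference, one plausible reading of “randomly generated samples under the condition of $y_{\text{sparse}}$” is: draw $n$ from $y_{\text{sparse}}$, pick $n$ of the $N$ ingredients uniformly, and split $M$ into $n$ positive integer parts uniformly. This sampling scheme is our assumption, not a procedure stated in the paper, and scoring the samples with $y_{\text{taste}}(\boldsymbol{x})\,y_{\text{timing}}(\boldsymbol{x})$ would additionally need the trained random forests, which are not reproduced here.

```python
import random


def random_sparse_composition(N, M, y_sparse, rng):
    """Assumed baseline: draw n ~ y_sparse, choose n of the N ingredients
    uniformly, and split M into n positive integer parts uniformly."""
    sizes = list(y_sparse)
    n = rng.choices(sizes, weights=[y_sparse[k] for k in sizes], k=1)[0]
    chosen = rng.sample(range(N), n)
    cuts = sorted(rng.sample(range(1, M), n - 1))  # n - 1 distinct cut points
    parts = [hi - lo for lo, hi in zip([0] + cuts, cuts + [M])]  # all >= 1
    x = [0] * N
    for idx, amount in zip(chosen, parts):
        x[idx] = amount
    return x


N, M = 65, 1000
y_sparse = {2: 13 / 69, 3: 29 / 69, 4: 24 / 69, 5: 3 / 69}
rng = random.Random(0)
baseline = [random_sparse_composition(N, M, y_sparse, rng) for _ in range(1000)]
print(baseline[0])
```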

6 Conclusion

We have proposed an MCMC algorithm that generates composition ratio samples satisfying both sparsity conditions and the constant-sum constraint of composition ratios. Our empirical results show that the proposed method converges to target distributions defined by sparsity conditions, and that combining it with supervised learning can solve a creative problem.

Acknowledgments

This research was supported by CREST, JST Grant Number JPMJCR1304, Japan.

Bibliography

Andjelkovic [2018] Stevan Andjelkovic. Cocktails dataset. https://github.com/stevana/cocktails, 2018.
Andrieu et al. [2003] Christophe Andrieu, Nando de Freitas, Arnaud Doucet, and Michael I. Jordan. An introduction to MCMC for machine learning. Machine Learning, 50(1-2):5–43, 2003.
Blei et al. [2003] David M. Blei, Andrew Y. Ng, and Michael I. Jordan. Latent Dirichlet allocation. Journal of Machine Learning Research, 3(Jan):993–1022, 2003.
Bostrom [2007] Henrik Bostrom. Estimating class probabilities in random forests. In Sixth International Conference on Machine Learning and Applications (ICMLA 2007), pages 211–216. IEEE, 2007.
Brodie et al. [2009] Joshua Brodie, Ingrid Daubechies, Christine De Mol, Domenico Giannone, and Ignace Loris. Sparse and stable Markowitz portfolios. Proceedings of the National Academy of Sciences, 106(30):12267–12272, 2009.
Chib and Greenberg [1995] Siddhartha Chib and Edward Greenberg. Understanding the Metropolis-Hastings algorithm. The American Statistician, 49(4):327–335, 1995.
Cura [2009] Tunchan Cura. Particle swarm optimization approach to portfolio optimization. Nonlinear Analysis: Real World Applications, 10(4):2396–2406, 2009.
Fernández and Gómez [2007] Alberto Fernández and Sergio Gómez. Portfolio selection using neural networks. Computers & Operations Research, 34(4):1177–1191, 2007.
