Context Mixing via Ground State Search

Kentaro Imafuku

arXiv:1903.04160·quant-ph·April 4, 2019

TL;DR

This paper formulates the context mixing problem as a ground state search: it introduces an effective Hamiltonian whose ground state gives the best mixing of a priori given probability distributions for approximating an unknown target distribution.

Contribution

It proposes a Hamiltonian-based method for context mixing in which the mixing parameters that best approximate the target distribution are obtained from the ground state of the Hamiltonian.

Findings

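01 An effective Hamiltonian is constructed whose ground state encodes the best mixing parameters.

02 Ground state search selects the maximum entropy mixture that minimizes the Kullback-Leibler distance to the target distribution.

03 The effective Hamiltonian can be used in quantum annealing, where its dynamics is simulated with an error of order $O(\delta t)$.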

Abstract

To address the context mixing problem via ground state search, we introduce an effective Hamiltonian whose ground state represents the best mixing of a priori given probability distributions for approximately describing an unknown target probability distribution.

Equations

$\{p_{i}(x)\}_{i\in\mathcal{C}}$

$P^{(L)}_{\bm{\omega}}(x) := \sum_{i\in\mathcal{C}} \omega_{i}\, p_{i}(x)$

$P^{(ME)}_{\bm{\omega}}(x) = \frac{1}{Z(\bm{\omega})} \exp\left(\sum_{i\in\mathcal{C}} \omega_{i}\, \eta_{i}(x)\right)$

$\eta_{i}(x) := -\ln p_{i}(x), \text{ and } Z(\bm{\omega}) = \sum_{x\in\mathcal{X}} \exp\left(\sum_{i\in\mathcal{C}} \omega_{i}\, \eta_{i}(x)\right)$

$\tau_{i} := \sum_{x\in\mathcal{X}} \eta_{i}(x)\, P^{(ME)}_{\bm{\omega}}(x) = -\sum_{x\in\mathcal{X}} P^{(ME)}_{\bm{\omega}}(x) \ln p_{i}(x).$

$\Delta\tau_{i} := \cdots$ (truncated)

$D(\bm{\omega}) := \sum_{x\in\mathcal{X}} \mu(x) \ln \frac{\mu(x)}{P^{(ME)}_{\bm{\omega}}(x)}$

$E(\bm{\omega}) := -\sum_{x\in\mathcal{X}} \mu(x) \ln P^{(ME)}_{\bm{\omega}}(x)$

$\hat{\mu} := \sum_{x\in\mathcal{X}} \mu(x)\, |x\rangle\langle x|$

$\hat{H}\, |\bm{\omega}\rangle = E(\bm{\omega})\, |\bm{\omega}\rangle$

$\omega_{i} \in \Omega_{i}$

$P^{(ME)}_{\bm{\omega}}(x) = \exp\left(\lambda(\bm{\omega}) + \sum_{i\in\mathcal{C}} \omega_{i}\, \eta_{i}(x)\right)$

$\lambda(\bm{\omega}) = -\ln Z(\bm{\omega}),$

$E(\bm{\omega}) = -\sum_{x\in\mathcal{X}} \mu(x) \left(\lambda(\bm{\omega}) + \sum_{i\in\mathcal{C}} \omega_{i}\, \eta_{i}(x)\right)$

$E^{\prime}(\lambda,\bm{\omega}) := -\sum_{x\in\mathcal{X}} \mu(x) \left(\lambda + \sum_{i\in\mathcal{C}} \omega_{i}\, \eta_{i}(x)\right).$

$\hat{H}^{\prime}\, |\lambda,\bm{\omega}\rangle = E^{\prime}(\lambda,\bm{\omega})\, |\lambda,\bm{\omega}\rangle.$

$E(\lambda,\bm{\omega}) = E^{\prime}(\lambda,\bm{\omega}) + \alpha \left(\sum_{x\in\mathcal{X}} \exp\left(\lambda + \sum_{i\in\mathcal{C}} \omega_{i}\, \eta_{i}(x)\right) - 1\right)^{2}$

$\hat{H}\, |\lambda,\bm{\omega}\rangle = E(\lambda,\bm{\omega})\, |\lambda,\bm{\omega}\rangle,$

$\hat{G} := \cdots$ (truncated)

$\hat{\lambda} := \sum_{\lambda\in\Lambda} \lambda\, |\lambda\rangle\langle\lambda|, \quad \hat{\omega}_{i} := \sum_{\omega_{i}\in\Omega_{i}} \omega_{i}\, |\omega_{i}\rangle\langle\omega_{i}|$

$|\lambda\rangle \in \mathcal{H}_{\Lambda},\ |\omega_{i}\rangle \in \mathcal{H}_{\Omega_{i}},\ |x\rangle \in \mathcal{H}_{\mathcal{X}_{0}},\ |x^{\prime}\rangle \in \mathcal{H}_{\mathcal{X}_{1}},\ \text{and}\ |x^{\prime\prime}\rangle \in \mathcal{H}_{\mathcal{X}_{2}}.$

$\mathcal{H} := \mathcal{H}_{\Lambda} \otimes \mathcal{H}_{\Omega} \otimes \mathcal{H}_{\mathcal{X}_{0}} \otimes \mathcal{H}_{\mathcal{X}_{1}} \otimes \mathcal{H}_{\mathcal{X}_{2}}$

$\mathcal{H}_{\Omega} := \bigotimes_{i\in\mathcal{C}} \mathcal{H}_{\Omega_{i}}.$

$\hat{\nu} := \hat{\mu} \otimes \frac{\hat{I}}{|\mathcal{X}|} \otimes \frac{\hat{I}}{|\mathcal{X}|}$

$\hat{H}_{eff} := \cdots$ (truncated)

$\hat{H}_{eff}(t) = \frac{t}{T}\, \hat{H}_{eff} + \left(1 - \frac{t}{T}\right) \hat{V}$

$|\varphi_{0}\rangle \in \mathcal{H}_{\Lambda} \otimes \mathcal{H}_{\Omega}$

$\hat{G}(t) = \frac{t}{T}\, \hat{G} + \left(1 - \frac{t}{T}\right) \hat{V} \otimes \hat{I}$

$\hat{\rho}_{0} := |\varphi_{0}\rangle\langle\varphi_{0}| \otimes \hat{\nu}.$

$\frac{d}{dt}\, |\varphi_{t}\rangle = -i\, \hat{H}_{eff}(t)\, |\varphi_{t}\rangle$

Taxonomy

Topics: Machine Learning and Algorithms · Bayesian Modeling and Causal Inference · DNA and Biological Computing

Full text

Context Mixing via Ground State Search

Footnote: This work is based on results from a project commissioned by the New Energy and Industrial Technology Development Organization (NEDO), Japan.

Kentaro Imafuku

National Institute of Advanced Industrial Science and Technology (AIST)

Aomi 2-3-26, Koto-ku, Tokyo 135-0064, Japan

Abstract

To address the context mixing problem via ground state search, we introduce an effective Hamiltonian whose ground state represents the best mixing of a priori given probability distributions for approximately describing an unknown target probability distribution.

1 Context Mixing

Context mixing [1, 2, 3] is a way to estimate an unknown probability distribution by appropriately mixing a priori given probability distributions. Let

$$\{p_{i}(x)\}_{i\in{\mathcal{C}}} \tag{1}$$

be the given probability distributions, where $i\in{\mathcal{C}}$ indicates a condition that gives the context defining $p_{i}(x)$. For simplicity, we restrict ourselves to the case where $x\in{\mathcal{X}}$ is discrete. The most intuitive way is to introduce a linear combination as

$$P^{(L)}_{\bm{\omega}}(x) := \sum_{i\in{\mathcal{C}}} \omega_{i}\, p_{i}(x) \tag{2}$$

with $\omega_{i}>0$, $\sum_{i\in{\mathcal{C}}}\omega_{i}=1$ and $\bm{\omega}:=\{\omega_{i}\}_{i\in{\mathcal{C}}}$. Under this parameterization, context mixing takes the form of finding the parameter $\bm{\omega}^{*}:=\{\omega_{i}^{*}\}_{i\in{\mathcal{C}}}$ such that $P^{(L)}_{\bm{\omega}\rightarrow\bm{\omega}^{*}}(x)$ becomes a “good” approximation of a target probability distribution $\mu(x)$. Besides the parameterization in eq.(2), another parameterization,

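$$P^{(ME)}_{\bm{\omega}}(x) = \frac{1}{Z(\bm{\omega})} \exp\left(\sum_{i\in{\mathcal{C}}} \omega_{i}\, \eta_{i}(x)\right) \tag{3}$$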

with

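$$\eta_{i}(x) := -\ln p_{i}(x), \quad \text{and} \quad Z(\bm{\omega}) = \sum_{x\in{\mathcal{X}}} \exp\left(\sum_{i\in{\mathcal{C}}} \omega_{i}\, \eta_{i}(x)\right) \tag{4}$$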

is also often considered[2]. The parameterization in eq.(3), which is derived from the maximum entropy principle[4, 5], guarantees that $P^{(ME)}_{\bm{\omega}}(x)$ is the probability distribution with the maximum entropy among all distributions that have the same expectation values

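$$\tau_{i} := \sum_{x\in{\mathcal{X}}} \eta_{i}(x)\, P^{(ME)}_{\bm{\omega}}(x) = -\sum_{x\in{\mathcal{X}}} P^{(ME)}_{\bm{\omega}}(x) \ln p_{i}(x). \tag{5}$$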

Notice that $\tau_{i}$ corresponds to the mean code length in the case where $x$ following $P^{(ME)}_{\bm{\omega}}(x)$ is coded into an entropic code based on $p_{i}(x)$ [6], that is, $x$ is encoded into a code word whose length is $\ln p_{i}^{-1}(x)$. In addition, the extra length in comparison with the case where $x$ is naturally coded into the entropic code based on $P^{(ME)}_{\bm{\omega}}(x)$ itself is given as

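$$\begin{aligned}
\Delta\tau_{i} &:= \tau_{i} - \left(-\sum_{x\in{\mathcal{X}}} P^{(ME)}_{\bm{\omega}}(x) \ln P^{(ME)}_{\bm{\omega}}(x)\right) && (6)\\
&= \sum_{x\in{\mathcal{X}}} P^{(ME)}_{\bm{\omega}}(x) \ln \frac{P^{(ME)}_{\bm{\omega}}(x)}{p_{i}(x)} && (7)
\end{aligned}$$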

which is the Kullback-Leibler distance between $P^{(ME)}_{\bm{\omega}}(x)$ and $p_{i}(x)$. Thus, choosing the maximum entropy state with respect to $\tau_{j}$ implies minimizing $\Delta\tau_{j}$, so that any extra conditions other than $\tau_{j}$ are excluded in constructing $P^{(ME)}_{\bm{\omega}}(x)$. (The idea is similar to making a facial composite sketch, where an average face is drawn except for the particular features suggested by witnesses, or to the principle of Occam’s razor[7].)
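The two parameterizations and the code-length quantities above can be checked numerically. The following is a minimal sketch, not taken from the paper: the two context models `p`, the weights, and all function names are hypothetical. It follows the sign convention of eq.(3) as printed, so negative weights are used here to obtain a geometric-mean style mixture. It evaluates $\tau_{i}$ and confirms that the extra code length equals the Kullback-Leibler distance between $P^{(ME)}_{\bm{\omega}}(x)$ and $p_{i}(x)$.

```python
import numpy as np

# Hypothetical example data: two context models p_i(x) over three symbols.
p = np.array([[0.7, 0.2, 0.1],
              [0.2, 0.3, 0.5]])

def linear_mixture(p, w):
    """Eq.(2): P^(L)(x) = sum_i w_i p_i(x); p has shape (contexts, symbols)."""
    return w @ p

def maxent_mixture(p, w):
    """Eqs.(3)-(4): P^(ME)(x) = exp(sum_i w_i eta_i(x)) / Z(w), eta_i = -ln p_i."""
    eta = -np.log(p)
    logits = w @ eta
    unnorm = np.exp(logits - logits.max())  # shift cancels after normalization
    return unnorm / unnorm.sum()

def mean_code_length(q, p_i):
    """Eq.(5): mean length (in nats) when x ~ q is coded with a code built for p_i."""
    return -np.sum(q * np.log(p_i))

def entropy(q):
    """Mean code length when x ~ q is coded with the code built for q itself."""
    return -np.sum(q * np.log(q))

def kl_distance(q, p_i):
    """Kullback-Leibler distance between q and p_i."""
    return np.sum(q * np.log(q / p_i))

w_lin = np.array([0.6, 0.4])    # positive weights summing to one, as eq.(2) requires
w_me = np.array([-0.6, -0.4])   # assumed weights for the maximum entropy form

P_lin = linear_mixture(p, w_lin)
P_me = maxent_mixture(p, w_me)

tau = np.array([mean_code_length(P_me, p_i) for p_i in p])
delta_tau = tau - entropy(P_me)                       # extra code length
kl = np.array([kl_distance(P_me, p_i) for p_i in p])  # should match delta_tau

print("P_lin =", P_lin)
print("P_me  =", P_me)
print("delta_tau =", delta_tau, "KL =", kl)
```

Both mixtures are normalized by construction, and `delta_tau` agrees with `kl` up to floating-point error, which is the identity stated in the text.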

The Kullback-Leibler distance[8] between $\mu(x)$ and $P^{(ME)}_{\bm{\omega}}(x)$, i.e.,

$$D(\bm{\omega}) := \sum_{x\in{\mathcal{X}}} \mu(x) \ln \frac{\mu(x)}{P^{(ME)}_{\bm{\omega}}(x)} \tag{8}$$

can be adopted as a metric for choosing the parameter $\bm{\omega}$. In other words, our aim is to find $\bm{\omega}$ minimizing $D(\bm{\omega})$, or equivalently minimizing

$$E(\bm{\omega}) := -\sum_{x\in{\mathcal{X}}} \mu(x) \ln P^{(ME)}_{\bm{\omega}}(x) \tag{9}$$

for the (unknown) target probability distribution $\mu(x)$ .
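The equivalence of the two minimizations follows from a one-line decomposition. The symbol $S(\mu)$ for the entropy of the target distribution is notation introduced here for illustration and does not appear in the paper:

```latex
\begin{aligned}
D(\bm{\omega})
  &= \sum_{x\in\mathcal{X}} \mu(x)\ln\mu(x)
     - \sum_{x\in\mathcal{X}} \mu(x)\ln P^{(ME)}_{\bm{\omega}}(x) \\
  &= -S(\mu) + E(\bm{\omega}),
  \qquad S(\mu) := -\sum_{x\in\mathcal{X}} \mu(x)\ln\mu(x).
\end{aligned}
```

Since $S(\mu)$ does not depend on $\bm{\omega}$, the minimizers of $D(\bm{\omega})$ and $E(\bm{\omega})$ coincide, and $E(\bm{\omega})$ can be estimated from samples of $\mu(x)$ without knowing $\mu(x)$ itself.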

2 Mapping to Ground State Search

In the following, we investigate a way to find $\bm{\omega}$ as the solution of a ground state search for a Hamiltonian. We assume that a mixed state

$$\hat{\mu} := \sum_{x\in{\mathcal{X}}} \mu(x)\, |x\rangle\langle x| \tag{10}$$

is physically available, although this does not mean that we know $\hat{\mu}$. Roughly speaking, our aim is to construct a Hamiltonian $\hat{H}$ such that

$$\hat{H}\, |\bm{\omega}\rangle = E(\bm{\omega})\, |\bm{\omega}\rangle \tag{11}$$

with $E(\bm{\omega})$ given in eq.(9). With this formulation, a ground state of $\hat{H}$ gives the solution $\bm{\omega}$ minimizing eq.(9). In the following, we assume that $\bm{\omega}=\{\omega_{i}\}_{i\in{\mathcal{C}}}$ is discrete, i.e.,

$$\omega_{i} \in \Omega_{i} \tag{12}$$

with a finite set $\Omega_{i}$, although $\omega_{i}$ was taken to be continuous in the previous section. By choosing $\Omega_{i}$ appropriately, we can obtain an approximate solution that is sufficient in practice in most cases.
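Because $\hat{H}$ in eq.(11) is diagonal in the $|\bm{\omega}\rangle$ basis, its ground state on a finite grid can be found classically by enumeration when the problem is small. The sketch below is only a sanity check under assumptions that the paper does not make: the target $\mu(x)$ is taken as known (it is built from a grid point `w_true` so that the answer is known in advance), and the context models, the grids, and the function names are hypothetical.

```python
import itertools
import numpy as np

# Hypothetical context models p_i(x) over three symbols.
p = np.array([[0.7, 0.2, 0.1],
              [0.2, 0.3, 0.5]])
eta = -np.log(p)

def energy(mu, p, w):
    """Eq.(9): E(w) = -sum_x mu(x) ln P_w^(ME)(x)."""
    logits = np.asarray(w) @ (-np.log(p))
    log_z = np.log(np.sum(np.exp(logits)))
    return -np.sum(mu * (logits - log_z))

def ground_state(mu, p, grids):
    """Enumerate the product of the finite sets Omega_i and return the minimizer."""
    best = min(itertools.product(*grids), key=lambda w: energy(mu, p, w))
    return np.array(best), energy(mu, p, best)

# Assumed target: a maximum entropy mixture at a known grid point.
w_true = np.array([-0.5, -0.25])
unnorm = np.exp(w_true @ eta)
mu = unnorm / unnorm.sum()

# Assumed discretization: five candidate values per weight.
grids = [np.linspace(-1.0, 0.0, 5), np.linspace(-1.0, 0.0, 5)]

w_best, e_best = ground_state(mu, p, grids)
print("ground state weights:", w_best, "energy:", e_best)
```

The enumeration visits $\prod_{i}|\Omega_{i}|$ configurations, which grows exponentially with the number of contexts; this is the search that the paper proposes to hand over to ground state search.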

Rewriting $P^{(ME)}_{\bm{\omega}}(x)$ in eq.(3) as

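$$P^{(ME)}_{\bm{\omega}}(x) = \exp\left(\lambda(\bm{\omega}) + \sum_{i\in{\mathcal{C}}} \omega_{i}\, \eta_{i}(x)\right) \tag{13}$$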

with

$$\lambda(\bm{\omega}) = -\ln Z(\bm{\omega}), \tag{14}$$

and substituting it into eq.(9), we obtain

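$$E(\bm{\omega}) = -\sum_{x\in{\mathcal{X}}} \mu(x) \left(\lambda(\bm{\omega}) + \sum_{i\in{\mathcal{C}}} \omega_{i}\, \eta_{i}(x)\right) \tag{15}$$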

Setting the constraint described in eq.(14) aside, let us introduce a parameter $\lambda\in\Lambda$ and a function

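$$E^{\prime}(\lambda,\bm{\omega}) := -\sum_{x\in{\mathcal{X}}} \mu(x) \left(\lambda + \sum_{i\in{\mathcal{C}}} \omega_{i}\, \eta_{i}(x)\right). \tag{16}$$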

Similar to eq.(11), we can introduce a Hamiltonian and its eigenstates as

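$$\hat{H}^{\prime}\, |\lambda,\bm{\omega}\rangle = E^{\prime}(\lambda,\bm{\omega})\, |\lambda,\bm{\omega}\rangle. \tag{17}$$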

Notice, however, that the ground state of $\hat{H}^{\prime}$ does not give the $\bm{\omega}$ minimizing $E(\bm{\omega})$ in eq.(9), unlike the rough idea illustrated in eq.(11). In fact, $E^{\prime}(\lambda,\bm{\omega})$ is a monotonically decreasing function of $\lambda$ and is not bounded from below. To address this point, we add a constraint term to $E^{\prime}(\lambda,\bm{\omega})$ and introduce

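$$E(\lambda,\bm{\omega}) = E^{\prime}(\lambda,\bm{\omega}) + \alpha \left(\sum_{x\in{\mathcal{X}}} \exp\left(\lambda + \sum_{i\in{\mathcal{C}}} \omega_{i}\, \eta_{i}(x)\right) - 1\right)^{2} \tag{18}$$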

with $\alpha>0$. By constructing a Hamiltonian such that

$$\hat{H}\, |\lambda,\bm{\omega}\rangle = E(\lambda,\bm{\omega})\, |\lambda,\bm{\omega}\rangle, \tag{19}$$

we can safely reduce the context mixing problem to the ground state search of the Hamiltonian. To construct it, we consider a grand Hamiltonian

[Equation (20): definition of the grand Hamiltonian $\hat{G}$, truncated in the source]

where

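$$\hat{\lambda} := \sum_{\lambda\in\Lambda} \lambda\, |\lambda\rangle\langle\lambda|, \quad \hat{\omega}_{i} := \sum_{\omega_{i}\in\Omega_{i}} \omega_{i}\, |\omega_{i}\rangle\langle\omega_{i}| \tag{21}$$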

with state vectors

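$$|\lambda\rangle \in {\mathcal{H}}_{\Lambda},\ |\omega_{i}\rangle \in {\mathcal{H}}_{\Omega_{i}},\ |x\rangle \in {\mathcal{H}}_{{\mathcal{X}}_{0}},\ |x^{\prime}\rangle \in {\mathcal{H}}_{{\mathcal{X}}_{1}},\ \text{and}\ |x^{\prime\prime}\rangle \in {\mathcal{H}}_{{\mathcal{X}}_{2}}. \tag{22}$$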

Notice that the full Hilbert space where $\hat{G}$ lives is

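$${\mathcal{H}} := {\mathcal{H}}_{\Lambda} \otimes {\mathcal{H}}_{\Omega} \otimes {\mathcal{H}}_{{\mathcal{X}}_{0}} \otimes {\mathcal{H}}_{{\mathcal{X}}_{1}} \otimes {\mathcal{H}}_{{\mathcal{X}}_{2}} \tag{23}$$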

where

$${\mathcal{H}}_{\Omega} := \bigotimes_{i\in{\mathcal{C}}} {\mathcal{H}}_{\Omega_{i}}. \tag{24}$$

Now, let us suppose that a state on ${\mathcal{H}}_{{\mathcal{X}}_{0}}\otimes{\mathcal{H}}_{{\mathcal{X}}_{1}}\otimes{\mathcal{H}}_{{\mathcal{X}}_{2}}$ is somehow fixed into a state

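$$\hat{\nu} := \hat{\mu} \otimes \frac{\hat{I}}{|{\mathcal{X}}|} \otimes \frac{\hat{I}}{|{\mathcal{X}}|} \tag{25}$$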

where $\hat{\mu}$ is given in eq.(10), and $\hat{I}/|{\mathcal{X}}|$ is the completely mixed state on each Hilbert space. In this case, $\hat{G}$ in eq.(20) behaves on the partial Hilbert space ${\mathcal{H}}_{\Lambda}\otimes{\mathcal{H}}_{\Omega}$ as an effective Hamiltonian described as

[Equation (26): definition of the effective Hamiltonian $\hat{H}_{eff}$, truncated in the source]

that is nothing but $\hat{H}$ in eq.(19).
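The role of the constraint term in eq.(18) can be illustrated numerically. The sketch below is not from the paper: the context models, the target `mu`, the weights `w`, the value of `alpha`, and the $\lambda$ grid are all hypothetical. It shows that $E^{\prime}$ keeps decreasing as $\lambda$ grows, while $E(\lambda,\bm{\omega})$ has a minimum in $\lambda$. Setting $\partial E/\partial\lambda=0$ with $u:=\sum_{x}\exp(\lambda+\sum_{i}\omega_{i}\eta_{i}(x))$ gives $2\alpha u(u-1)=1$, so for finite $\alpha$ the minimizer sits at $u=(1+\sqrt{1+2/\alpha})/2$, slightly above the exact normalization $u=1$ and approaching it as $\alpha$ grows. For a continuous $\lambda$ this offset is the same for every $\bm{\omega}$; with a discrete $\Lambda$ it holds only approximately.

```python
import numpy as np

# Hypothetical context models, target distribution, weights and penalty strength.
p = np.array([[0.7, 0.2, 0.1],
              [0.2, 0.3, 0.5]])
eta = -np.log(p)
mu = np.array([0.5, 0.3, 0.2])
w = np.array([-0.5, -0.25])
alpha = 10.0

def e_prime(lam, w, mu, eta):
    """Eq.(16): energy with lambda treated as a free parameter."""
    return -np.sum(mu * (lam + w @ eta))

def e_penalized(lam, w, mu, eta, alpha):
    """Eq.(18): eq.(16) plus the squared normalization constraint."""
    total = np.sum(np.exp(lam + w @ eta))
    return e_prime(lam, w, mu, eta) + alpha * (total - 1.0) ** 2

def best_lambda(w, mu, eta, alpha, lam_grid):
    """Minimize eq.(18) over a finite set of lambda values."""
    values = [e_penalized(lam, w, mu, eta, alpha) for lam in lam_grid]
    return lam_grid[int(np.argmin(values))]

lam_grid = np.linspace(-3.0, 3.0, 6001)   # assumed discretization of Lambda
lam_star = best_lambda(w, mu, eta, alpha, lam_grid)

z = np.sum(np.exp(w @ eta))                          # Z(w) of eq.(4)
u_star = 0.5 * (1.0 + np.sqrt(1.0 + 2.0 / alpha))    # stationary normalization
lam_pred = np.log(u_star) - np.log(z)

print("E' at lambda = 0, 1, 2:", [e_prime(l, w, mu, eta) for l in (0.0, 1.0, 2.0)])
print("grid minimizer:", lam_star, "predicted:", lam_pred, "-ln Z:", -np.log(z))
```

Increasing `alpha` moves the grid minimizer toward $-\ln Z(\bm{\omega})$, the value required by eq.(14).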

Let us consider an application of $\hat{H}_{eff}$ to quantum annealing[9, 10, 11, 12, 13] with a driving Hamiltonian $\hat{V}$ on ${\mathcal{H}}_{\Lambda}\otimes{\mathcal{H}}_{\Omega}$, introducing a time-dependent Hamiltonian such as

$$\hat{H}_{eff}(t) = \frac{t}{T}\, \hat{H}_{eff} + \left(1 - \frac{t}{T}\right) \hat{V} \tag{27}$$

with initial condition

$$|\varphi_{0}\rangle \in {\mathcal{H}}_{\Lambda} \otimes {\mathcal{H}}_{\Omega} \tag{28}$$

that is chosen to be the ground state of $\hat{V}$. As dynamics on the grand Hilbert space in eq.(23), the above is realized by introducing a time-dependent Hamiltonian such as

$$\hat{G}(t) = \frac{t}{T}\, \hat{G} + \left(1 - \frac{t}{T}\right) \hat{V} \otimes \hat{I} \tag{29}$$

where $\hat{I}$ is the identity operator on ${\mathcal{H}}_{{\mathcal{X}}_{0}}\otimes{\mathcal{H}}_{{\mathcal{X}}_{1}}\otimes{\mathcal{H}}_{{\mathcal{X}}_{2}}$ , with the initial state

$$\hat{\rho}_{0} := |\varphi_{0}\rangle\langle\varphi_{0}| \otimes \hat{\nu}. \tag{30}$$

Notice that the state evolution under $\hat{G}(t)$ does not generally preserve the product form in eq.(30). Moreover, since ${\rm tr}_{{\mathcal{H}}_{{\mathcal{X}}_{0}},{\mathcal{H}}_{{\mathcal{X}}_{1}},{\mathcal{H}}_{{\mathcal{X}}_{2}}}\left(\hat{G}^{2}\right)$ is not $\hat{H}_{eff}^{2}$ on the Hilbert space ${\mathcal{H}}_{\Lambda}\otimes{\mathcal{H}}_{\Omega}$, the dynamics of the state reduced to that Hilbert space under $\hat{G}(t)$ deviates from the dynamics

$$\frac{d}{dt}\, |\varphi_{t}\rangle = -i\, \hat{H}_{eff}(t)\, |\varphi_{t}\rangle \tag{31}$$

in the order of $O(\delta t^{2})$ for a small time interval $\delta t$. To resolve this point, we need to supply a fresh $\hat{\nu}$ after every short time evolution of duration $\delta t$. By this procedure, we can simulate the dynamics in eq.(31) over a finite time interval with an error of order $O(\delta t)$.
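The ideal dynamics of eqs.(27) and (31) can be simulated classically for a toy instance. The sketch below makes several assumptions that are not in the paper: the register ${\mathcal{H}}_{\Lambda}\otimes{\mathcal{H}}_{\Omega}$ is encoded on three qubits (eight candidate parameter settings), the diagonal of $\hat{H}_{eff}$ is a hypothetical list of energies, the driver is the transverse field $\hat{V}=-\sum_{k}\sigma_{x}^{(k)}$, and $\hbar=1$. It integrates eq.(31) with piecewise-constant steps and does not model the repeated resupply of $\hat{\nu}$ on the grand Hilbert space.

```python
import numpy as np

def transverse_field_driver(n_qubits):
    """Assumed driver V = -sum_k sigma_x^(k); the paper does not fix V."""
    sx = np.array([[0.0, 1.0], [1.0, 0.0]])
    eye = np.eye(2)
    dim = 2 ** n_qubits
    v = np.zeros((dim, dim))
    for k in range(n_qubits):
        term = np.array([[1.0]])
        for j in range(n_qubits):
            term = np.kron(term, sx if j == k else eye)
        v -= term
    return v

def anneal(energies, total_time, n_steps):
    """Integrate eq.(31) under the linear schedule of eq.(27).

    Returns the final probabilities over the computational basis states.
    """
    energies = np.asarray(energies, dtype=float)
    dim = len(energies)
    n_qubits = dim.bit_length() - 1
    h_final = np.diag(energies)              # diagonal stand-in for H_eff
    v = transverse_field_driver(n_qubits)
    # Ground state of V: the uniform superposition.
    phi = np.full(dim, 1.0 / np.sqrt(dim), dtype=complex)
    dt = total_time / n_steps
    for k in range(n_steps):
        s = (k + 0.5) / n_steps              # midpoint value of t / T
        h = s * h_final + (1.0 - s) * v
        evals, evecs = np.linalg.eigh(h)
        # Exact propagator of the frozen Hamiltonian over one step.
        phi = evecs @ (np.exp(-1j * evals * dt) * (evecs.conj().T @ phi))
    return np.abs(phi) ** 2

# Hypothetical energies E(lambda, omega) for eight candidate settings.
ENERGIES = [0.9, 0.4, 0.5, 0.0, 1.3, 1.0, 1.1, 0.6]

probs = anneal(ENERGIES, total_time=200.0, n_steps=4000)
print("most likely setting:", int(np.argmax(probs)))
print("probability on the minimum-energy setting:", probs[int(np.argmin(ENERGIES))])
```

For a slow enough schedule the final state concentrates on the index of the smallest energy, which is the parameter setting that the ground state search is meant to return. Shortening `total_time` lowers that probability.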

Bibliography

[1] Daniel S. Hirschberg and Debra A. Lelewer. Context Modeling for Text Compression. Springer US, Boston, MA, 1992.
[2] Matthew V. Mahoney. Adaptive weighing of context models for lossless data compression. Florida Tech. Technical Report CS-2005-16, 2005.
[3] M. Külekci. Compressed context modeling for text compression. In Data Compression Conference Proceedings, pages 373–382, 05 2011.
[4] E. T. Jaynes. Information theory and statistical mechanics. Phys. Rev., 106:620–630, May 1957.
[5] E. T. Jaynes and James H. Justice. Monkeys, Kangaroos, and N. Cambridge University Press, 1986.
[6] C. E. Shannon. A mathematical theory of communication. The Bell System Technical Journal, 27(4):623–656, Oct 1948.
[7] Bertrand Russell. A History of Western Philosophy. Simon & Schuster, New York, 1972.
[8] S. Kullback and R. A. Leibler. On information and sufficiency. Ann. Math. Statist., 22(1):79–86, 03 1951.