Context Mixing via Ground State Search
Kentaro Imafuku

TL;DR
This paper introduces a novel approach to the context mixing problem by formulating it as a ground state search using an effective Hamiltonian, aiming to optimize the combination of prior distributions for better modeling.
Contribution
It proposes a new Hamiltonian-based method for context mixing that approximates the target distribution by finding its ground state.
Findings
Effective Hamiltonian successfully models context mixing.
Ground state search improves distribution approximation.
Method outperforms traditional mixing techniques.
Abstract
To address the context mixing problem via ground state search, we introduce an effective Hamiltonian whose ground state represents the best mixing of a priori given probability distributions for approximately describing an unknown target probability distribution.
Context Mixing via Ground State Search
(Footnote: This work is based on results from a project commissioned by the New Energy and Industrial Technology Development Organization (NEDO), Japan.)
Kentaro Imafuku
National Institute of Advanced Industrial Science and Technology (AIST)
Aomi 2-3-26, Koto-ku, Tokyo 1350064, Japan
1 Context Mixing
Context mixing [1, 2, 3] is a way to estimate an unknown probability distribution by appropriately mixing a priori given probability distributions. Let
[TABLE]
be the given probability distributions where indicates a condition that gives context defining . For simplicity, we restrict ourselves to the case where is discrete. The most intuitive way would be to introduce a linear combination as
[TABLE]
with , and . Under this parameterization, context mixing takes the form of finding a parameter such that becomes a “good” approximation of a target probability distribution . Besides the parameterization in eq.(2), another parameterization, namely
[TABLE]
with
[TABLE]
is also often considered [2]. The parameterization in eq.(3), which is derived from the maximum entropy principle [4, 5], guarantees that is the probability distribution with the maximum entropy among all distributions that have the same expectation values
[TABLE]
Notice that corresponds to the mean code length in the case where following is encoded with an entropy code based on [6]. ( is encoded into a code word whose length is .) In addition, the extra length compared with the case where is naturally encoded with the entropy code based on is given as
[TABLE]
which is the Kullback-Leibler distance between and . Thus, choosing the maximum entropy state with respect to implies minimizing , so that any extra conditions except for can be excluded in constructing . (The idea may be similar to making a facial composite sketch, where an average face is drawn except for the particular features suggested by witnesses, or to the principle of Occam’s razor [7].)
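As a reminder of why the maximum entropy principle leads to an exponential form, the textbook Lagrange-multiplier argument can be sketched as follows. The symbols q, f_i, c_i, w_i, \mu, and Z are assumptions chosen for this sketch and are not necessarily the notation of eqs.(3)-(5).

```latex
\begin{aligned}
&\text{maximize}\quad S[q] = -\sum_x q(x)\log q(x)
\quad\text{subject to}\quad \sum_x q(x) = 1,\qquad \sum_x q(x)\, f_i(x) = c_i \\[4pt]
&\frac{\partial}{\partial q(x)}\Big[\, S[q] - \mu\Big(\sum_{x'} q(x') - 1\Big)
 - \sum_i w_i\Big(\sum_{x'} q(x')\, f_i(x') - c_i\Big)\Big] = 0 \\[4pt]
&\Longrightarrow\quad q(x) = \frac{1}{Z(w)}\exp\Big(-\sum_i w_i f_i(x)\Big),
\qquad Z(w) = \sum_x \exp\Big(-\sum_i w_i f_i(x)\Big)
\end{aligned}
```

If one assumes that each constrained quantity is a code length, f_i(x) = -\log P_i(x), the result is a normalized geometric mixture proportional to \prod_i P_i(x)^{w_i}, which is consistent with the code-length reading given above.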
The Kullback-Leibler distance [8] between and , i.e.,
[TABLE]
can be adopted as a criterion for choosing the parameter . In other words, our aim is to find minimizing , or equivalently minimizing
[TABLE]
for the (unknown) target probability distribution .
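The equivalence between minimizing the Kullback-Leibler distance and minimizing the cross entropy can be checked numerically. The following is a minimal sketch for the linear parameterization, assuming toy distributions and a coarse list of candidate weights; the function names and the example numbers are hypothetical and do not come from the paper.

```python
import numpy as np

def linear_mix(priors, w):
    """Linear mixture sum_i w_i P_i(x); w must be non-negative and sum to 1."""
    w = np.asarray(w, dtype=float)
    if np.any(w < 0) or not np.isclose(w.sum(), 1.0):
        raise ValueError("weights must be non-negative and sum to 1")
    return w @ np.asarray(priors, dtype=float)

def cross_entropy(target, model):
    """-sum_x target(x) log model(x), in nats."""
    return float(-np.sum(target * np.log(model)))

def kl_divergence(target, model):
    """sum_x target(x) log(target(x) / model(x)), in nats."""
    return float(np.sum(target * np.log(target / model)))

# Toy data (assumed): two given distributions over three symbols.
priors = np.array([[0.7, 0.2, 0.1],
                   [0.1, 0.3, 0.6]])
# In practice the target is unknown; here it is fixed so the result can be checked.
target = np.array([0.4, 0.25, 0.35])

# Candidate weights (a, 1 - a) on a coarse grid.
candidates = [np.array([a, 1.0 - a]) for a in np.linspace(0.0, 1.0, 11)]
best_by_kl = min(candidates, key=lambda w: kl_divergence(target, linear_mix(priors, w)))
best_by_ce = min(candidates, key=lambda w: cross_entropy(target, linear_mix(priors, w)))

# The two criteria differ only by the entropy of the target, which is constant in w.
target_entropy = cross_entropy(target, target)
```

Because the target entropy does not depend on the weights, both criteria select the same candidate, which is why the cross entropy can be used when the target itself is not known in closed form.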
2 Mapping to Ground State Search
In the following, we investigate a way to find as a solution of the ground state search of a Hamiltonian. We assume that a mixed state
[TABLE]
is physically available, although this does not mean that we know . Roughly speaking, our aim is to construct a Hamiltonian such that
[TABLE]
with given in eq.(9). With this formulation, a ground state of obviously gives the solution minimizing eq.(9). In the following, we assume that is discrete, i.e.,
[TABLE]
with a finite set , although was assumed to be continuous in the previous section. By appropriately choosing , we can obtain an approximate solution that is sufficient in practice in most cases.
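The rough idea of a Hamiltonian whose ground state encodes the best parameter on a finite candidate set can be illustrated classically. The sketch below assumes a normalized geometric mixture as the parameterization, a regular grid as the finite set, and a known toy target so that the result can be verified; all names and numbers are hypothetical, and the given distributions must be strictly positive for the logarithms to exist.

```python
import itertools
import numpy as np

def geometric_mix(priors, w):
    """Normalized geometric mixture, proportional to prod_i P_i(x)**w_i."""
    log_mix = np.asarray(w, dtype=float) @ np.log(priors)
    unnorm = np.exp(log_mix - log_mix.max())  # shift for numerical stability
    return unnorm / unnorm.sum()

def weight_grid(n_models, levels):
    """Finite candidate set: every combination of the given weight levels."""
    return np.array(list(itertools.product(levels, repeat=n_models)))

def diagonal_hamiltonian(priors, target, grid):
    """Diagonal matrix whose k-th entry is the cross entropy for grid[k]."""
    energies = [-(target @ np.log(geometric_mix(priors, w))) for w in grid]
    return np.diag(energies)

priors = np.array([[0.7, 0.2, 0.1],
                   [0.1, 0.3, 0.6]])
target = priors[0]  # toy target: identical to the first given distribution
grid = weight_grid(n_models=2, levels=[0.0, 0.5, 1.0])
H = diagonal_hamiltonian(priors, target, grid)

# Ground state search: the lowest eigenvector is a basis state labelling one grid point.
energies, states = np.linalg.eigh(H)
ground_index = int(np.argmax(np.abs(states[:, 0])))
best_w = grid[ground_index]
```

For a diagonal matrix this is only an exhaustive search in disguise; the point of the sketch is the encoding, in which each candidate parameter is a basis state and its cost is the corresponding energy.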
Rewriting in eq.(3) as
[TABLE]
with
[TABLE]
and substituting it into eq.(9), we obtain
[TABLE]
Setting the constraint described in eq.(14) aside, let us introduce a parameter and a function
[TABLE]
Similar to eq.(11), we can introduce a Hamiltonian and its eigenstates as
[TABLE]
Notice, however, that the ground state of does not yield minimizing in eq.(9), unlike the rough idea illustrated in eq.(11). In fact, is a monotonically decreasing function of and is not bounded from below. To address this point, we add a constraint term to and introduce
[TABLE]
with . By constructing a Hamiltonian such as
[TABLE]
we can safely reduce the context mixing problem to the ground state search of the Hamiltonian. To construct it, we consider a grand Hamiltonian
[TABLE]
where
[TABLE]
with state vectors
[TABLE]
Notice that the full Hilbert space where lives is
[TABLE]
where
[TABLE]
Now, let us suppose that a state on is somehow fixed into a state
[TABLE]
where is given in eq.(10), and is the completely mixed state on each Hilbert space. In this case, in eq.(20) behaves on the partial Hilbert space as an effective Hamiltonian given by
[TABLE]
which is nothing but in eq.(19).
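The mechanism by which a fixed state on one factor turns a larger Hamiltonian into an effective one on the remaining factor can be sketched with a partial trace. This is a simplified two-factor illustration, not the construction of eq.(20): the system and environment dimensions, the operators h, and the probabilities p are all assumed toy values.

```python
import numpy as np

def effective_hamiltonian(H_grand, sigma, dim_sys):
    """Return Tr_env[H_grand (I (x) sigma)], an operator on the system alone."""
    dim_env = sigma.shape[0]
    weighted = H_grand @ np.kron(np.eye(dim_sys), sigma)
    # Index layout after reshape: (sys row, env row, sys col, env col).
    blocks = weighted.reshape(dim_sys, dim_env, dim_sys, dim_env)
    return np.einsum("ajbj->ab", blocks)

# Toy setup (assumed): the environment holds a classical mixture with probabilities p,
# and the system sees the operator h[x] whenever the environment is in state x.
p = np.array([0.7, 0.3])
h = [np.diag([1.0, -1.0]), np.diag([-0.5, 2.0])]
projectors = [np.diag([1.0, 0.0]), np.diag([0.0, 1.0])]

H_grand = sum(np.kron(h_x, proj_x) for h_x, proj_x in zip(h, projectors))
sigma = np.diag(p)  # mixed state sum_x p(x) |x><x|

H_eff = effective_hamiltonian(H_grand, sigma, dim_sys=2)
expected = sum(p_x * h_x for p_x, h_x in zip(p, h))  # the p-weighted average of h[x]

ground_energy = np.linalg.eigvalsh(H_eff)[0]
```

In this toy case the effective Hamiltonian is simply the average of the operators h[x] weighted by p, so an expectation over an unknown distribution is obtained from the state itself, without knowing p explicitly.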
Let us consider an application of to quantum annealing computation [9, 10, 11, 12, 13] with a driving Hamiltonian on , introducing a time-dependent Hamiltonian such as
[TABLE]
with initial condition
[TABLE]
that is chosen to be the ground state of . As dynamics on the grand Hilbert space in eq.(23), the above is realized by introducing a time-dependent Hamiltonian such as
[TABLE]
where is the identity operator on , with the initial state
[TABLE]
Notice that the state evolution by does not generally preserve the product form in eq.(30). Moreover, since is not on the Hilbert space , the dynamics of the state reduced to the Hilbert space governed by deviates from the dynamics
[TABLE]
by an amount of order for a small time interval . To resolve this point, we need to supply new one after another, after every short time evolution over the time interval . With this procedure, we can simulate the dynamics in eq.(31) for a finite time interval with an error of up to order .
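The resupply procedure can be illustrated numerically: evolve the joint state for a short step, discard the environment, attach a fresh copy, and repeat. The sketch below is a simplified illustration under stated assumptions: a single two-level system and a single two-level environment, a time-independent Hamiltonian rather than an annealing schedule, and toy values for p, h, and the driving term. None of these names or numbers come from the paper.

```python
import numpy as np

def evolve_op(H, t):
    """Unitary exp(-i H t) for a Hermitian matrix H, via eigendecomposition."""
    vals, vecs = np.linalg.eigh(H)
    return (vecs * np.exp(-1j * vals * t)) @ vecs.conj().T

def trace_out_env(rho, dim_sys, dim_env):
    """Partial trace over the second (environment) tensor factor."""
    blocks = rho.reshape(dim_sys, dim_env, dim_sys, dim_env)
    return np.einsum("ajbj->ab", blocks)

def refreshed_evolution(rho_sys, H_grand, sigma, total_time, n_steps):
    """Evolve in n_steps short steps, attaching a fresh sigma before each one."""
    dim_sys, dim_env = rho_sys.shape[0], sigma.shape[0]
    U = evolve_op(H_grand, total_time / n_steps)
    for _ in range(n_steps):
        joint = U @ np.kron(rho_sys, sigma) @ U.conj().T
        rho_sys = trace_out_env(joint, dim_sys, dim_env)
    return rho_sys

# Toy setup (assumed values).
p = np.array([0.7, 0.3])
h = [np.diag([1.0, -1.0]), np.diag([-0.5, 2.0])]
projectors = [np.diag([1.0, 0.0]), np.diag([0.0, 1.0])]
H_drive = -np.array([[0.0, 1.0], [1.0, 0.0]])  # transverse-field style driver

H_grand = np.kron(H_drive, np.eye(2)) + sum(
    np.kron(h_x, proj_x) for h_x, proj_x in zip(h, projectors))
H_eff = H_drive + sum(p_x * h_x for p_x, h_x in zip(p, h))
sigma = np.diag(p)

rho0 = np.full((2, 2), 0.5, dtype=complex)  # ground state of H_drive
U_eff = evolve_op(H_eff, 1.0)
ideal = U_eff @ rho0 @ U_eff.conj().T       # evolution under the effective Hamiltonian

def deviation(n_steps):
    approx = refreshed_evolution(rho0, H_grand, sigma, 1.0, n_steps)
    return np.linalg.norm(approx - ideal)

err_coarse, err_fine = deviation(10), deviation(100)
```

Comparing `err_coarse` with `err_fine` shows the deviation from the effective dynamics shrinking as the step gets shorter, at the cost of consuming more fresh copies of the environment state.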
References
- [1] Daniel S. Hirschberg and Debra A. Lelewer. Context Modeling for Text Compression. Springer US, Boston, MA, 1992.
- [2] Matthew V. Mahoney. Adaptive weighing of context models for lossless data compression. Florida Tech. Technical Report CS-2005-16, 2005.
- [3] M. Külekci. Compressed context modeling for text compression. In Data Compression Conference Proceedings, pages 373–382, May 2011.
- [4] E. T. Jaynes. Information theory and statistical mechanics. Phys. Rev., 106:620–630, May 1957.
- [5] E. T. Jaynes and James H. Justice. Monkeys, Kangaroos, and N. Cambridge University Press, 1986.
- [6] C. E. Shannon. A mathematical theory of communication. The Bell System Technical Journal, 27(4):623–656, Oct 1948.
- [7] Bertrand Russell. A History of Western Philosophy. Simon & Schuster, New York, 1972.
- [8] S. Kullback and R. A. Leibler. On information and sufficiency. Ann. Math. Statist., 22(1):79–86, Mar 1951.
