Computational Complexity of Queries Based on Itemsets

Nikolaj Tatti

arXiv:1902.00633 · cs.CC · February 5, 2019

TL;DR

This paper explores the computational difficulty of determining exact frequency bounds of itemset conjunctions, revealing that key problems are NP-complete or PP-hard, indicating significant intractability in this area.

Contribution

It establishes the NP-completeness and PP-hardness of fundamental query evaluation problems related to itemset frequencies, highlighting their computational intractability.

Findings

1. Checking maximal consistent frequency is NP-complete

2. Evaluating Maximum Entropy estimate is PP-hard

3. Checking consistency is NP-complete

Abstract

We investigate determining the exact bounds of the frequencies of conjunctions based on frequent sets. Our scenario is an important special case of some general probabilistic logic problems that are known to be intractable. We show that despite the limitations our problems are also intractable, namely, we show that checking whether the maximal consistent frequency of a query is larger than a given threshold is NP-complete and that evaluating the Maximum Entropy estimate of a query is PP-hard. We also prove that checking consistency is NP-complete.

Equations

$$p(C=t)=p(U=1,W=0)=p(U=1)-p\Bigl(U=1,\bigvee_{i}w_{i}=1\Bigr).$$

$$p\Bigl(U=1,\bigvee_{i}w_{i}=1\Bigr)=\sum_{H\in\mathcal{H}}(-1)^{|H|+1}\,p(U=1,H=1).$$

$$p(V=t,W=u)=\begin{cases}2^{-L}&\text{if }u_{i}=C_{i}(t)\text{ for all }i,\\ 0&\text{otherwise.}\end{cases}$$

$$\sum_{t,u}p(V=t,W=u)=\sum_{t,\,u_{i}=C_{i}(t)}p(V=t,W=u)=2^{L-3}2^{-L}=\frac{1}{8},$$

$$f\geq p(W=1)\geq p(V=t,W=1)=2^{-L}>0.$$

$$q(V=t,W=1)>0$$

$$\mathcal{F}=\left\{\begin{array}{l}\emptyset\left(1\right),\ v_{1}\left(\tfrac{1}{2}\right),\ v_{2}\left(\tfrac{1}{2}\right),\ v_{3}\left(\tfrac{1}{2}\right),\ v_{1}v_{2}\left(\tfrac{1}{4}\right),\ v_{2}v_{3}\left(\tfrac{1}{4}\right),\\ c_{1}\left(\tfrac{3}{4}\right),\ v_{1}c_{1}\left(\tfrac{1}{2}\right),\ v_{2}c_{1}\left(\tfrac{1}{2}\right),\ v_{1}v_{2}c_{1}\left(\tfrac{1}{4}\right),\\ c_{2}\left(\tfrac{3}{4}\right),\ v_{2}c_{2}\left(\tfrac{1}{4}\right),\ v_{3}c_{2}\left(\tfrac{1}{2}\right),\ v_{2}v_{3}c_{2}\left(\tfrac{1}{4}\right)\end{array}\right\}.$$

$$q(W=1)\geq q(c_{0}=1,W=1)=q(c_{0}=1)=2^{-L}.$$

Full text

Computational Complexity of Queries Based on Itemsets

Nikolaj Tatti

HIIT Basic Research Unit, Laboratory of Computer and Information Science, Helsinki University of Technology, Finland

Keywords: Computational Complexity, Data Mining, Itemset

1 Introduction

Assume that we have two events, say $a$ and $b$ . Assume further that their probabilities are $P(a)=0.6$ and $P(b)=0.5$ . What can we say about the probability of $a\land b$ ? We know that the probability must lie within $I=\left[0.1,0.5\right]$ . This interval is tight: For each $f\in I$ there is a distribution having $f$ as a probability of $a\land b$ . Also note that the Maximum Entropy estimate in this case is $0.6\times 0.5=0.3$ .
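The numbers in this opening example can be reproduced with a few lines of exact arithmetic. The sketch below is only an illustration: the function names `conjunction_bounds` and `independence_estimate` are hypothetical, and the bounds used are the standard ones for two events, $\max(0,P(a)+P(b)-1)$ and $\min(P(a),P(b))$.

```python
from fractions import Fraction

def conjunction_bounds(pa, pb):
    """Tight interval for P(a and b) when only P(a) and P(b) are known."""
    lower = max(Fraction(0), pa + pb - 1)
    upper = min(pa, pb)
    return lower, upper

def independence_estimate(pa, pb):
    """Maximum Entropy estimate when only the two marginals are known."""
    return pa * pb

pa, pb = Fraction(6, 10), Fraction(5, 10)
low, high = conjunction_bounds(pa, pb)
print(low, high, independence_estimate(pa, pb))  # 1/10 1/2 3/10
```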

A more complicated example would be the following: Assume three events $a_{1}$ , $a_{2}$ , and $a_{3}$ . Assume that we know $P(a_{1})$ , $P(a_{2})$ , $P(a_{3})$ , $P(a_{1}\land a_{2})$ and $P(a_{1}\land a_{3})$ . What can we say about $P(a_{1}\land a_{2}\land a_{3})$ ?

Let us make these examples more general: A conjunctive query is a boolean formula having the form $a_{1}\land a_{2}\land\ldots\land a_{L}$ . Assume that we are given a set $\mathcal{F}$ of conjunctive queries along with their probabilities. Assume also that we are given a conjunctive query $B$ not belonging to $\mathcal{F}$ . What can we tell about the probability of this query? We know that the possible probabilities of the query $B$ correspond to some interval. In the paper we show that checking whether the right side of this interval is larger than some threshold is NP-complete. We also show that estimating the probability of the query $B$ using Maximum Entropy is PP-hard.

In the paper we adopt the terminology used in data mining of $0$–$1$ data: Conjunctive queries are represented by sets of items called itemsets and the probabilities of conjunctive queries are called itemset frequencies.

Our problems are special cases of much more general problems (see Section 6 for detailed comparison). These general problems are well-studied and they are all (at least) NP-hard. The difference is that in our work we concentrate on studying antimonotonic families of itemsets. We should point out that antimonotonic families are important since they tend to arise frequently in practice, for example, in mining of frequent itemsets [1, 2]. A similar technique is used in [7] to prove that inference of Belief Networks is NP-hard. The result of [7] is essentially Theorem 6 (in this paper) though it is in a different context. The general boolean query scenario is reduced to Linear Programming in [10]. A method worth mentioning is introduced in [15] where the authors estimate the frequencies using Maximum Entropy.

2 Preliminaries

In this section we give basic definitions used in mining of $0$–$1$ data.

By a binary data set we mean a collection of binary vectors of length $K$ sampled from some distribution. We define a sample space $\Omega=\left\{0,1\right\}^{K}$ to be the collection of all possible binary vectors of length $K$ . From now on $\Omega$ will always denote the sample space, $K$ will denote the dimension of binary vectors. Any distribution given in this paper will be defined on $\Omega$ .

It is customary to assign an attribute to each dimension of $\Omega$. Thus, when we speak of $a_{i}$ we mean the $i$th dimension. The set of all attributes is $A=\left\{a_{1},\ldots,a_{K}\right\}$. An itemset is a subset of $A$. Let $B=\left\{a_{i_{1}},\ldots,a_{i_{L}}\right\}$ be an itemset. We often use a condensed notation $B=a_{i_{1}}\cdots a_{i_{L}}$. A family of itemsets is called antimonotonic if all the subsets of any member are also included.

Let $p$ be a distribution defined on $\Omega$ . We use the following notation: Let $B=a_{i_{1}}\cdots a_{i_{L}}$ be an itemset and let $t$ be a binary vector of length $L$ . Then we shorten the notation $p(a_{i_{1}}=t_{1},\ldots,a_{i_{L}}=t_{L})$ by $p(B=t)$ . By $p(B=1)$ we mean $p(B=t)$ , where $t$ contains only ones. The probability $p(B=1)$ is called the frequency of $B$ .

Assume a family $\left\{B_{1},\ldots,B_{N}\right\}$ of itemsets and a vector $\theta$ of length $N$ . We say that a distribution $p$ satisfies the frequencies if $\theta_{i}=p(B_{i}=1)$ for $i=1,\ldots,N$ . We say that these frequencies are consistent if there is a distribution satisfying them.
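The definitions of this section translate directly into code. The following is a minimal sketch under an assumed encoding that is not part of the paper: an outcome is the frozenset of attributes equal to $1$, a distribution is a dictionary from outcomes to probabilities, and the helper names `frequency`, `is_antimonotonic`, and `satisfies` are hypothetical.

```python
from fractions import Fraction
from itertools import combinations

def frequency(dist, itemset):
    """p(B = 1): total mass of the outcomes in which every item of B is 1."""
    return sum((pr for outcome, pr in dist.items() if itemset <= outcome), Fraction(0))

def is_antimonotonic(family):
    """True if every subset of every member is also a member."""
    members = set(family)
    return all(
        frozenset(sub) in members
        for b in members
        for r in range(len(b) + 1)
        for sub in combinations(b, r)
    )

def satisfies(dist, theta):
    """True if dist reproduces every frequency in theta (itemset -> value)."""
    return all(frequency(dist, b) == f for b, f in theta.items())

a1, a2 = "a1", "a2"
quarter = Fraction(1, 4)
uniform = {
    frozenset(): quarter,
    frozenset({a1}): quarter,
    frozenset({a2}): quarter,
    frozenset({a1, a2}): quarter,
}
theta = {
    frozenset(): Fraction(1),
    frozenset({a1}): Fraction(1, 2),
    frozenset({a2}): Fraction(1, 2),
}
print(is_antimonotonic(theta), satisfies(uniform, theta))  # True True
```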

3 Maximal Frequency Query is NP-complete

Assume that we want to find the frequency for an itemset $B$ based on some known family $\mathcal{F}$ of itemsets. We know that generally the frequency for $B$ is not unique: There may be distributions that produce different frequencies for $B$ but have the same frequencies of $\mathcal{F}$ . The set of all the consistent frequencies of $B$ is an interval [4]. In this section we focus on finding one side of this interval:

Problem 1

(MaxQuery) Assume that we are given an antimonotonic family $\mathcal{F}$ having $N$ members along with rational and consistent frequencies $\theta$. Find the maximal frequency for a given itemset $B$ that can be produced by a distribution satisfying the frequencies $\theta$.

In other words, we ask: given the frequencies $\theta$, what is the largest consistent frequency for $B$? Note that the maximal frequency always exists since the frequencies $\theta$ are required to be consistent. Our goal in this section is to show that in general this problem is intractable. First let us give an example where the solution can be easily obtained.

Example 1

Assume that a family $\mathcal{F}$ contains only the itemsets of size one. Then the frequency $\theta_{a_{i}}$ is the mean of the attribute $a_{i}$ . The maximal frequency for an itemset $B=b_{1}b_{2}\cdots b_{M}$ is $\min\left\{\theta_{b_{i}}\mid i=1,\ldots,M\right\}$ .
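One way to see that the minimum in Example 1 is attained is to build a distribution in which the attributes are nested: an attribute with a larger frequency is $1$ whenever one with a smaller frequency is $1$. The sketch below illustrates this under the assumption that outcomes are encoded as frozensets of the attributes equal to $1$; the name `comonotone_distribution` is hypothetical and the construction is not taken from the paper.

```python
from fractions import Fraction

def comonotone_distribution(theta):
    """Sparse distribution with the given singleton frequencies (nested attributes)."""
    order = sorted(theta, key=theta.get, reverse=True)  # most frequent first
    dist = {}
    prev = Fraction(1)
    # Outcome k has exactly the k most frequent attributes set to 1.
    for k in range(len(order) + 1):
        cur = theta[order[k]] if k < len(order) else Fraction(0)
        mass = prev - cur
        if mass > 0:
            dist[frozenset(order[:k])] = mass
        prev = cur
    return dist

def frequency(dist, itemset):
    """Total mass of the outcomes containing every item of the itemset."""
    return sum((pr for o, pr in dist.items() if itemset <= o), Fraction(0))

theta = {"a": Fraction(3, 5), "b": Fraction(1, 2), "c": Fraction(1, 4)}
q = comonotone_distribution(theta)
print(frequency(q, frozenset({"a", "b"})))  # 1/2
```

The resulting distribution has at most one non-zero entry per attribute plus one, and the frequency of any itemset $B$ under it equals the smallest singleton frequency among the items of $B$.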

We know that MaxQuery can be solved by using Linear Programming [4], though the resulting program contains an exponential number of variables. This reduction along with some results from Linear Programming theory [14] has important consequences: There is a distribution, say $q$, producing the maximal frequency for $B$ and having at most $N+1$ non-zero entries. Also, $q$ has rational entries, and if $L$ is the number of bits needed to specify the denominator of an element of the frequency vector $\theta$, then the number of bits needed to specify the denominator of an entry of $q$ is $\log_{2}\left((N+1)^{3}2^{NL}\right)\in O(NL)$. We call such a distribution canonical.

Since NP is defined for yes/no problems we need the decision version of MaxQuery:

Problem 2

(MaxQueryDec) Assume that we are given an antimonotonic family $\mathcal{F}$ having $N$ members along with rational and consistent frequencies $\theta$. Given an itemset $B$ and a rational threshold $b$, is there a distribution satisfying the frequencies $\theta$ such that the frequency of $B$ is larger than $b$?

The relation between MaxQuery and MaxQueryDec is the following: Assume that we can solve MaxQuery in polynomial time, then we can clearly solve MaxQueryDec in polynomial time. Assume now that we can solve MaxQueryDec in polynomial time. Let $f$ be the solution of MaxQuery. We can find $f$ using MaxQueryDec and dichotomous search. We know that $f$ is a rational number between $0$ and $1$ and that the denominator of $f$ can be expressed using $O(NL)$ bits. Thus the number of required search steps is $O(NL)$.
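The dichotomous search can be sketched as follows. This is an illustration under stated assumptions, not the paper's procedure: `oracle` is a hypothetical stand-in for a MaxQueryDec solver that answers whether $f>b$, and `max_denominator` is an assumed known bound $D$ on the denominator of $f$. Once the search interval is narrower than $1/(2D^{2})$, the exact value is recovered with `Fraction.limit_denominator`, which returns the closest fraction with a denominator of at most $D$.

```python
from fractions import Fraction

def recover_max_frequency(oracle, max_denominator):
    """Find f exactly, given oracle(b) == (f > b) and a bound on f's denominator."""
    lo, hi = Fraction(0), Fraction(1)  # invariant: lo <= f <= hi
    tolerance = Fraction(1, 2 * max_denominator ** 2)
    steps = 0
    while hi - lo >= tolerance:
        mid = (lo + hi) / 2
        if oracle(mid):
            lo = mid  # f > mid
        else:
            hi = mid  # f <= mid
        steps += 1
    # Two distinct fractions with denominators <= D differ by at least 1/D^2,
    # so f is the unique such fraction closest to the midpoint of [lo, hi].
    f = ((lo + hi) / 2).limit_denominator(max_denominator)
    return f, steps

hidden = Fraction(3, 7)  # plays the role of the unknown maximal frequency
f, steps = recover_max_frequency(lambda b: hidden > b, max_denominator=10)
print(f, steps)  # 3/7 8
```

The number of oracle calls grows with $\log_{2}(2D^{2})$, so a denominator of $O(NL)$ bits gives $O(NL)$ steps, in line with the count stated above.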

Theorem 2

MaxQueryDec is in NP.

Proof. Let $q$ be a canonical distribution for MaxQuery. We can represent this distribution in polynomial space, and hence we can use it as a certificate. To check the certificate we need to check that $q$ is a valid distribution, that it satisfies the frequencies, and that its frequency for $B$ is larger than the threshold $b$.
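The three checks on the certificate are straightforward to carry out when the distribution is stored sparsely. The sketch below assumes a hypothetical encoding in which `q` lists only its non-zero entries, keyed by the frozenset of attributes equal to $1$; the function name `check_certificate` is likewise an assumption made for illustration.

```python
from fractions import Fraction

def check_certificate(q, theta, query, threshold):
    """Verify a MaxQueryDec certificate in time polynomial in its size.

    q:     dict mapping outcomes (frozensets of attributes equal to 1)
           to Fraction probabilities; only non-zero entries are listed.
    theta: dict mapping each itemset of the family to its frequency.
    """
    def freq(itemset):
        return sum((pr for o, pr in q.items() if itemset <= o), Fraction(0))

    is_distribution = all(pr >= 0 for pr in q.values()) and sum(q.values()) == 1
    matches = all(freq(b) == f for b, f in theta.items())
    return is_distribution and matches and freq(query) > threshold

theta = {
    frozenset(): Fraction(1),
    frozenset({"a"}): Fraction(3, 5),
    frozenset({"b"}): Fraction(1, 2),
}
q = {
    frozenset(): Fraction(2, 5),
    frozenset({"a"}): Fraction(1, 10),
    frozenset({"a", "b"}): Fraction(1, 2),
}
print(check_certificate(q, theta, frozenset({"a", "b"}), Fraction(2, 5)))  # True
```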

Our next step is to reduce 3SAT to MaxQueryDec. In order to do that we need the following lemma:

Lemma 3

Assume that two distributions $p$ and $q$ satisfy the frequencies $\theta$ of an antimonotonic family $\mathcal{F}$ of itemsets. Let $C\in\mathcal{F}$ . Then $p(C=t)=q(C=t)$ for any binary vector $t$ .

Proof. Fix $C=\left\{c_{1},\ldots,c_{N}\right\}$ and $t$. Let $U=\left\{c_{i}\in C\mid t_{i}=1\right\}$ and let $W=C-U$. Denote the elements of $W$ by $w_{i}$. Let $p(U=1,\bigvee_{i}w_{i}=1)$ be the probability of $U$ being $1$ and at least one of $w_{i}$ being $1$. We see that

$$p(C=t)=p(U=1,W=0)=p(U=1)-p\Bigl(U=1,\bigvee_{i}w_{i}=1\Bigr).\tag{1}$$

Let $\mathcal{H}=\left\{H\subseteq W\mid H\neq\emptyset\right\}$ be the collection of non-empty subsets of $W$ . We can express the last term of Eq. 1 by using the inclusion-exclusion principle

$$p\Bigl(U=1,\bigvee_{i}w_{i}=1\Bigr)=\sum_{H\in\mathcal{H}}(-1)^{|H|+1}\,p(U=1,H=1).\tag{2}$$

By combining Eqs. 1 and 2 we have expressed $p(C=t)$ as a linear combination of terms having the form $p(B=1)$ where $B\subseteq C$ . Antimonotonicity implies that all these frequencies are included in $\theta$ . This makes $p(C=t)$ unique and the lemma follows.
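The linear combination in this proof can be written out explicitly: substituting the inclusion-exclusion expansion into the first equation gives $p(U=1,W=0)=\sum_{H\subseteq W}(-1)^{|H|}p(U\cup H=1)$, where $H$ now also ranges over the empty set. The sketch below evaluates this sum; the function name `cell_probability` and the dictionary encoding of the frequencies are assumptions made for illustration, and the sample values are the frequencies of the subsets of $v_{1}v_{2}c_{1}$ listed in Example 5.

```python
from fractions import Fraction
from itertools import combinations

def cell_probability(theta, ones, zeros):
    """p(U = 1, W = 0) as a signed sum of the frequencies of U | H, H subset of W."""
    total = Fraction(0)
    zeros = sorted(zeros)
    for r in range(len(zeros) + 1):
        for h in combinations(zeros, r):
            total += (-1) ** r * theta[frozenset(ones) | frozenset(h)]
    return total

F = Fraction
theta = {
    frozenset(): F(1),
    frozenset({"v1"}): F(1, 2),
    frozenset({"v2"}): F(1, 2),
    frozenset({"v1", "v2"}): F(1, 4),
    frozenset({"c1"}): F(3, 4),
    frozenset({"v1", "c1"}): F(1, 2),
    frozenset({"v2", "c1"}): F(1, 2),
    frozenset({"v1", "v2", "c1"}): F(1, 4),
}
# Clause item true while both of its variables are false: impossible under p.
print(cell_probability(theta, ones={"c1"}, zeros={"v1", "v2"}))  # 0
print(cell_probability(theta, ones={"v1", "c1"}, zeros={"v2"}))  # 1/4
```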

Theorem 4

3SAT is polynomial-time reducible to MaxQueryDec.

Proof. Let $R$ be an instance of 3SAT having $L$ variables and $M$ clauses. We set the dimension of the sample space to be $K=L+M$. The first $L$ items correspond to the variables of $R$ and the last $M$ items correspond to the clauses. We use the following notation: Let $t$ be a truth assignment and let $C_{i}$ be a clause. Then $C_{i}(t)$ is $1$ if $C_{i}$ is satisfied by $t$, and $0$ otherwise. We denote the first $L$ items by $v_{i}$ and the last $M$ items by $c_{i}$. We also set $V=\left\{v_{1},\ldots,v_{L}\right\}$ and $W=\left\{c_{1},\ldots,c_{M}\right\}$.

We will now define an antimonotonic family $\mathcal{F}$ of itemsets. Let $C_{i}$ be some clause and let $c_{i}$ be its corresponding item. Assume that the items corresponding to the variables in $C_{i}$ are $v_{1}$, $v_{2}$, and $v_{3}$. We add an itemset $v_{1}v_{2}v_{3}c_{i}$ to the family $\mathcal{F}$ along with its subsets. We repeat this procedure for each clause in $R$. The resulting family $\mathcal{F}$ contains at most $16M$ members.

The following step is to define the frequencies $\theta$ . In order to do this we define a distribution $p$ over the attributes to be

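$$p(V=t,W=u)=\begin{cases}2^{-L}&\text{if }u_{i}=C_{i}(t)\text{ for all }i,\\ 0&\text{otherwise.}\end{cases}$$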

That is, the first $L$ items are distributed uniformly and the values of the last $M$ items are set to correspond to the truth values of the clauses.

We define the frequencies $\theta_{i}=p(F_{i}=1)$ , where $F_{i}\in\mathcal{F}$ . We note that the frequencies are rational and consistent. There is a closed formula for evaluating these frequencies. For example, assume that we have a clause $C_{1}\equiv(v_{1}\lor v_{2}\lor v_{3})$ . The frequency of the itemset $v_{1}v_{2}v_{3}c_{1}$ is then

$$\sum_{t,u}p(V=t,W=u)=\sum_{t,\,u_{i}=C_{i}(t)}p(V=t,W=u)=2^{L-3}2^{-L}=\frac{1}{8},$$

where in the first summation $t$ ranges over truth assignments such that $t_{1}=t_{2}=t_{3}=1$ and $u$ ranges over binary vectors of length $M$ such that $u_{1}=1$ . In the second summation $t$ ranges similarly as in the first summation and $u$ is now set to correspond to the clauses. The frequencies for the other members of $\mathcal{F}$ can be deduced in a similar way. Thus we can obtain the frequencies $\theta$ in polynomial time.
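The polynomial-time computation of $\theta$ can be sketched by handling one clause at a time: since the variables are uniform under $p$ and a clause item depends only on its own variables, it suffices to enumerate the at most $8$ assignments of those variables. The clause encoding (lists of `(variable, polarity)` pairs), the item names, and the function name `reduction_frequencies` are assumptions made for this illustration.

```python
from fractions import Fraction
from itertools import combinations, product

def reduction_frequencies(clauses):
    """Family and frequencies of the reduction, built one clause at a time.

    clauses: list of clauses; a clause is a list of (variable, polarity)
             pairs, e.g. [(1, True), (2, False)] for (v1 or not v2).
    Returns a dict mapping frozensets of item names to Fractions.
    """
    theta = {}
    for i, clause in enumerate(clauses, start=1):
        variables = sorted({v for v, _ in clause})
        items = [f"v{v}" for v in variables] + [f"c{i}"]
        # One row per assignment of this clause's variables (at most 8 rows).
        rows = []
        for bits in product([0, 1], repeat=len(variables)):
            value = dict(zip(variables, bits))
            satisfied = any(value[v] == int(pol) for v, pol in clause)
            ones = {f"v{v}" for v in variables if value[v]}
            if satisfied:
                ones.add(f"c{i}")
            rows.append(frozenset(ones))
        weight = Fraction(1, len(rows))  # the clause's variables are uniform under p
        for r in range(len(items) + 1):
            for sub in combinations(items, r):
                b = frozenset(sub)
                theta[b] = sum((weight for o in rows if b <= o), Fraction(0))
    return theta

single = reduction_frequencies([[(1, True), (2, True), (3, True)]])
print(single[frozenset({"v1", "v2", "v3", "c1"})])  # 1/8
```

Itemsets shared by several clauses, such as the empty set or a common variable, are simply recomputed with the same value, so the work is proportional to the number of clauses.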

Let $f$ be the maximal frequency for the itemset $W$ . We claim that the formula $R$ is satisfiable if and only if $f>0$ .

Assume that $R$ is satisfiable by a truth assignment $t$. Since $p$ itself satisfies the frequencies, we have

$$f\geq p(W=1)\geq p(V=t,W=1)=2^{-L}>0.$$

Assume now that there is a distribution $q$ satisfying the frequencies and producing a positive frequency for $W$. Let $t$ be a truth assignment not satisfying the formula, that is, there is a clause, say $C_{1}=(v_{1}\lor v_{2}\lor v_{3})$, that is not satisfied. Define $G=v_{1}v_{2}v_{3}$ and $u=\left[t_{1},t_{2},t_{3}\right]$. Lemma 3 implies that $q(V=t,W=1)\leq q(G=u,c_{1}=1)=p(G=u,c_{1}=1)=0$. Taking the contrapositive, we get the following: If $t$ is such that

$$q(V=t,W=1)>0\tag{3}$$

holds, then $t$ must satisfy $R$ .

By assumption $q(W=1)>0$, so there exists a truth assignment $t$ such that Eq. 3 holds. Thus $R$ is satisfiable. The reduction is complete if we set the query $B=W$ and the threshold $b=0$.

Example 5

Consider the formula $(v_{1}\lor v_{2})\land(\neg v_{2}\lor v_{3})$. We have two clauses, $C_{1}$ and $C_{2}$, and three variables, $v_{1}$, $v_{2}$, and $v_{3}$. The itemset family along with its frequencies (given in parentheses) is

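$$\mathcal{F}=\left\{\begin{array}{l}\emptyset\left(1\right),\ v_{1}\left(\tfrac{1}{2}\right),\ v_{2}\left(\tfrac{1}{2}\right),\ v_{3}\left(\tfrac{1}{2}\right),\ v_{1}v_{2}\left(\tfrac{1}{4}\right),\ v_{2}v_{3}\left(\tfrac{1}{4}\right),\\ c_{1}\left(\tfrac{3}{4}\right),\ v_{1}c_{1}\left(\tfrac{1}{2}\right),\ v_{2}c_{1}\left(\tfrac{1}{2}\right),\ v_{1}v_{2}c_{1}\left(\tfrac{1}{4}\right),\\ c_{2}\left(\tfrac{3}{4}\right),\ v_{2}c_{2}\left(\tfrac{1}{4}\right),\ v_{3}c_{2}\left(\tfrac{1}{2}\right),\ v_{2}v_{3}c_{2}\left(\tfrac{1}{4}\right)\end{array}\right\}.$$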

The maximal frequency of $c_{1}c_{2}$ for this setup (solved by linear programming) is ${\textstyle\frac{1}{2}}$ . Clearly, the formula is satisfiable.
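As a sanity check that is not part of the original text, the value can also be derived by hand for this small instance. Let $q$ be any distribution satisfying the frequencies. By Lemma 3, each clause item is determined by its variables under $q$ exactly as under $p$, so $q(c_{1}c_{2}=1)$ is the probability that the variables satisfy both clauses. The satisfying assignments split into those with $v_{1}=1,v_{2}=0$ and those with $v_{2}=1,v_{3}=1$, which gives

$$q(c_{1}c_{2}=1)=q(v_{1}=1,v_{2}=0)+q(v_{2}=1,v_{3}=1)=\bigl(\theta_{v_{1}}-\theta_{v_{1}v_{2}}\bigr)+\theta_{v_{2}v_{3}}=\tfrac{1}{4}+\tfrac{1}{4}=\tfrac{1}{2}.$$

In this particular example the frequency of $c_{1}c_{2}$ is therefore the same for every consistent distribution; this need not hold for larger formulas.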

4 MaxEnt Frequency Query is PP-hard

In the previous section we showed that finding the maximal frequencies is intractable. The maximal frequencies, however, are not so useful if our goal is to estimate boolean queries from a given set of itemsets. A more useful approach is Maximum Entropy. Given a distribution $p$ defined on $\Omega$, the entropy of $p$ is $\mathcal{E}\left(p\right)=-\sum_{\omega\in\Omega}p(\omega)\log\left(p(\omega)\right)$. It is customary to define $0\log(0)=0$ so that $\mathcal{E}\left(p\right)$ is always defined.
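The entropy definition, including the convention $0\log(0)=0$, can be sketched in a few lines. The dictionary encoding of a distribution and the function name `entropy` are assumptions made for illustration; the natural logarithm is used here, which is a choice the text leaves open.

```python
import math

def entropy(dist):
    """Entropy of a distribution given as outcome -> probability.

    Zero-probability outcomes are skipped, matching the convention 0 log 0 = 0.
    """
    return -sum(pr * math.log(pr) for pr in dist.values() if pr > 0)

uniform = {outcome: 1 / 8 for outcome in range(8)}
point_mass = {0: 1.0, 1: 0.0}
print(math.isclose(entropy(uniform), 3 * math.log(2)))  # True
```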

Problem 3

(EntrQuery) Assume that we are given an antimonotonic family $\mathcal{F}$ having $N$ members along with rational and consistent frequencies $\theta$. Find a frequency for a given itemset $B$ produced by the distribution $p$ satisfying the frequencies $\theta$ and maximising the entropy $\mathcal{E}\left(p\right)$.

It has been empirically shown that EntrQuery results in a good approximation [15].

Again we need a decision version of the problem:

Problem 4

(EntrQueryDec) Assume that we are given an antimonotonic family $\mathcal{F}$ having $N$ members along with rational and consistent frequencies $\theta$. Let $f$ be a frequency for a given itemset $B$ produced by a distribution satisfying the frequencies $\theta$ and maximising entropy. Is $f$ larger than a given rational threshold $b$?

The following theorem shows that EntrQueryDec is NP-hard.

Theorem 6

3SAT is polynomial-time reducible to EntrQueryDec.

Proof. Let $R$ be an instance of 3SAT. Let $\mathcal{F}$, $\theta$, $V$ and $B$ be the same as in the proof of Theorem 4. Let $\mathbb{P}$ be the set of distributions satisfying the frequencies $\theta$. Let $q\in\mathbb{P}$. A marginal distribution $q_{V}$ is obtained from $q$ by keeping only the items included in $V$. The distribution $q$ has the following property: The items corresponding to the clauses are completely determined by the items corresponding to the variables. This implies that $\mathcal{E}\left(q\right)=\mathcal{E}\left(q_{V}\right)$ [11, Theorem 4.2].

Let $\hat{q}\in\mathbb{P}$ be the distribution maximising the entropy. Let $p\in\mathbb{P}$ be the distribution defined in the proof of Theorem 4. Note that $\mathcal{E}\left(\hat{q}_{V}\right)=\mathcal{E}\left(\hat{q}\right)\geq\mathcal{E}\left(p\right)=\mathcal{E}\left(p_{V}\right)$ . We know that there is no distribution that has larger entropy than the uniform distribution [11, Theorem 3.1]. Since $p_{V}$ is uniform, we must have $\mathcal{E}\left(\hat{q}_{V}\right)=\mathcal{E}\left(p_{V}\right)$ . Hence $\mathcal{E}\left(\hat{q}\right)=\mathcal{E}\left(p\right)$ . We also know that the distribution maximising entropy is unique [8, Theorem 3.1]. This implies that $\hat{q}=p$ . To complete the proof we note that $p$ produces a positive frequency for $B$ if and only if $R$ is satisfiable.

A problem P is in PP if there is a nondeterministic polynomial-time machine such that an input $x$ is a yes-instance of P iff more than half of the computation paths end up accepting [13]. The class PP contains NP and is believed to be strictly larger. We can show that EntrQueryDec is PP-hard: In the proof of Theorem 6 the frequency of $B$ is exactly the number of satisfying assignments multiplied by $2^{-L}$. Hence, if we set the threshold $b=2^{-L/2}$, the instance will be in EntrQueryDec iff the number of assignments satisfying the given 3SAT formula is larger than $2^{L/2}$, the square root of the total number of assignments. This problem is known to be PP-complete [3].
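The threshold test can be illustrated on small formulas by brute force. Under the distribution $p$ of Theorem 4 the frequency of $B$ equals the number of satisfying assignments times $2^{-L}$, so comparing it with $2^{-L/2}$ amounts to the integer comparison $\text{count}^{2}>2^{L}$. The sketch below uses an assumed clause encoding (lists of `(variable, polarity)` pairs with variables numbered from $1$) and hypothetical function names; it runs in exponential time and only illustrates the counting question, not an efficient algorithm.

```python
from itertools import product

def count_satisfying(num_vars, clauses):
    """Brute-force count of satisfying assignments (exponential time)."""
    count = 0
    for bits in product([0, 1], repeat=num_vars):
        if all(any(bits[v - 1] == int(pol) for v, pol in clause) for clause in clauses):
            count += 1
    return count

def exceeds_threshold(num_vars, clauses):
    """Is count * 2^-L > 2^(-L/2)? Equivalent to count^2 > 2^L, in integers."""
    count = count_satisfying(num_vars, clauses)
    return count * count > 2 ** num_vars

example5 = [[(1, True), (2, True)], [(2, False), (3, True)]]
print(count_satisfying(3, example5), exceeds_threshold(3, example5))  # 4 True
```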

5 Checking Consistency is NP-complete

So far we have assumed that the itemset frequencies given in our problems are consistent. Let us remove this constraint and consider the following problem.

Problem 5

(Consistent) Assume that we are given an antimonotonic family $\mathcal{F}$ having $N$ members along with rational frequencies $\theta$. Are the frequencies $\theta$ consistent?

The following theorem proves that Consistent is a very hard problem.

Theorem 7

Consistent is NP-complete.

Proof. First, we need to show that Consistent is in NP. We know from Linear Programming theory that if the frequencies are consistent then there is a canonical distribution satisfying the frequencies. This is our certificate and thus Consistent is in NP.

We now prove that 3SAT is polynomial-time reducible to Consistent. We use the same construction as in the proof of Theorem 4 with some additions: We add one special attribute, say $c_{0}$ , to the set of attributes. We add an itemset $c_{0}$ to $\mathcal{F}$ , and we also add itemsets having the form $c_{0}c_{i}$ to $\mathcal{F}$ . The frequencies for the new itemsets are set to be $2^{-L}$ , where $L$ is the number of variables appearing in the 3SAT instance $R$ .

Assume that $R$ is satisfiable by a truth assignment $t$ . We define a distribution $q$ by extending the distribution $p$ to $c_{0}$ . The extension is done such that $c_{0}$ is $1$ iff $V=t$ . Clearly, $q$ satisfies the frequencies.

To prove the other direction, assume that there exists a distribution, say $q$ , that satisfies the frequencies. To prove that $R$ is satisfiable we must prove that $q(W=1)>0$ . Select two attributes, say $c_{1}$ and $c_{2}$ . Note that $q(c_{0}=1,c_{1}=0)=0$ and $q(c_{0}=1,c_{2}=0)=0$ . This implies that $q(c_{0}=1)=q(c_{0}=1,c_{1}=1,c_{2}=1)$ . We can prove in an iterative fashion that

$$q(W=1)\geq q(c_{0}=1,W=1)=q(c_{0}=1)=2^{-L}.$$

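Since $q$ also satisfies the frequencies used in the proof of Theorem 4, $q(W=1)>0$ implies that $R$ is satisfiable by the argument given there. This proves the result.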

6 Connections to Related Work

An NP-complete problem called FreqSat introduced in [5, 6] is a generalisation of Consistent — in FreqSat we are allowed to have non-antimonotonic families and inequality constraints. We can transform MaxQueryDec into FreqSat by changing the query into an inequality constraint. We should also point out that the proof of NP-hardness of FreqSat given in [5] is (although not explicitly mentioned) actually a valid proof for Consistent.

An even more general scenario is introduced in [12], in which conditional first-order logic sentences are allowed as constraints and queries. This scenario can be emulated by itemsets [6]. Also, the well-known problem PSat is NP-complete [9]: given a CNF formula and a frequency for each clause, it asks whether there is a distribution satisfying the frequencies.

7 Conclusions

In this paper we studied certain boolean query problems. Our problems are specialised (but frequently occurring and thus important) cases of much more general scenarios, and we showed that despite the limitations our problems remain intractable. The crux of the paper lies within the construction in the proof of Theorem 4.

There are some open problems: For example, what is the exact complexity of MaxQuery? Is it FNP-complete or $\text{FP}^{\text{NP}}$-complete? Also, what is the complexity of the opposite problem MinQuery? In addition, it is worthwhile to study the conditions under which the boolean query problems can be solved efficiently.

Bibliography

[1] Rakesh Agrawal, Tomasz Imielinski, and Arun N. Swami. Mining association rules between sets of items in large databases. In Peter Buneman and Sushil Jajodia, editors, Proceedings of the 1993 ACM SIGMOD International Conference on Management of Data, pages 207–216, Washington, D.C., 26–28 1993.
[2] Rakesh Agrawal, Heikki Mannila, Ramakrishnan Srikant, Hannu Toivonen, and Aino Inkeri Verkamo. Fast discovery of association rules. In U.M. Fayyad, G. Piatetsky-Shapiro, P. Smyth, and R. Uthurusamy, editors, Advances in Knowledge Discovery and Data Mining, pages 307–328. AAAI Press/The MIT Press, 1996.
[3] Delbert D. Bailey, Victor Dalmau, and Phokion G. Kolaitis. Phase transitions of PP-complete satisfiability problems. In IJCAI, pages 183–192, 2001.
[4] Artur Bykowski, Jouni K. Seppänen, and Jaakko Hollmén. Model-independent bounding of the supports of Boolean formulae in binary data. In Pier Luca Lanzi and Rosa Meo, editors, Database technologies for data mining. Springer Verlag, 2003.
[5] Toon Calders. Axiomatization and Deduction Rules for the Frequency of Itemsets. PhD thesis, University of Antwerp, Belgium, 2003.
[6] Toon Calders. Computational complexity of itemset frequency satisfiability. In Proceedings of the 23rd ACM SIGMOD-SIGACT-SIGART Symposium on Principles of Database Systems, 2004.
[7] Gregory Cooper. The computational complexity of probabilistic inference using Bayesian belief networks. Artificial Intelligence, 42(2–3):393–405, Mar. 1990.
[8] I. Csiszár. I-divergence geometry of probability distributions and minimization problems. The Annals of Probability, 3(1):146–158, Feb. 1975.