Functional Aggregate Queries with Additive Inequalities

Mahmoud Abo Khamis; Ryan R. Curtin; Benjamin Moseley; Hung Q. Ngo; XuanLong Nguyen; Dan Olteanu; Maximilian Schleich

arXiv:1812.09526·cs.DB·September 16, 2020


TL;DR

This paper introduces new algorithms and width parameters for efficiently answering functional aggregate queries with additive inequalities, with applications to machine learning tasks, improving over existing solutions.

Contribution

It defines relaxed width parameters and algorithms for FAQ-AI, extending prior work and enabling faster solutions for complex database queries with inequalities.

Findings

1. New width parameters for FAQ-AI with additive inequalities.

2. Algorithms achieving lower complexity than previous methods.

3. Applications to machine learning tasks like clustering and SVM training.




Full text


Mahmoud Abo Khamis

relationalAI

Ryan R. Curtin

relationalAI

Benjamin Moseley

Carnegie Mellon University

Hung Q. Ngo

relationalAI

XuanLong Nguyen

University of Michigan

Dan Olteanu

University of Zurich

Maximilian Schleich

University of Washington

Abstract

Motivated by fundamental applications in databases and relational machine learning, we formulate and study the problem of answering functional aggregate queries (FAQ) in which some of the input factors are defined by a collection of additive inequalities between variables. We refer to these queries as FAQ-AI for short.

To answer FAQ-AI in the Boolean semiring, we define relaxed tree decompositions and relaxed submodular and fractional hypertree width parameters. We show that an extension of the InsideOut algorithm using Chazelle’s geometric data structure for solving the semigroup range search problem can answer Boolean FAQ-AI in time given by these new width parameters. This new algorithm achieves lower complexity than known solutions for FAQ-AI. It also recovers some known results in database query answering.

Our second contribution is a relaxation of the set of polymatroids that gives rise to the counting version of the submodular width, denoted by #subw. This new width is sandwiched between the submodular and the fractional hypertree widths. Any FAQ over one semiring can be answered in time $\tilde{O}(N^{\text{\sf\#subw}})$, and any FAQ-AI over one semiring in time $\tilde{O}(N^{w})$, where $w$ is the relaxed version of #subw.

We present three applications of our FAQ-AI framework to relational machine learning: $k$-means clustering, training linear support vector machines, and training models using non-polynomial loss. These optimization problems can be solved over a database asymptotically faster than computing the join of the database relations.

1 Introduction

In this article we consider the problem of computing functional aggregate queries with additive inequalities, or FAQ-AI queries for short. Although existing algorithms such as InsideOut [6, 5] and PANDA [9, 8] are able to evaluate FAQ-AI queries, they do not exploit the structure of the additive inequalities. We introduce variants of these algorithms to this effect. Whereas the prior algorithms work on hypertree decompositions of the queries, our new algorithms work on relaxations of these decompositions to achieve lower computational complexities than InsideOut and PANDA.

Functional aggregate queries with additive inequalities can express computation needed for various database workloads and supervised and unsupervised machine learning.

On the database side, queries with inequalities occur naturally in scenarios involving temporal and spatial relationships between objects in databases. In a retail scenario (e.g., TPC-H), we would like to compute the revenue generated by a customer’s orders whose dates closely precede the ship dates of their lineitems. In streaming scenarios, we would like to detect patterns of events whose time stamps follow a particular order [16]. In spatial data management scenarios, we would like to retrieve objects whose coordinates are within a multi-dimensional range or in close proximity of other objects [31]. The evaluation of Core XPath queries over XML documents amounts to the evaluation of conjunctive queries with inequalities expressing tree relationships in the pre/post plane [20].

For machine learning, we show that FAQ-AI can express computation needed for $k$-means clustering, training linear support vector machines, and training models using non-polynomial loss. These optimization problems can be solved over a database asymptotically faster than computing the join of the database relations.

1.1 Motivating examples

A key insight of this article is that the efficient computation of inequality joins can reduce the computational complexity of supervised and unsupervised machine learning.

Example 1.1.

The $k$-means algorithm divides the input dataset $\bm{G}$ into $k$ clusters of similar data points [24]. Each cluster $\bm{G}_{i}$ has a mean $\bm{\mu}_{i}\in\mathbb{R}^{n}$, which is chosen according to the following optimization (similarity is defined here with respect to the $\ell_{2}$ norm):

$$\min_{(\bm{G}_{1},\dots,\bm{G}_{k})}\ \sum_{i=1}^{k}\sum_{\bm{x}\in\bm{G}_{i}}\|\bm{x}-\bm{\mu}_{i}\|_{2}^{2}.$$

Let $\mu_{i,\ell}$ be the $\ell$-th component of the mean vector $\bm{\mu}_{i}$. For a data point $\bm{x}\in\bm{G}$, the function $c_{ij}$ computes the difference between the squared $\ell_{2}$-distances from $\bm{x}$ to $\bm{\mu}_{i}$ and from $\bm{x}$ to $\bm{\mu}_{j}$:

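$$c_{ij}(\bm{x}):=\sum_{\ell\in[n]}(x_{\ell}-\mu_{i,\ell})^{2}-\sum_{\ell\in[n]}(x_{\ell}-\mu_{j,\ell})^{2}=\sum_{\ell\in[n]}\big(2(\mu_{j,\ell}-\mu_{i,\ell})\,x_{\ell}+\mu_{i,\ell}^{2}-\mu_{j,\ell}^{2}\big).$$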

A data point $\bm{x}\in\bm{G}$ is closest to the mean $\bm{\mu}_{i}$ among the $k$ means iff $c_{ij}(\bm{x})\leq 0$ for all $j\in[k]$.
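To make the definition concrete, here is a minimal Python sketch (the function names are ours, chosen for illustration) that evaluates $c_{ij}$ directly, and also in its expanded form: expanding the squares turns $c_{ij}(\bm{x})$ into a sum of univariate terms $2(\mu_{j,\ell}-\mu_{i,\ell})x_{\ell}+\mu_{i,\ell}^{2}-\mu_{j,\ell}^{2}$. It then uses the test $c_{ij}(\bm{x})\leq 0$ for all $j$ to pick a closest mean. Points and means are assumed to be equal-length tuples of numbers.

```python
def sq_dist(x, mu):
    """Squared l2 distance between point x and mean mu."""
    return sum((xl - ml) ** 2 for xl, ml in zip(x, mu))


def c_ij(x, mu_i, mu_j):
    """Difference of the squared l2 distances from x to mu_i and from x to mu_j."""
    return sq_dist(x, mu_i) - sq_dist(x, mu_j)


def c_ij_additive(x, mu_i, mu_j):
    """The same value written as a sum of univariate terms theta_l(x_l)."""
    return sum(2 * (mj - mi) * xl + mi ** 2 - mj ** 2
               for xl, mi, mj in zip(x, mu_i, mu_j))


def closest_mean(x, means):
    """Smallest i with c_ij(x) <= 0 for every j, i.e. an index of a closest mean."""
    for i, mu_i in enumerate(means):
        if all(c_ij(x, mu_i, mu_j) <= 0 for mu_j in means):
            return i
    raise ValueError("means must be non-empty")
```

For instance, with means $(0,0)$ and $(10,0)$, both forms give $c_{01}((3,1))=-40$, so $(3,1)$ is assigned to the first mean. The additive form is what lets each constraint $c_{ij}(\bm{x})\leq 0$ be treated as an additive inequality.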

To compute the mean vector $\bm{\mu}_{i}$, we need to compute, for each dimension $\ell\in[n]$, the sum $\sum_{\bm{x}\in\bm{G}_{i}}x_{\ell}$ of the values over $\bm{G}_{i}$. If the dataset $\bm{G}$ is the join of database relations $(R_{p})_{p\in[m]}$ over schemas $S_{p}\subseteq[n]$ for all $p\in[m]$, we can formulate this sum computation as a datalog-like query with aggregates [21]:

$$Q^{(i,\ell)}_{1}\big(\textstyle\sum x_{\ell}\big)\leftarrow\bigwedge_{p\in[m]}R_{p}(\bm{x}_{S_{p}})\wedge\bigwedge_{j\in[k]}c_{ij}(\bm{x})\leq 0.$$

The above notation means that the answer to query $Q^{(i,\ell)}_{1}$ is the sum of $x_{\ell}$ over all tuples $\bm{x}$ satisfying the conjunction on the right-hand side. Section 4 gives further queries necessary to compute the $k$ means. As we show in this article, such queries with aggregates and inequalities can be computed asymptotically faster than the join defining $\bm{G}$. ∎
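As a baseline, the following sketch evaluates $Q^{(i,\ell)}_{1}$ literally: it materializes the join defining $\bm{G}$ with a naive nested-loop join and sums $x_{\ell}$ over the tuples that pass every test $c_{ij}(\bm{x})\leq 0$. This is exactly the cost the article's algorithms aim to avoid. The relation encoding (a schema of attribute indices plus a list of rows) and all function names are assumptions made for illustration, and attributes are assumed to be numbered $0,\dots,n-1$.

```python
def natural_join(relations):
    """Naive natural join. Each relation is (schema, rows): schema is a tuple of
    attribute indices and every row lists values in schema order."""
    result = [{}]
    for schema, rows in relations:
        extended = []
        for partial in result:
            for row in rows:
                binding = dict(zip(schema, row))
                # keep the row only if it agrees on attributes already bound
                if all(partial.get(a, v) == v for a, v in binding.items()):
                    extended.append({**partial, **binding})
        result = extended
    return result


def sq_dist(x, mu):
    return sum((a - b) ** 2 for a, b in zip(x, mu))


def q1(relations, means, i, ell):
    """Sum of x_ell over join tuples x with c_ij(x) <= 0 for all j.
    Assumes the attributes are exactly 0..n-1, each appearing in some schema."""
    total = 0
    for t in natural_join(relations):
        x = [t[a] for a in sorted(t)]  # dense vector (x_0, ..., x_{n-1})
        if all(sq_dist(x, means[i]) - sq_dist(x, mu_j) <= 0 for mu_j in means):
            total += t[ell]
    return total
```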

Simple queries can already highlight the limitations of state-of-the-art evaluation techniques, as shown next.

Example 1.2.

State-of-the-art techniques take time $O(N^{2})$ to compute the following query over relations of size $\leq N$:

$$Q_{2}()\leftarrow R(a,b)\wedge S(b,c)\wedge T(c,d)\wedge a\leq d.$$

Examples 3.10 and 3.20 show how to compute $Q_{2}$ and its counting version in time $O(N^{1.5}\log N)$ using the techniques introduced in this article. ∎
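For reference, a direct join-then-filter evaluation of $Q_{2}$ and its counting version can be sketched as follows. Its running time grows with the size of the join $R\bowtie S\bowtie T$, which can be as large as $N^{2}$ for relations of size $N$; it is not the $O(N^{1.5}\log N)$ method of Examples 3.10 and 3.20, and the function names are illustrative.

```python
def q2_count(R, S, T):
    """Count tuples (a, b, c, d) with R(a,b), S(b,c), T(c,d) and a <= d by
    enumerating the path join (a baseline, not the article's algorithm)."""
    # index S and T on their first attribute for the join
    s_by_b, t_by_c = {}, {}
    for b, c in S:
        s_by_b.setdefault(b, []).append(c)
    for c, d in T:
        t_by_c.setdefault(c, []).append(d)
    count = 0
    for a, b in R:
        for c in s_by_b.get(b, []):
            for d in t_by_c.get(c, []):
                if a <= d:
                    count += 1
    return count


def q2_bool(R, S, T):
    """Boolean version of Q_2: does at least one join tuple satisfy a <= d?"""
    return q2_count(R, S, T) > 0
```

For example, with R = [(1, "x"), (5, "x")], S = [("x", "y")] and T = [("y", 3), ("y", 7)], `q2_count` returns 3, since only the pair $(a,d)=(5,3)$ violates $a\leq d$.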

1.2 The FAQ-AI problem

One way to answer the above queries is to view them as functional aggregate queries (FAQ) [6] formulated in sum-product form over some semiring. We therefore briefly introduce FAQ over a single semiring.

We first establish notation. For any positive integer $n$, let $\mathcal{V}=[n]$. For $i\in\mathcal{V}$, let $X_{i}$ denote a variable/attribute, and $x_{i}$ denote a value in the discrete domain $\text{\sf Dom}(X_{i})$ of $X_{i}$. For any $K\subseteq\mathcal{V}$, define $\bm{X}_{K}=(X_{i})_{i\in K}$ and $\bm{x}_{K}=(x_{i})_{i\in K}\in\prod_{i\in K}\text{\sf Dom}(X_{i})$. That is, $\bm{X}_{K}$ is a tuple of variables and $\bm{x}_{K}$ is a tuple of values for these variables.

Consider a semiring $(\bm{D},\oplus,\otimes,\bm{0},\bm{1})$. Let $\mathcal{H}=(\mathcal{V}=[n],\mathcal{E})$ be a multi-hypergraph, which means that $\mathcal{V}=[n]$ is a set of vertices and $\mathcal{E}$ is a multiset of edges (a multiset is a collection of elements each of which can occur multiple times), where each edge $K\in\mathcal{E}$ is a subset of $\mathcal{V}$. To each edge $K\in\mathcal{E}$ we associate a function $R_{K}:\prod_{i\in K}\text{\sf Dom}(X_{i})\to\bm{D}$ called a factor. An FAQ query over one semiring with free variables $F\subseteq\mathcal{V}$ has the form:

$$Q(\bm{x}_{F})=\bigoplus_{\bm{x}_{\mathcal{V}\setminus F}}\ \bigotimes_{K\in\mathcal{E}}R_{K}(\bm{x}_{K}).\qquad(2)$$

Under the Boolean semiring $(\{\text{\sf true},\text{\sf false}\},\vee,\wedge,\text{\sf false},\text{\sf true})$, the query (2) becomes a conjunctive query: the factors $R_{K}$ represent input relations, where $R_{K}(\bm{x}_{K})=\text{\sf true}$ iff $\bm{x}_{K}\in R_{K}$, with some notational overloading. Under the sum-product semiring, the query (2) counts the number of tuples in the join result for each tuple $\bm{x}_{F}$, where the factors $R_{K}$ are indicator functions $R_{K}(\bm{x}_{K})=\bm{1}_{\bm{x}_{K}\in R_{K}}$. (The notation $\bm{1}_{A}$ denotes the indicator function of the event $A$ in the semiring $(\bm{D},\oplus,\otimes,\bm{0},\bm{1})$: $\bm{1}_{A}=\bm{1}$ if $A$ holds, and $\bm{0}$ otherwise.) To aggregate over some input variable, say $X_{k}$, we can designate an identity factor $R_{k}(x_{k})=x_{k}$.
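The semantics of query (2) can be illustrated with a brute-force evaluator that is parameterized by the semiring. The sketch below, with hypothetical names `Semiring`, `faq` and `relation_factor`, enumerates every assignment to the variables over finite domains, so it runs in time exponential in $n$ and only serves to pin down the meaning of $\bigoplus$, $\bigotimes$ and the indicator factors.

```python
from dataclasses import dataclass
from functools import reduce
from itertools import product
from typing import Any, Callable


@dataclass(frozen=True)
class Semiring:
    plus: Callable[[Any, Any], Any]
    times: Callable[[Any, Any], Any]
    zero: Any
    one: Any


BOOLEAN = Semiring(lambda a, b: a or b, lambda a, b: a and b, False, True)
SUM_PRODUCT = Semiring(lambda a, b: a + b, lambda a, b: a * b, 0, 1)


def relation_factor(rows, sr):
    """Indicator factor: R_K(x_K) = one if x_K is a row of the relation, else zero."""
    rows = set(rows)
    return lambda t: sr.one if t in rows else sr.zero


def faq(sr, domains, factors, free):
    """Brute-force evaluation of query (2) by enumerating all assignments.
    domains: dict variable -> finite list of values.
    factors: list of (K, R_K), K a tuple of variables, R_K a function of x_K.
    free:    tuple of free variables F. Returns {x_F: Q(x_F)} without zero entries."""
    variables = sorted(domains)
    out = {}
    for values in product(*(domains[v] for v in variables)):
        x = dict(zip(variables, values))
        val = reduce(sr.times, (R(tuple(x[v] for v in K)) for K, R in factors), sr.one)
        key = tuple(x[v] for v in free)
        out[key] = sr.plus(out.get(key, sr.zero), val)
    return {k: v for k, v in out.items() if v != sr.zero}
```

Swapping `SUM_PRODUCT` for `BOOLEAN` turns the per-$\bm{x}_{F}$ join count into the conjunctive query answer, mirroring the two readings above.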

Throughout the article, we assume the query size to be a constant and state runtimes in data complexity. It is known [6] that over an arbitrary semiring, a query (2) without free variables can be answered in time $O(N^{\text{\sf fhtw}(Q)}\cdot\log N)$, where $N$ is the size of the largest factor $R_{K}$ and fhtw denotes the fractional hypertree width of the query [19]. If $Q$ has free variables, fhtw is replaced by the FAQ-width [6]. Over the Boolean semiring, the time can be lowered to $\tilde{O}(N^{\text{\sf subw}(Q)})$ [9], where subw is the submodular width [32] and $\tilde{O}$ hides a polylogarithmic factor in $N$.

Motivated by the examples in Section 1.1, we formulate a class of FAQ queries called FAQ-AI:

Definition 1.3 (FAQ-AI).

Given a hyperedge multiset $\mathcal{E}$ that is partitioned into two multisets $\mathcal{E}=\mathcal{E}_{s}\cup\mathcal{E}_{\ell}$, where $s$ stands for “skeleton” and $\ell$ stands for “ligament”, the input to a query from the FAQ-AI class is the following:

1. To each hyperedge $K\in\mathcal{E}_{s}$, there corresponds a function $R_{K}:\prod_{i\in K}\text{\sf Dom}(X_{i})\to\bm{D}$, as in the FAQ case.

2. To each hyperedge $S\in\mathcal{E}_{\ell}$, there correspond $|S|$ functions $\theta^{S}_{v}:\text{\sf Dom}(X_{v})\to{\mathbb{R}}$, one for every variable $v\in S$.

The output to the FAQ-AI query is the following:

$$Q(\bm{x}_{F})=\bigoplus_{\bm{x}_{\mathcal{V}\setminus F}}\Big(\bigotimes_{K\in\mathcal{E}_{s}}R_{K}(\bm{x}_{K})\Big)\otimes\Big(\bigotimes_{S\in\mathcal{E}_{\ell}}\bm{1}_{\sum_{v\in S}\theta^{S}_{v}(x_{v})\leq 0}\Big).\qquad(3)$$

The summation $\bigoplus$ is over tuples $\bm{x}_{\mathcal{V}\setminus F}\in\prod_{i\in\mathcal{V}\setminus F}\text{\sf Dom}(X_{i})$. The (univariate) functions $\theta^{S}_{v}(x_{v})$ can be user-defined functions, e.g., $\theta_{1}^{S}(x_{1})=x_{1}^{2}/2$, or binary predicates with a key in $\text{\sf Dom}(X_{v})$ and a numeric value, e.g., a table salary(employee_id, salary_value) where employee_id is a key. The only requirement we impose is that, given $x$, the value $\theta^{S}_{v}(x)$ can be accessed/computed in $O(1)$ time (in data complexity).

If $\mathcal{E}_{\ell}=\emptyset$ , then we get back the FAQ formulation (2).
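Extending the same brute-force idea to query (3) only requires one extra factor per ligament hyperedge. In this sketch (names are ours), a ligament $S$ is represented as a dictionary mapping each variable $v\in S$ to its function $\theta^{S}_{v}$, and the semiring is passed as its four components. Evaluating $Q_{2}$ uses the single ligament $\theta_{a}(a)=a$, $\theta_{d}(d)=-d$, which encodes $a\leq d$. Enumerating every assignment makes it exponential in the number of variables, so it is a specification aid only.

```python
from functools import reduce
from itertools import product


def faq_ai(plus, times, zero, one, domains, skeleton, ligaments, free):
    """Brute-force evaluation of an FAQ-AI query (3).
    skeleton:  list of (K, R_K) with K a tuple of variables and R_K a function of x_K.
    ligaments: list of dicts {v: theta_v}; each dict S contributes the factor
               1[sum_{v in S} theta_v(x_v) <= 0]."""
    variables = sorted(domains)
    out = {}
    for values in product(*(domains[v] for v in variables)):
        x = dict(zip(variables, values))
        factors = [R(tuple(x[v] for v in K)) for K, R in skeleton]
        # one indicator per ligament hyperedge, evaluated on the additive inequality
        factors += [one if sum(th(x[v]) for v, th in S.items()) <= 0 else zero
                    for S in ligaments]
        key = tuple(x[v] for v in free)
        out[key] = plus(out.get(key, zero), reduce(times, factors, one))
    return {k: v for k, v in out.items() if v != zero}
```

With an empty ligament list the function degenerates to the FAQ evaluator for (2), matching the remark above.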

Example 1.4.

The queries in Section 1.1 are instances of (3):

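$$Q^{(i,\ell)}_{1}()=\sum_{\bm{x}}x_{\ell}\cdot\prod_{p\in[m]}\bm{1}_{\bm{x}_{S_{p}}\in R_{p}}\cdot\prod_{j\in[k]}\bm{1}_{c_{ij}(\bm{x})\leq 0},$$

$$Q_{2}()=\bigoplus_{a,b,c,d}R(a,b)\otimes S(b,c)\otimes T(c,d)\otimes\bm{1}_{a-d\leq 0}.$$

Note that for a given $\bm{x}$, $c_{ij}(\bm{x})$ can be computed in $O(1)$ time in data complexity, which in this context means that the number of dimensions $n$ is a constant. Moreover, $c_{ij}(\bm{x})$ is a sum of univariate functions of the coordinates $x_{\ell}$, so each constraint $c_{ij}(\bm{x})\leq 0$ is an additive inequality over the hyperedge $[n]$. $Q_{1}$ is over the sum-product semiring. $Q_{2}$ can be over any semiring: Example 3.10 discusses the case of the Boolean semiring while Example 3.20 discusses the sum-product semiring. ∎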

1.3 Our contributions

To answer FAQ queries of the form (2), there are currently two dominant width parameters: fractional hypertree width (fhtw [19]) and submodular width (subw [32]). (Section 2.1 overviews other notions of widths.) It is known that $\text{\sf subw}\leq\text{\sf fhtw}$ for any query, and in the Boolean semiring we can answer (2) in $\tilde{O}(N^{\text{\sf subw}})$ time [9, 32]. For non-Boolean semirings, the best known algorithm, called InsideOut [6, 7], evaluates (2) in time $O(N^{\text{\sf fhtw}}\log N)$. For queries with free variables, fhtw is replaced by the more general notion of FAQ-width (faqw) [6]; however, for brevity we discuss the non-free variable case here.

Following [7], both width parameters subw and fhtw can be defined via two constraint sets: the first is the set TD of all tree decompositions of the query hypergraph $\mathcal{H}$, and the second is the set of polymatroids $\Gamma_{n}$ on the $n$ vertices of $\mathcal{H}$. The widths subw and fhtw are then defined as maximin and minimax optimization problems, respectively, on the domain pair TD and $\Gamma_{n}$, subject to “edge domination” constraints for $\Gamma_{n}$. Section 2 presents these notions and other related preliminary concepts in detail.

Our contributions include the following:

Answering FAQ-AI over Boolean semiring

On the Boolean semiring, one way to answer query (3) is to apply the PANDA algorithm [9, 8], using edge domination constraints on $\mathcal{E}_{s}$ and the set TD of all tree decompositions of $\mathcal{H}=(\mathcal{V},\mathcal{E}=\mathcal{E}_{s}\cup\mathcal{E}_{\ell})$. However, we can do better. In Section 3.2 we define a new notion of tree decomposition, the relaxed tree decomposition, in which the hyperedges in $\mathcal{E}_{\ell}$ only have to be covered by adjacent TD bags. Then, we present a variant of the InsideOut algorithm running on these relaxed TDs using Chazelle’s classic geometric data structure [13] for solving the semigroup range search problem. We show that our InsideOut variant meets the “relaxed fhtw” runtime, which is the analog of fhtw on relaxed TDs. The PANDA algorithm can use the InsideOut variant as a blackbox to meet the “relaxed subw” runtime. The relaxed widths are never larger than their non-relaxed counterparts, and are strictly smaller for some classes of queries, which means our algorithms yield asymptotic improvements over existing ones.

Answering FAQ over an arbitrary semiring

Next, to prepare the stage for answering FAQ-AI over an arbitrary semiring, in Section 3.3 we revisit FAQ over a non-Boolean semiring, where no known algorithm can achieve the subw runtime. Here, we relax the set $\Gamma_{n}$ of polymatroids to a superset $\Gamma_{n}^{\prime}$ of relaxed polymatroids. Then, by adapting the subw definition to relaxed polymatroids, we obtain a new width parameter called “sharp submodular width” (#subw). We show how a variant of PANDA, called #PANDA, can achieve a runtime of $\tilde{O}(N^{\text{\sf\#subw}})$ for evaluating FAQ over an arbitrary semiring. We prove that $\text{\sf subw}\leq\text{\sf\#subw}\leq\text{\sf fhtw}$, and that there are classes of queries for which #subw is unboundedly smaller than fhtw.

Answering FAQ-AI over an arbitrary semiring

Getting back to FAQ-AI, we apply the #subw result under both relaxations, relaxed TDs and relaxed polymatroids, to obtain a new width parameter called the relaxed #subw. We show that the new variants of PANDA and InsideOut can achieve the relaxed #subw runtime. We also show that there are queries for which relaxed #subw is essentially the best we can hope for, modulo $k$-sum-hardness.

Applications to relational Machine Learning

Equipped with the algorithms for answering FAQ-AI, in Section 4 we return to relational machine learning applications over training datasets defined by feature extraction queries over relational databases. We show how one can train linear SVM, $k$-means, and ML models using Huber/hinge loss functions without completely materializing the output of the feature extraction queries. In particular, this shows that for these important classes of ML models, one can sometimes train models in time sub-linear in the size of the training dataset.

An early version of this work appeared in the proceedings of the 38th ACM Symposium on Principles of Database Systems (PODS’19) [1]. This article goes beyond that early version by extending the class of loss functions supported by our framework for relational machine learning, introducing new applications for our framework on the (probabilistic) database side, and including detailed proofs and derivation steps for various key results.

1.4 Related work

Appendix A revisits two prior results on the evaluation of queries with inequalities through FAQ-AI lenses: Core XPath queries over XML documents [18] and inequality joins over tuple-independent probabilistic databases [36].

Throughout the article, we contrast our new width notions with fhtw and subw and our new algorithm #PANDA with the state-of-the-art algorithms PANDA and InsideOut for FAQ and FAQ-AI queries.

Prior seminal work considers the containment and minimization problem for queries with inequalities [27]. The efficient evaluation of such queries continues to receive attention in the database community [26]. There is a large body of work on queries with disequalities (not-equal), which are at times referred to as inequalities. Queries with disequalities are a proper subclass of FAQ-AI (since $x\neq y$ can be represented as $x<y\vee x>y$). Prior works [28, 4] present several results for this proper subclass that are stronger than our general results for FAQ-AI. In particular, for queries with disequalities it suffices to consider tree decompositions only for “skeleton” edges (ignoring “ligament” edges, which in this case are the disequalities) [28, 4], whereas for the more general FAQ-AI we need to consider “relaxed” tree decompositions (see Def. 3.3).

Section 4 reviews relevant works on machine learning.

2 Preliminaries

We assume without loss of generality that semiring operations $\oplus$ and $\otimes$ can be performed in $O(1)$ time. (When the assumption does not hold, for the set semiring for instance, we can multiply the claimed runtime with the real operation’s runtime.)

2.1 Tree decompositions and polymatroids

We briefly define tree decompositions, fhtw and subw parameters. We refer the reader to the recent survey by Gottlob et al. [17] for more details and historical contexts. In what follows, the hypergraph $\mathcal{H}$ should be thought of as the hypergraph of the input query, although the notions of tree decomposition and width parameters are defined independently of queries.

A tree decomposition of a hypergraph $\mathcal{H}=(\mathcal{V},\mathcal{E})$ is a pair $(T,\chi)$, where $T$ is a tree whose nodes are $V(T)$ and $\chi:V(T)\to 2^{\mathcal{V}}$ maps each node $t$ of the tree to a subset $\chi(t)$ of vertices such that

1. every hyperedge $S\in\mathcal{E}$ is a subset of some $\chi(t)$, $t\in V(T)$ (i.e., every edge is covered by some bag), and

2. for every vertex $v\in\mathcal{V}$, the set $\{t\mid v\in\chi(t)\}$ is a non-empty (connected) sub-tree of $T$. This is called the running intersection property.

The sets $\chi(t)$ are called the bags of the tree decomposition.
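The two conditions can be checked mechanically. The following sketch, assuming bags are Python sets keyed by tree-node identifiers and the tree is given as a list of node pairs (an encoding of our choosing), verifies that $T$ is a tree, that every hyperedge is covered by some bag, and that the running intersection property holds.

```python
def _connected(subset, adj):
    """True iff `subset` is non-empty and induces a connected subgraph of the tree."""
    subset = set(subset)
    if not subset:
        return False
    start = next(iter(subset))
    seen, stack = {start}, [start]
    while stack:
        t = stack.pop()
        for u in (adj[t] & subset) - seen:
            seen.add(u)
            stack.append(u)
    return seen == subset


def is_tree_decomposition(vertices, edges, tree_edges, bags):
    """Check the pair (T, chi): tree_edges lists the edges of T as node pairs,
    bags maps each node t to the vertex set chi(t), edges lists the hyperedges."""
    nodes = set(bags)
    adj = {t: set() for t in nodes}
    for t1, t2 in tree_edges:
        adj[t1].add(t2)
        adj[t2].add(t1)
    # T must be a tree: connected with exactly |V(T)| - 1 edges
    if len(tree_edges) != len(nodes) - 1 or not _connected(nodes, adj):
        return False
    # condition 1: every hyperedge is contained in some bag
    if not all(any(set(S) <= bags[t] for t in nodes) for S in edges):
        return False
    # condition 2 (running intersection): nodes containing v form a non-empty subtree
    return all(_connected({t for t in nodes if v in bags[t]}, adj) for v in vertices)
```

For the triangle hypergraph, the path of bags $\{a,b\},\{b,c\},\{c,a\}$ covers all three edges but fails the running intersection property for $a$, whereas the single bag $\{a,b,c\}$ is a valid tree decomposition.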

Let $\text{\sf TD}(\mathcal{H})$ denote the set of all tree decompositions of $\mathcal{H}$. When $\mathcal{H}$ is clear from context, we use TD for brevity.

To define width parameters, we use the polymatroid characterization from Abo Khamis et al. [9]. A function $f:2^{\mathcal{V}}\to{\mathbb{R}}_{+}$ is called a (non-negative) set function on $\mathcal{V}$. A set function $f$ on $\mathcal{V}$ is modular if $f(S)=\sum_{v\in S}f(\{v\})$ for all $S\subseteq\mathcal{V}$, monotone if $f(X)\leq f(Y)$ whenever $X\subseteq Y$, and submodular if $f(X\cup Y)+f(X\cap Y)\leq f(X)+f(Y)$ for all $X,Y\subseteq\mathcal{V}$. A monotone, submodular set function $h:2^{\mathcal{V}}\to{\mathbb{R}}_{+}$ with $h(\emptyset)=0$ is called a polymatroid. Let $\Gamma_{n}$ denote the set of all polymatroids on $\mathcal{V}=[n]$.
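For small vertex sets, membership in $\Gamma_{n}$ can be tested by brute force over all $2^{n}$ subsets. The sketch below (hypothetical helper names) checks non-negativity, $h(\emptyset)=0$, monotonicity and submodularity, and separately tests the edge domination condition $h(S)\leq 1$ for every hyperedge $S$. It is meant for experimentation on tiny examples, not for efficiency.

```python
from itertools import combinations


def all_subsets(vertices):
    vs = sorted(vertices)
    return [frozenset(c) for r in range(len(vs) + 1) for c in combinations(vs, r)]


def is_polymatroid(h, vertices, tol=1e-9):
    """Brute-force check over all 2^n subsets that h (a function of frozensets)
    is non-negative, has h(empty set) = 0, is monotone and is submodular."""
    sets = all_subsets(vertices)
    if abs(h(frozenset())) > tol or any(h(X) < -tol for X in sets):
        return False
    for X in sets:
        for Y in sets:
            if X <= Y and h(X) > h(Y) + tol:                # monotonicity
                return False
            if h(X | Y) + h(X & Y) > h(X) + h(Y) + tol:     # submodularity
                return False
    return True


def is_edge_dominated(h, edges, tol=1e-9):
    """Edge domination: h(S) <= 1 for every hyperedge S."""
    return all(h(frozenset(S)) <= 1 + tol for S in edges)
```

For the triangle hypergraph, the modular function $h(S)=|S|/2$ is an edge-dominated polymatroid with $h(\{a,b,c\})=3/2$, which equals the fractional edge cover number of the triangle; $h(S)=|S|^{2}$ fails submodularity.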

Given $\mathcal{H}$ , define the set of edge dominated set functions:

$$\text{\sf ED}:=\{h\mid h:2^{\mathcal{V}}\to{\mathbb{R}}_{+},\ h(S)\leq 1,\ \forall S\in\mathcal{E}\}.$$

We next define the submodular width and fractional hypertree width of a given hypergraph $\mathcal{H}$ :

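$$\text{\sf subw}(\mathcal{H}):=\max_{h\in\text{\sf ED}\cap\Gamma_{n}}\ \min_{(T,\chi)\in\text{\sf TD}}\ \max_{t\in V(T)}h(\chi(t)),\qquad(5)$$

$$\text{\sf fhtw}(\mathcal{H}):=\min_{(T,\chi)\in\text{\sf TD}}\ \max_{h\in\text{\sf ED}\cap\Gamma_{n}}\ \max_{t\in V(T)}h(\chi(t)).\qquad(6)$$

It is known [32] that $\text{\sf subw}(\mathcal{H})\leq\text{\sf fhtw}(\mathcal{H})$, and there are classes of hypergraphs with bounded subw and unbounded fhtw. Furthermore, fhtw is never larger than other width notions such as the (generalized) hypertree width and the tree width plus one, and it can be strictly smaller.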

Remark 2.1.

Prior to Abo Khamis et al. [9], the commonly used definition of $\text{\sf fhtw}(\mathcal{H})$ is [19]

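$$\text{\sf fhtw}(\mathcal{H}):=\min_{(T,\chi)\in\text{\sf TD}}\ \max_{t\in V(T)}\rho^{*}_{\mathcal{E}}(\chi(t)),\qquad(7)$$

where $\rho^{*}_{\mathcal{E}}(B)$ is the fractional edge cover number of a vertex set $B$ using the hyperedge set $\mathcal{E}$. It is straightforward to show, using linear programming duality [9], that

$$\max_{t\in V(T)}\ \max_{h\in\text{\sf ED}\cap\Gamma_{n}}h(\chi(t))=\max_{t\in V(T)}\rho^{*}_{\mathcal{E}}(\chi(t)),$$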

proving the equivalence of the two definitions. However, the characterization (6) has two primary advantages: (i) it exposes the minimax / maximin duality between fhtw and subw, and more importantly (ii) it makes it completely straightforward to relax the definitions by replacing the $\text{\sf ED}\cap\Gamma_{n}$ constraints by other applicable constraints, as shall be shown in later sections.∎

Definition 2.2 ($F$-connex tree decomposition [11, 39]).

Given a hypergraph $\mathcal{H}=(\mathcal{V},\mathcal{E})$ and a set $F\subseteq\mathcal{V}$, a tree decomposition $(T,\chi)$ of $\mathcal{H}$ is $F$-connex if there is a subset $V^{\prime}\subseteq V(T)$ that forms a connected subtree of $T$ and satisfies $\bigcup_{t\in V^{\prime}}\chi(t)=F$. (Note that $V^{\prime}$ could be empty.)

We use $\text{\sf TD}_{F}$ to denote the set of all $F$-connex tree decompositions of $\mathcal{H}$. (Note that when $F=\emptyset$, $\text{\sf TD}_{F}=\text{\sf TD}$.)
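Whether a given tree decomposition is $F$-connex can be decided without enumerating all subsets $V^{\prime}$: every node of a valid $V^{\prime}$ has its bag inside $F$, so it suffices to look at the connected components of the tree restricted to nodes with $\chi(t)\subseteq F$ and ask whether one of them covers $F$. A sketch of this check, with an assumed input encoding (tree edges as node pairs, bags as sets), is below.

```python
def is_f_connex(tree_edges, bags, F):
    """Decide whether the tree decomposition (tree_edges, bags) is F-connex."""
    F = set(F)
    if not F:
        return True  # the empty set V' qualifies
    inside = {t for t, bag in bags.items() if set(bag) <= F}
    adj = {t: set() for t in inside}
    for t1, t2 in tree_edges:
        if t1 in inside and t2 in inside:
            adj[t1].add(t2)
            adj[t2].add(t1)
    unvisited = set(inside)
    while unvisited:
        # collect one connected component of the tree restricted to `inside`
        start = unvisited.pop()
        comp, stack = {start}, [start]
        while stack:
            t = stack.pop()
            for u in adj[t] - comp:
                comp.add(u)
                stack.append(u)
        unvisited -= comp
        if set().union(*(bags[t] for t in comp)) == F:
            return True
    return False
```

The shortcut is sound because any connected $V^{\prime}$ with $\bigcup_{t\in V^{\prime}}\chi(t)=F$ lies inside one such component, whose bags are all subsets of $F$ and therefore also have union exactly $F$.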

Definition 2.3 (Non-redundant tree decomposition).

A tree decomposition $(T,\chi)$ is redundant if there are $t_{1}\neq t_{2}\in V(T)$ where $\chi(t_{1})\subseteq\chi(t_{2})$ . A tree decomposition is non-redundant if it is not redundant.

The following proposition is folklore. For completeness, we prove it in Appendix B.

Proposition 2.4.

For every tree decomposition $(T,\chi)$ of a query $Q$, there exists a non-redundant tree decomposition $(T^{\prime},\chi^{\prime})$ of $Q$ that satisfies

$$\{\chi^{\prime}(t)\mid t\in V(T^{\prime})\}\subseteq\{\chi(t)\mid t\in V(T)\}.$$

Moreover, if $(T,\chi)$ is $F$-connex, then $(T^{\prime},\chi^{\prime})$ can be chosen to be $F$-connex as well.
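One standard way to obtain a tree decomposition as in Proposition 2.4 (not necessarily the proof given in Appendix B) is to contract redundant bags into a neighbour. By the running intersection property, if $\chi(t_{1})\subseteq\chi(t_{2})$ then $\chi(t_{1})$ is also contained in the bag of the neighbour of $t_{1}$ on the path to $t_{2}$, so it is enough to inspect tree edges. The sketch below does this with illustrative names; it does not address the $F$-connex part of the proposition, and it assumes node identifiers are sortable (e.g., integers).

```python
def make_non_redundant(tree_edges, bags):
    """Contract tree edges (t1, t2) with chi(t1) ⊆ chi(t2) into t2 until no bag is
    contained in a neighbouring bag. Returns (tree_edges, bags) of the result."""
    bags = {t: set(b) for t, b in bags.items()}
    edges = {frozenset(e) for e in tree_edges}
    changed = True
    while changed:
        changed = False
        for e in list(edges):
            t1, t2 = tuple(e)
            if bags[t1] <= bags[t2]:
                small, big = t1, t2
            elif bags[t2] <= bags[t1]:
                small, big = t2, t1
            else:
                continue
            # remove `small`, re-attaching its other neighbours to `big`
            edges.discard(e)
            edges = {frozenset(big if t == small else t for t in f) for f in edges}
            del bags[small]
            changed = True
            break
    return sorted(tuple(sorted(e)) for e in edges), bags
```

Each contraction keeps every remaining bag from the original decomposition, so the output satisfies the containment displayed in the proposition.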

Based on the above proposition, we only need to consider non-redundant tree decompositions $(T,\chi)$ in (6) and (7) (and later on in (10) and (13)).

2.2 InsideOut and PANDA

To answer the FAQ query (2), we need a model for the representation of the input factors $R_{K}$. The support of the function $R_{K}$ is the set of tuples $\bm{x}_{K}$ such that $R_{K}(\bm{x}_{K})\neq\bm{0}$. We use $|R_{K}|$ to denote the size of its support. For example, if $R_{K}$ represents an input relation, then $|R_{K}|$ is the number of tuples in $R_{K}$. In practice, there are often factors with infinite support, e.g., when $R_{K}$ represents a built-in function in a database, an arithmetic operator, or a comparison operator as in (3). To deal with this more general setting, the edge set $\mathcal{E}$ is partitioned into two sets $\mathcal{E}=\mathcal{E}_{\not\infty}\cup\mathcal{E}_{\infty}$, where $|R_{K}|$ is finite for all $K\in\mathcal{E}_{\not\infty}$ and $|R_{K}|=\infty$ for all $K\in\mathcal{E}_{\infty}$. For simplicity, we often state runtimes of algorithms in terms of the “input size” $N:=\max_{K\in\mathcal{E}_{\not\infty}}|R_{K}|$. Moreover, we use $|Q|$ to denote the output size of $Q$. We always assume that $\bigcup_{S\in\mathcal{E}_{\not\infty}}S=\mathcal{V}$; otherwise the output size $|Q|$ could be infinite.

InsideOut [6, 5, 7]

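To answer (2), the InsideOut algorithm works by eliminating variables, along with an idea called the “indicator projection” (see Appendix C for more details). The runtime is described by the FAQ-width of the query, a slight generalization of fhtw. For one semiring, we can define $\text{\sf faqw}(Q)$ by applying Definition (6) over a restricted set of tree decompositions and edge dominated polymatroids. In particular, let $F\subseteq\mathcal{V}$ denote the set of free variables in (2), and recall $\text{\sf TD}_{F}$ from Definition 2.2. Then,

$$\text{\sf faqw}(Q):=\min_{(T,\chi)\in\text{\sf TD}_{F}}\ \max_{h\in\text{\sf ED}_{\not\infty}\cap\Gamma_{n}}\ \max_{t\in V(T)}h(\chi(t)),\qquad(10)$$

where $\text{\sf ED}_{\not\infty}:=\{h\mid h(S)\leq 1,\ \forall S\in\mathcal{E}_{\not\infty}\}$ imposes edge domination only on the edges with finite support.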

Note that $\text{\sf faqw}(Q)=\text{\sf fhtw}(\mathcal{H})$ when $F=\emptyset$ and $\mathcal{E}_{\infty}=\emptyset$ (i.e. $\mathcal{E}=\mathcal{E}_{\not\infty}$ ). A simple result from Abo Khamis et al. [6] is the following: (Recall that throughout the article we assume the query size to be a constant and state runtimes in data complexity.)

Theorem 2.5 ([6, 5]).

InsideOut answers query (2) in time $O(N^{\text{\sf faqw}(Q)}\cdot\log N+|Q|)$.

A proof sketch of the above theorem can be found in Appendix C. To solve the FAQ-AI (3), we can apply Theorem 2.5 with $\mathcal{E}_{\infty}\supseteq\mathcal{E}_{\ell}$, since all ligament factors have infinite support. But this is suboptimal: later, we show a new InsideOut variant that is polynomially better.
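The variable-elimination idea behind InsideOut can be sketched for the sum-product semiring: multiply the factors that mention a variable and sum that variable out, one variable at a time. This sketch omits indicator projections and the choice of elimination order that yields the faqw bound, and it uses a naive nested-loop join, so it illustrates the mechanics only. The factor encoding (a variable tuple plus a dictionary of non-zero entries) is our assumption, and every variable is assumed to occur in some factor.

```python
from collections import defaultdict


def _join(factors):
    """Enumerate (binding, product of weights) over the join of `factors`."""
    partial = [({}, 1)]
    for vs, table in factors:
        nxt = []
        for binding, w in partial:
            for vals, fw in table.items():
                if all(binding.get(v, x) == x for v, x in zip(vs, vals)):
                    nxt.append(({**binding, **dict(zip(vs, vals))}, w * fw))
        partial = nxt
    return partial


def eliminate(factors, var):
    """Multiply the factors that mention `var` and sum `var` out (sum-product).
    A factor is (vars, table): table maps value tuples, in `vars` order, to counts."""
    touching = [f for f in factors if var in f[0]]
    rest = [f for f in factors if var not in f[0]]
    out_vars = tuple(sorted({v for vs, _ in touching for v in vs} - {var}))
    out = defaultdict(int)
    for binding, weight in _join(touching):
        out[tuple(binding[v] for v in out_vars)] += weight
    return rest + [(out_vars, dict(out))]


def count(factors, order):
    """Eliminate the variables in `order`; every variable must occur in some factor."""
    for var in order:
        factors = eliminate(factors, var)
    result = 1
    for _, table in factors:  # only scalar factors (empty schema) remain
        result *= table.get((), 0)
    return result
```

On the path query $R(a,b)\wedge S(b,c)\wedge T(c,d)$, eliminating in the order a, d, b, c keeps every intermediate factor over at most one variable.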

PANDA [9, 8]

For the Boolean semiring, i.e., when the FAQ query (2) is of the form

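$$Q(\bm{x}_{F})=\bigvee_{\bm{x}_{\mathcal{V}\setminus F}}\ \bigwedge_{K\in\mathcal{E}}R_{K}(\bm{x}_{K}),\qquad(12)$$

we can do much better than Theorem 2.5. When $F=\emptyset$, Marx [32] showed that (12) can be answered in time $\tilde{O}(N^{O(\text{\sf subw}(Q))})$. The PANDA algorithm [9, 8] generalizes Marx’s result to deal with general degree constraints, and to meet precisely the $\tilde{O}(N^{\text{\sf subw}(Q)})$ runtime (see Appendix D for more details). In fact, PANDA works with queries such as (12) with free variables as well. In the context of this article, we can define the following notion of submodular FAQ-width in a natural way, with $\text{\sf ED}_{\not\infty}$ as in (10):

$$\text{\sf smfw}(Q):=\max_{h\in\text{\sf ED}_{\not\infty}\cap\Gamma_{n}}\ \min_{(T,\chi)\in\text{\sf TD}_{F}}\ \max_{t\in V(T)}h(\chi(t)).\qquad(13)$$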

Then, the results from Abo Khamis et al. [9] imply:

Theorem 2.6 ([9, 8]).

PANDA answers query (12) in time $\tilde{O}(N^{\text{\sf smfw}(Q)}+|Q|)$.

Appendix D presents an overview of the core PANDA algorithm and its analysis. The PANDA results only work for the Boolean semiring. Section 3 introduces a variant of PANDA, called #PANDA, that also works for non-Boolean semirings.

2.3 Semigroup range searching

Orthogonal range counting (and searching) is a classic and ubiquitous problem in computational geometry [15]: given a set $S$ of $N$ points in a $d$-dimensional space, build a data structure that, given any $d$-dimensional rectangle, can efficiently return the number of enclosed points. More generally, there is the semigroup range searching problem [13], where each point $\bm{p}\in S$ of the $N$ input points also has a weight $w(\bm{p})\in G$, where $(G,\oplus)$ is a semigroup. (In a semigroup we can add two elements using $\oplus$, but there is no additive inverse.) The problem is: given a $d$-dimensional rectangle $R$, compute $\bigoplus_{\bm{p}\in S\cap R}w(\bm{p})$.

Classic results by Chazelle [13] show that there are data structures for semigroup range searching which can be constructed in time $O(N\log^{d-1}N)$ and answer rectangular queries in $O(\log^{d-1}N)$ time. Also, this is almost the best we can hope for [14]. There are more recent improvements to Chazelle’s result (see, e.g., Chan et al. [12]), but they are minor (at most a $\log$ factor), as the original results were already very close to matching the lower bound.

Most of these range search/counting problems can be reduced to the dominance range searching problem (on semigroups), where the query is represented by a point $\bm{q}$, and the objective is to return $\bigoplus_{\bm{q}\preceq\bm{p},\bm{p}\in S}w(\bm{p})$. Here, $\preceq$ denotes the “dominance” relation (coordinate-wise $\leq$). We can think of $\bm{q}$ as the lower corner of an infinite rectangle query.
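To illustrate 2-dimensional dominance range searching over a semigroup, here is a plain range tree rather than Chazelle's structure: points are sorted by their first coordinate into a segment tree, and every node stores its points sorted by the second coordinate together with suffix aggregates, so a query touches $O(\log N)$ nodes with one binary search each. That gives $O(N\log N)$ construction and $O(\log^{2}N)$ query time, using only $\oplus$ and no inverses. We assume $\oplus$ is associative and commutative (as is a semiring's addition); the class name `Dominance2D` is ours.

```python
from bisect import bisect_left
from heapq import merge


class Dominance2D:
    """query(qx, qy) returns the op-aggregate of w(p) over points p with
    p[0] >= qx and p[1] >= qy, or None if no point dominates (qx, qy)."""

    def __init__(self, points, weights, op):
        self.op = op
        pts = sorted(zip(points, weights), key=lambda pw: pw[0][0])
        self.xs = [p[0] for p, _ in pts]
        self.n = len(pts)
        self.size = 1
        while self.size < self.n:
            self.size *= 2
        # segment tree over x-order; each node keeps its (y, w) pairs sorted by y
        pairs = [[] for _ in range(2 * self.size)]
        for leaf, (p, w) in enumerate(pts):
            pairs[self.size + leaf] = [(p[1], w)]
        for node in range(self.size - 1, 0, -1):
            pairs[node] = list(merge(pairs[2 * node], pairs[2 * node + 1],
                                     key=lambda yw: yw[0]))
        # suf[node][j] = op-aggregate of the weights at positions j.. in y order
        self.ys, self.suf = [], []
        for lst in pairs:
            suf = [w for _, w in lst]
            for j in range(len(suf) - 2, -1, -1):
                suf[j] = op(suf[j], suf[j + 1])
            self.ys.append([y for y, _ in lst])
            self.suf.append(suf)

    def _add(self, a, b):
        if a is None:
            return b
        if b is None:
            return a
        return self.op(a, b)

    def _node(self, node, qy):
        j = bisect_left(self.ys[node], qy)
        return self.suf[node][j] if j < len(self.suf[node]) else None

    def query(self, qx, qy):
        acc = None
        # canonical segment-tree cover of the x-suffix [first x >= qx, n)
        lo, hi = bisect_left(self.xs, qx) + self.size, self.n + self.size
        while lo < hi:
            if lo & 1:
                acc = self._add(acc, self._node(lo, qy))
                lo += 1
            if hi & 1:
                hi -= 1
                acc = self._add(acc, self._node(hi, qy))
            lo //= 2
            hi //= 2
        return acc
```

Because only `op` is applied, the same structure works for non-invertible semigroups such as `max`, e.g., `Dominance2D(points, weights, max)` returns the largest weight among dominating points.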

3 Relaxed tree decompositions and relaxed polymatroids

3.1 Connection to semigroup range searching

We always assume that $\bigcup_{S\in\mathcal{E}_{s}}S=\mathcal{V}$; otherwise the output size $|Q|$ could be infinite. We start with a special case of (3) in which the skeleton part $\mathcal{E}_{s}$ contains only two hyperedges $U$ and $L$. Consider the aggregate query of the form

$$Q(\bm{x}_{F})=\bigoplus_{\bm{x}_{\mathcal{V}\setminus F}}\Phi_{1}(\bm{x}_{U})\otimes\Phi_{2}(\bm{x}_{L})\otimes\Big(\bigotimes_{S\in\mathcal{E}_{\ell}}\bm{1}_{\sum_{v\in S}\theta^{S}_{v}(x_{v})\leq 0}\Big),\qquad(14)$$

where $\Phi_{1}$ and $\Phi_{2}$ are two input functions/relations over variable sets $U$ and $L$, respectively. We prove the following simple but important lemma:

Lemma 3.1.

Let $N=\max\{|\Phi_{1}|,|\Phi_{2}|\}$ , and $k=|\mathcal{E}_{\ell}|$ . For $F\subseteq U$ , query (14) can be answered in time $O(N\cdot(\log N)^{\max(k-1,1)})$ .

Proof.

If there is a hyperedge $S\in\mathcal{E}_{\ell}$ for which $S\subseteq U$, then in an $O(N\log N)$-time preprocessing step we can “absorb” the factor $\bm{1}_{\sum_{v\in S}\theta^{S}_{v}(x_{v})\leq 0}$ into the factor $\Phi_{1}$, by replacing $\Phi_{1}(\bm{x}_{U})$ with the product $\Phi_{1}(\bm{x}_{U})\otimes\bm{1}_{\sum_{v\in S}\theta^{S}_{v}(x_{v})\leq 0}$. In particular, this product can be computed by iterating over the tuples $\bm{x}_{U}$ satisfying $\Phi_{1}(\bm{x}_{U})\neq\bm{0}$ and, for each such tuple, testing whether the inequality $\sum_{v\in S}\theta^{S}_{v}(x_{v})\leq 0$ holds. If it does, then the indicator $\bm{1}_{\sum_{v\in S}\theta^{S}_{v}(x_{v})\leq 0}$ takes the value $\bm{1}$, hence the value of $\Phi_{1}(\bm{x}_{U})$ remains unchanged after the product. Otherwise, both the indicator and its product with $\Phi_{1}(\bm{x}_{U})$ take the value $\bm{0}$. A similar absorption can be done when $S\subseteq L$. Hence, without loss of generality we can assume that $S\not\subseteq L$ and $S\not\subseteq U$ for all $S\in\mathcal{E}_{\ell}$.
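The absorption step is a single filtering pass over the support of $\Phi_1$. A minimal Python sketch, assuming (hypothetically) that a factor is stored as a dict from value tuples, ordered like the tuple `U` of variable names, to nonzero semiring values, and that each $\theta^S_v$ is given as a Python function:

```python
from typing import Any, Callable, Dict, Sequence, Tuple

Factor = Dict[Tuple[Any, ...], Any]  # support tuple -> nonzero semiring value


def absorb(U: Sequence[str], phi1: Factor, S: Sequence[str],
           theta: Dict[str, Callable[[Any], float]]) -> Factor:
    """Return Phi_1 (x) 1[sum_{v in S} theta_v(x_v) <= 0] for a hyperedge S within U."""
    if not set(S) <= set(U):
        raise ValueError("absorption requires S to be a subset of U")
    pos = {v: i for i, v in enumerate(U)}
    out: Factor = {}
    for xU, val in phi1.items():
        if sum(theta[v](xU[pos[v]]) for v in S) <= 0:
            out[xU] = val  # indicator is 1: the value is unchanged
        # otherwise the product is 0, so the tuple leaves the support
    return out
```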

Moreover, we only need to show that we can compute (14) for $F=U$ , because after $Q(\bm{x}_{U})$ is computed, we can “aggregate away” variables $\bm{x}_{U\setminus F}$ in $O(N\log N)$ -time by computing the aggregation:

[TABLE]

The above aggregation can be computed by sorting the tuples $\bm{x}_{U}$ satisfying $Q(\bm{x}_{U})\neq\bm{0}$ lexicographically based on $(\bm{x}_{F},\bm{x}_{U\setminus F})$, so that tuples sharing the same $\bm{x}_{F}$-prefix become consecutive. Then, for each distinct $\bm{x}_{F}$-prefix, we combine with $\oplus$ the values $Q(\bm{x}_{U})$ of all tuples sharing that prefix.
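The sort-then-scan aggregation can be sketched in Python as follows; the dict-based factor representation and the helper name are our assumptions, and `oplus` stands for the semiring's $\oplus$.

```python
from itertools import groupby
from operator import itemgetter
from typing import Any, Callable, Dict, Sequence, Tuple


def aggregate_away(U: Sequence[str], Q: Dict[Tuple[Any, ...], Any],
                   F: Sequence[str], oplus: Callable[[Any, Any], Any]):
    """Compute Q(x_F) by (+)-aggregating Q(x_U) over x_{U - F}, sorting on (x_F, x_{U - F})."""
    f_idx = [U.index(v) for v in F]
    rest_idx = [i for i, v in enumerate(U) if v not in F]
    rows = sorted(
        (tuple(x[i] for i in f_idx), tuple(x[i] for i in rest_idx), val)
        for x, val in Q.items()
    )
    out = {}
    for prefix, group in groupby(rows, key=itemgetter(0)):
        acc = None
        for _, _, val in group:  # tuples sharing this x_F prefix are consecutive
            acc = val if acc is None else oplus(acc, val)
        out[prefix] = acc
    return out
```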

Abusing notation somewhat, for each $S\in\mathcal{E}_{\ell}$ and each $T\subseteq S$ , define the function $\theta^{S}_{T}:\prod_{v\in T}\text{\sf Dom}(X_{v})\to{\mathbb{R}}$ by

[TABLE]

Fix a tuple $\bm{x}_{U}$ such that $\Phi_{1}(\bm{x}_{U})\neq\bm{0}$ . A tuple $\bm{x}_{L}$ is said to be $\bm{x}_{U}$ -adjacent if $\pi_{U\cap L}\bm{x}_{U}=\pi_{U\cap L}\bm{x}_{L}$ . We show how to compute the following sum in poly-logarithmic time:

[TABLE]

where the inner sum ranges only over tuples $\bm{x}_{L}$ which are $\bm{x}_{U}$ -adjacent. This is because the value of $\bm{x}_{U\cap L}$ has been fixed and tuples $\bm{x}_{L}$ that are not $\bm{x}_{U}$ -adjacent are inconsistent with the fixed value of $\bm{x}_{U\cap L}$ .

Now, for the fixed $\bm{x}_{U}$ and for each $\bm{x}_{L}$ define the following $k$ -dimensional points:

[TABLE]

We write $\bm{q}(\bm{x}_{U})\preceq\bm{p}(\bm{x}_{L})$ to say that $\bm{q}(\bm{x}_{U})$ is dominated by $\bm{p}(\bm{x}_{L})$ coordinate-wise: $q_{S}(\bm{x}_{U})\leq p_{S}(\bm{x}_{L})\;\forall\;S\in\mathcal{E}_{\ell}$ . Assign to each point $\bm{p}(\bm{x}_{L})$ a “weight” of $\Phi_{2}(\bm{x}_{L})$ . Now, taking (17),

[TABLE]

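(The equality $\bigotimes_{S\in\mathcal{E}_{\ell}}\bm{1}_{q_{S}(\bm{x}_{U})\leq p_{S}(\bm{x}_{L})}=\bm{1}_{\bm{q}(\bm{x}_{U})\preceq\bm{p}(\bm{x}_{L})}$ used above follows from the definition of the component-wise order $\preceq$.) The expression thus computes, for a given “query point” $\bm{q}(\bm{x}_{U})$, the weighted sum over all points $\bm{p}(\bm{x}_{L})$ that dominate the query point. This is precisely the (semigroup) dominance range searching problem, which, after an $O(N(\log N)^{\max(k-1,1)})$-time preprocessing step, can be solved in time $O((\log N)^{\max(k-1,1)})$ per query point [13], as reviewed in Section 2.3. To restrict the sum to $\bm{x}_{U}$-adjacent tuples, we group the tuples $\bm{x}_{L}$ by their projection onto $U\cap L$ and build one such data structure per group. Since there are at most $N$ tuples $\bm{x}_{U}$ to query, the total time is $O(N\cdot(\log N)^{\max(k-1,1)})$.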

∎

Example 3.2.

Let $R$ be a binary relation. Suppose we want to count the number of tuples satisfying $R(a,b)\wedge R(b,c)\wedge a<c$ . By setting $F=\emptyset$ , $U=\{a,b\}$ , $L=\{b,c\}$ , the problem can be reduced to the form (14) with $k=1$ , $\mathcal{E}_{\ell}=\{\{a,c\}\}$ . We can thus compute this count in time $O(N\log N)$ .∎
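Example 3.2 can be sketched directly in Python (our illustration; the function name is hypothetical). Here $U\cap L=\{b\}$, so the $\bm{x}_U$-adjacent tuples are those with the same $b$-value; with $k=1$, dominance searching degenerates to binary search in one sorted list per $b$-group. The strict inequality $a<c$ is handled by `bisect_right`.

```python
from bisect import bisect_right
from collections import defaultdict
from typing import Any, Set, Tuple


def count_two_paths_increasing(R: Set[Tuple[Any, Any]]) -> int:
    """Count (a, b, c) with R(a, b), R(b, c) and a < c in O(N log N) time."""
    # Group the L-side tuples (b, c) by the shared variable b.
    cs_by_b = defaultdict(list)
    for b, c in R:
        cs_by_b[b].append(c)
    for cs in cs_by_b.values():
        cs.sort()  # one sorted list per b-group: the 1-D search structure
    total = 0
    for a, b in R:  # one "query point" per U-side tuple (a, b)
        cs = cs_by_b.get(b, [])
        total += len(cs) - bisect_right(cs, a)  # number of c with c > a
    return total
```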

3.2 Relaxed tree decompositions

Equipped with this basic case, we can now proceed to solve the general setting of (3). To this end, we define a new width parameter.

Definition 3.3 (Relaxed tree decomposition).

Let $\mathcal{H}=(\mathcal{V},\mathcal{E}=\mathcal{E}_{s}\cup\mathcal{E}_{\ell})$ denote a multi-hypergraph whose edge multiset is partitioned into $\mathcal{E}_{s}$ and $\mathcal{E}_{\ell}$. A relaxed tree decomposition of $\mathcal{H}$ (with respect to the partition $\mathcal{E}_{s}\cup\mathcal{E}_{\ell}$) is a pair $(T,\chi)$, where $T=(V(T),E(T))$ is a tree whose nodes and edges are $V(T)$ and $E(T)$, respectively, and $\chi:V(T)\to 2^{\mathcal{V}}$ satisfies the following properties:

(a)

The running intersection property holds: for each vertex $v\in\mathcal{V}$, the set $\{t\in V(T)\ |\ v\in\chi(t)\}$ is a connected subtree in $T$.

(b)

Every “skeleton” edge $S\in\mathcal{E}_{s}$ is covered by some bag $\chi(t)$ , $t\in V(T)$ .

(c)

Every “ligament” edge $S\in\mathcal{E}_{\ell}$ is covered by the union of two adjacent bags $s$ and $t$ , i.e. $S\subseteq\chi(s)\cup\chi(t)$ , where $\{s,t\}\in E(T)$ .
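Conditions (a) to (c) are easy to check mechanically. The Python sketch below (our illustration, assuming `tree_edges` already forms a tree) also computes, for a given decomposition, the quantity $k$ used in Theorem 3.5: the maximum number of ligament edges covered by a single pair of adjacent bags. As a usage example, we take the hyperedges of Example 3.7 and a path of three bags of our own choosing (Figure 1 may show a different decomposition).

```python
from typing import Dict, FrozenSet, Hashable, Iterable, List, Tuple

Bag = FrozenSet[str]


def is_relaxed_td(tree_edges: List[Tuple[Hashable, Hashable]],
                  bags: Dict[Hashable, Bag],
                  skeleton: Iterable[Bag], ligament: Iterable[Bag]) -> bool:
    """Check conditions (a)-(c) of Definition 3.3, assuming tree_edges form a tree."""
    adj = {t: set() for t in bags}
    for s, t in tree_edges:
        adj[s].add(t)
        adj[t].add(s)
    # (a) running intersection: the nodes whose bag contains v are connected.
    for v in set().union(*bags.values()):
        holders = {t for t, bag in bags.items() if v in bag}
        start = next(iter(holders))
        seen, stack = {start}, [start]
        while stack:
            for nb in adj[stack.pop()]:
                if nb in holders and nb not in seen:
                    seen.add(nb)
                    stack.append(nb)
        if seen != holders:
            return False
    # (b) every skeleton edge lies inside a single bag.
    if not all(any(S <= bag for bag in bags.values()) for S in skeleton):
        return False
    # (c) every ligament edge lies inside the union of two adjacent bags.
    return all(any(S <= bags[s] | bags[t] for s, t in tree_edges)
               for S in ligament)


def ligament_load(tree_edges, bags, ligament) -> int:
    """Max number of ligament edges covered by one pair of adjacent bags."""
    return max((sum(S <= bags[s] | bags[t] for S in ligament)
                for s, t in tree_edges), default=0)
```

For the path with bags $\{a,b\}$, $\{b,c\}$, $\{c,d\}$, every ligament edge of Example 3.7 is covered by two adjacent bags, and each adjacent pair covers two of them, so the load is 2.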

Let $\text{\sf TD}^{\ell}(\mathcal{H})$ denote the set of all relaxed tree decompositions of $\mathcal{H}$ (with respect to the skeleton-ligament partition). When $\mathcal{H}$ is clear from context we use $\text{\sf TD}^{\ell}$ for the sake of brevity. Given $F\subseteq\mathcal{V}$ , let $\text{\sf TD}^{\ell}_{F}$ denote the set of all relaxed $F$ -connex tree decompositions of $\mathcal{H}$ .

The new condition (c) in the above definition is needed so that later we can utilize Lemma 3.1 to compute aggregate queries over the relaxed tree decomposition. In particular, the two adjacent bags $s$ and $t$ in condition (c) will play the role of $U$ and $L$ from Lemma 3.1 and the corresponding query (14).

3.2.1 FAQ-AI on a general semiring

We use relaxed TDs in conjunction with Lemma 3.1 to answer FAQ-AI with a relaxed notion of faqw. In particular, the relaxed width parameters of $\mathcal{H}$ are defined in exactly the same way as the usual width parameters defined in Section 2, except we allow the TDs to range over relaxed ones.

Definition 3.4 (Relaxed faqw).

Let $Q$ be an FAQ-AI query (3), and $\mathcal{H}=(\mathcal{V},\mathcal{E}=\mathcal{E}_{s}\cup\mathcal{E}_{\ell})$ be its hypergraph. Furthermore, let $\mathcal{E}_{\not\infty}\subseteq\mathcal{E}_{s}$ denote the set of hyperedges $K\in\mathcal{E}$ for which $|R_{K}|<\infty$ . Then, the relaxed FAQ-width of $Q$ is defined by

[TABLE]

When $F=\emptyset$ , $\text{\sf faqw}_{\ell}$ collapses to $\text{\sf fhtw}_{\ell}$ which is the relaxed fhtw for FAQ-AI $Q$ without free variables:

[TABLE]

A relaxed tree decomposition $(T,\chi)$ of $Q$ is optimal if its width is equal to $\text{\sf faqw}_{\ell}$ , i.e.,

[TABLE]

Theorem 3.5.

Any FAQ-AI query $Q$ of the form (3) on any semiring can be answered in time $O(N^{\text{\sf faqw}_{\ell}(Q)}\cdot(\log N)^{\max(k-1,1)}+|Q|)$, where $k$ is the maximum number of additive inequalities covered by a pair of adjacent bags in an optimal relaxed tree decomposition. (Note that $k$ can be much smaller than $|\mathcal{E}_{\ell}|$, since different additive inequalities can be covered by different pairs of adjacent bags in an optimal relaxed tree decomposition.)

Proof.

We first consider the case of no free variables because this case captures the key idea. Fix an optimal relaxed tree decomposition $(T,\chi)$ . We first compute, for each bag $t\in V(T)$ of the tree decomposition, a factor $\Phi_{t}:\prod_{i\in\chi(t)}\text{\sf Dom}(X_{i})\to\bm{D}$ such that

[TABLE]

To define the factors $\Phi_{t}$ , we need the notion of indicator projection [7, 5, 6]; see Appendix C for some background about the InsideOut algorithm where this notion was originally developed.

Definition 3.6 (Indicator Projection [6, 5, 7]).

For a given $K\in\mathcal{E}_{\not\infty}$ and $I\subseteq\mathcal{V}$ such that $J:=K\cap I\neq\emptyset$ , the indicator projection of $R_{K}$ onto the set $I$ is a function $\pi_{K,I}:\prod_{v\in J}\text{\sf Dom}(X_{v})\to\{\bm{0},\bm{1}\}$ defined by

[TABLE]

Based on the above definition, it is easy to verify that for any $K\in\mathcal{E}_{\not\infty}$ and $I\subseteq\mathcal{V}$ such that $J:=K\cap I\neq\emptyset$ , we have the identity

[TABLE]
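Assuming the standard InsideOut definition, namely that $\pi_{K,I}(\bm{x}_J)=\bm{1}$ exactly when $\bm{x}_J$ is the projection onto $J$ of some tuple in the support of $R_K$, the support of an indicator projection is an ordinary relational projection. A minimal Python sketch (a factor is, hypothetically, a dict from tuples to semiring values); the last test line checks the identity $R_K\otimes\pi_{K,I}=R_K$ on the support.

```python
from typing import Any, Dict, Sequence, Set, Tuple


def indicator_projection(K: Sequence[str], R_K: Dict[Tuple[Any, ...], Any],
                         I: Set[str]) -> Tuple[Tuple[str, ...], Set[Tuple[Any, ...]]]:
    """Return (J, support of pi_{K,I}), where J lists the variables of K that are in I."""
    J = tuple(v for v in K if v in I)
    if not J:
        raise ValueError("the indicator projection requires K and I to share a variable")
    idx = [K.index(v) for v in J]
    support = {tuple(x[i] for i in idx) for x, val in R_K.items() if val != 0}
    return J, support
```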

Recall from Definition 3.3 that every $K\in\mathcal{E}_{s}$ is covered by at least one bag $\chi(t)$ for $t\in V(T)$ . Fix an arbitrary coverage assignment $\alpha:\mathcal{E}_{s}\to V(T)$ , where $K$ is covered by the bag $\chi(\alpha(K))$ . Then, the factors $\Phi_{t}$ are defined by:

[TABLE]

Claim 1.

The factors $\Phi_{t}$ defined by (26) satisfy (23).

The above claim can be proved as follows:

[TABLE]

For every $t\in V(T)$ , the query $\Phi_{t}$ can be reduced to a join query and solved using a worst-case optimal join algorithm [34, 35, 43] as follows. For every $K\in\mathcal{E}_{\not\infty}$ where $K\cap\chi(t)\neq\emptyset$ , define $\overline{\pi}_{K,\chi(t)}$ to be the support of the factor $\pi_{K,\chi(t)}$ , which is the set of tuples $\bm{x}_{K\cap\chi(t)}$ satisfying $\pi_{K,\chi(t)}(\bm{x}_{K\cap\chi(t)})\neq\bm{0}$ :

[TABLE]

$\overline{\pi}_{K,\chi(t)}$ can be viewed as a relation over variables $\bm{x}_{K\cap\chi(t)}$ . Computing $\Phi_{t}$ can be reduced to solving the join query $\overline{\Phi}_{t}$ defined as:

[TABLE]

This is because once we solve the join query $\overline{\Phi}_{t}$ , the factor $\Phi_{t}$ can be computed as follows:

[TABLE]

where $\overline{\Phi}_{t}$ above denotes the output of the join query $\overline{\Phi}_{t}$ . The join query $\overline{\Phi}_{t}$ can be computed using a worst-case optimal join algorithm in time

[TABLE]

Over all $t\in V(T)$ , our runtime is bounded by $O(N^{w}\log N)$ , where

[TABLE]

Moreover for every $t\in V(T)$ , the output size of the join query $\overline{\Phi}_{t}$ is bounded by $N^{\rho^{*}_{\mathcal{E}_{\not\infty}}(\chi(t))}\leq N^{w}$ , thanks to the AGM bound [10, 19].
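The join query $\overline{\Phi}_{t}$ can be evaluated with any worst-case optimal join algorithm. The Python sketch below follows the variable-at-a-time pattern of Generic Join: bind one variable at a time, iterate over the smallest candidate set offered by the relations that mention it, and probe the others. It is our simplified illustration with hand-rolled hash tries, and we make no claim that it reproduces the exact data structures or guarantees of [34, 35, 43]. It assumes every variable in `order` occurs in some relation.

```python
from typing import Any, Dict, List, Sequence, Set, Tuple

Relation = Tuple[Tuple[str, ...], Set[Tuple[Any, ...]]]  # (variables, tuples)


def generic_join(order: Sequence[str], relations: List[Relation]) -> List[Tuple[Any, ...]]:
    """Variable-at-a-time join: bind variables in `order`, intersecting the candidate
    values offered by every relation that mentions the current variable."""
    tries = []
    for vars_, tuples in relations:
        local = sorted(vars_, key=list(order).index)  # relation vars in global order
        perm = [vars_.index(v) for v in local]
        trie: Dict[Any, Any] = {}
        for t in tuples:
            node = trie
            for i in perm:
                node = node.setdefault(t[i], {})
        tries.append((set(local), trie))

    out: List[Tuple[Any, ...]] = []

    def extend(depth: int, assignment: List[Any], nodes: List[Dict[Any, Any]]) -> None:
        if depth == len(order):
            out.append(tuple(assignment))
            return
        var = order[depth]
        active = [i for i, (vs, _) in enumerate(tries) if var in vs]
        # Iterate over the smallest candidate set and probe the others by hashing.
        smallest = min(active, key=lambda i: len(nodes[i]))
        for val in nodes[smallest]:
            if all(val in nodes[i] for i in active):
                child = list(nodes)
                for i in active:
                    child[i] = nodes[i][val]
                extend(depth + 1, assignment + [val], child)

    extend(0, [], [trie for _, trie in tries])
    return out
```

On the triangle query $R(a,b)\wedge S(b,c)\wedge T(a,c)$, for instance, the output size is at most $N^{1.5}$ by the AGM bound.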

Next we compute (23) in time $O(N^{w}\log N)$ . We will make use of the fact that $(T,\chi)$ is a relaxed TD. Fix an arbitrary root of the tree decomposition $(T,\chi)$ ; following InsideOut (Appendix C), we compute (23) by eliminating variables from the leaves of $(T,\chi)$ up to the root. Thanks to Proposition 2.4, we can assume the tree decomposition to be non-redundant. Let $t_{1}$ be any leaf of $(T,\chi)$ , $t_{2}$ be its parent, where $L=\chi(t_{1})$ and $U=\chi(t_{2})$ . Because of non-redundancy, we have $L\setminus U\neq\emptyset$ . Now write (23) as follows:

[TABLE]

The third equality uses the semiring’s distributive law. (Note that $S\cap(L\setminus U)\neq\emptyset$ implies that $S\subseteq(L\cup U)$ thanks to Definition 3.3 and the fact that $t_{2}$ is the only neighbor of $t_{1}$ .) Lemma 3.1 implies that we can compute the sub-query $\varphi_{U}(\bm{x}_{U})$ from (32) in the allotted time. The above step eliminates all variables in $L\setminus U$ . In particular after this step, the original query $Q()$ from (23) becomes:

[TABLE]

The above is an FAQ of the same form (3), except that it no longer involves the variables $\bm{x}_{L\setminus U}$ (recall that $L\setminus U\neq\emptyset$). It admits a relaxed tree decomposition obtained from the original tree decomposition $(T,\chi)$ by removing the leaf $t_{1}$. In particular, the new factor $\varphi_{U}(\bm{x}_{U})$ in (33) is covered by the bag $\chi(t_{2})$, and all other properties of relaxed tree decompositions continue to hold after the removal of $t_{1}$. By induction on the number of variables, we solve the new query (33) in time $O(N^{w}\log N)$, which completes the proof. (In the base case, we have a query with no variables, where the theorem holds trivially.)

When the query has free variables, the algorithm proceeds similarly to the case of an FAQ with free variables [6, 5]. See Appendix C for a recap of how to handle free variables in an FAQ. ∎

Example 3.7.

Given three binary relations $R,S$ and $T$ , consider a query $Q$ that counts the number of tuples $(a,b,c,d)$ that satisfy:

[TABLE]

The query $Q$ has $\mathcal{E}_{s}=\mathcal{E}_{\not\infty}=\{\{a,b\},\{b,c\},\{c,d\}\}$ and $\mathcal{E}_{\ell}=\mathcal{E}_{\infty}=\{\{a,c\},\{b,c\},\{b,d\}\}$. Let $N=\max\{|R|,|S|,|T|\}$. Note that $\text{\sf faqw}(Q)=2$. In fact, any of the previously known algorithms, e.g. [6, 7], would take time $O(N^{2})$ to answer $Q$. However, this query has $\text{\sf faqw}_{\ell}(Q)=1$, and by Theorem 3.5, it can be answered in time $O(N\cdot\log N)$. (Note that here $2=k<|\mathcal{E}_{\ell}|=3$.) An optimal relaxed tree decomposition is shown in Figure 1.∎

We next give two simple bounds concerning $\text{\sf faqw}_{\ell}$. The first (Proposition 3.8) shows that, for arbitrary FAQ-AI queries, a runtime of $N^{\text{\sf faqw}_{\ell}}$ is effectively the best we can hope for. The second (Proposition 3.9) shows that, while the relaxed tree decomposition idea can improve the runtime by a polynomial factor, it can at most halve the exponent achieved by straightforwardly applying InsideOut over non-relaxed tree decompositions.

Proposition 3.8.

For any positive integer $m$ , there exists an FAQ-AI query of the form (3) for which $F=\emptyset$ , $\text{\sf faqw}_{\ell}(Q)\geq m$ and it cannot be answered in time $o(N^{\text{\sf faqw}_{\ell}(Q)})$ , modulo $k$ -sum hardness.

Proof.

It is widely assumed [37, 30] that $O(N^{\lceil k/2\rceil})$ is essentially the best possible runtime for $k$-sum, which is the following problem: given $k$ sets of numbers $R_{1},\dots,R_{k}$, each of size at most $N$, determine whether there is a tuple $\bm{t}\in R_{1}\times\cdots\times R_{k}$ such that $\sum_{i\in[k]}t_{i}=0$. We can reduce $k$-sum to our problem: consider the query $Q$ over the Boolean semiring:

[TABLE]

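The answer to $Q$ is true iff there is a tuple $(x_{1},\dots,x_{k})\in R_{1}\times\cdots\times R_{k}$ such that $\sum_{i\in[k]}x_{i}=0$. Hence query (35) is $k$-sum-hard. For this query, $\text{\sf faqw}_{\ell}(Q)=\lceil k/2\rceil$; choosing $k=2m$ gives $\text{\sf faqw}_{\ell}(Q)=m$, and an $o(N^{\text{\sf faqw}_{\ell}(Q)})$-time algorithm for $Q$ would solve $k$-sum in time $o(N^{\lceil k/2\rceil})$.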

∎
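The $O(N^{\lceil k/2\rceil})$ upper bound for $k$-sum that the hardness assumption refers to comes from the textbook meet-in-the-middle strategy, sketched below in Python (expected time, since it relies on hashing; this is background, not part of the reduction). It enumerates the sums of the first $\lceil k/2\rceil$ sets and looks up their negations among the hashed sums of the remaining $\lfloor k/2\rfloor$ sets.

```python
from itertools import product
from typing import Sequence


def k_sum_exists(sets: Sequence[Sequence[int]]) -> bool:
    """Is there t in R_1 x ... x R_k with sum(t) == 0?  Meet in the middle:
    hash all sums of the last floor(k/2) sets, then enumerate the first ceil(k/2)."""
    half = (len(sets) + 1) // 2
    right_sums = {sum(t) for t in product(*sets[half:])}
    return any(-sum(t) in right_sums for t in product(*sets[:half]))
```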

Proposition 3.9.

For any FAQ-AI query $Q$ of the form (3), we have $\text{\sf faqw}_{\ell}(Q)\geq\frac{1}{2}\text{\sf faqw}(Q)$ ; in particular, when $Q$ has no free variables $\text{\sf fhtw}_{\ell}(Q)\geq\frac{1}{2}\text{\sf fhtw}(Q)$ .

Proof.

Let $(T,\chi)$ denote a relaxed tree decomposition of $Q$ whose width is $\text{\sf faqw}_{\ell}(Q)$. Construct a new (non-relaxed) tree decomposition $(T^{\prime},\chi^{\prime})$ for $Q$ as follows. Each vertex $t$ in $V(T)$ is also a vertex in $V(T^{\prime})$ with $\chi^{\prime}(t)=\chi(t)$. Moreover, to each edge $\{s,t\}\in E(T)$ there corresponds an additional vertex $st$ in $V(T^{\prime})$ whose bag is $\chi^{\prime}(st)=\chi(s)\cup\chi(t)$. As for the edge set of $T^{\prime}$, for each edge $\{s,t\}\in E(T)$, there are two corresponding edges in $E(T^{\prime})$, namely $\{s,st\}$ and $\{t,st\}$. We can verify that $(T^{\prime},\chi^{\prime})$ is a (non-relaxed) tree decomposition of $Q$; in particular, every ligament edge covered by $\chi(s)\cup\chi(t)$ is now covered by the single bag $\chi^{\prime}(st)$. Moreover, because each bag of $(T^{\prime},\chi^{\prime})$ is the union of at most two bags of $(T,\chi)$, the FAQ-width of $(T^{\prime},\chi^{\prime})$ is at most $2\cdot\text{\sf faqw}_{\ell}(Q)$. Finally, if $(T,\chi)$ is $F$-connex, then so is $(T^{\prime},\chi^{\prime})$. ∎
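The construction in the proof of Proposition 3.9 simply subdivides every tree edge by a node holding the union of its endpoints' bags. A minimal Python sketch; the node identifier `("edge", s, t)` for the new node $st$ is a hypothetical choice and is assumed not to clash with existing node ids.

```python
from typing import Dict, FrozenSet, Hashable, List, Tuple


def relaxed_to_plain_td(tree_edges: List[Tuple[Hashable, Hashable]],
                        bags: Dict[Hashable, FrozenSet[str]]):
    """Keep each node t with bag chi(t) and subdivide every tree edge {s, t}
    by a new node whose bag is chi(s) | chi(t)."""
    new_bags = dict(bags)
    new_edges = []
    for s, t in tree_edges:
        mid = ("edge", s, t)  # hypothetical id for the node "st"; assumed fresh
        new_bags[mid] = bags[s] | bags[t]
        new_edges += [(s, mid), (t, mid)]
    return new_edges, new_bags
```

With the path of bags $\{a,b\}$, $\{b,c\}$, $\{c,d\}$, each ligament edge $\{a,c\}$, $\{b,c\}$, $\{b,d\}$ ends up inside a single bag, at the price of bags that are unions of two original bags.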

3.2.2 FAQ-AI on the Boolean semiring

Before explaining how we can adapt PANDA to solve an FAQ-AI query on the Boolean semiring, we give the intuition with an example.

Example 3.10.

Consider the following FAQ-AI:

[TABLE]

Here $\text{\sf faqw}_{\ell}(Q)=\text{\sf faqw}(Q)=2$. Using the fractional hypertree width measure and InsideOut (even with relaxed TDs and Theorem 3.5), the best runtime is $O(N^{2})$, because no matter which (relaxed) TD we choose, the worst-case bag relation size is $\Theta(N^{2})$. However, the PANDA framework [9, 8] can solve many queries, including this one, in time smaller than $N^{\text{\sf faqw}(Q)}$. At a very high level, PANDA achieves this by carefully partitioning the input data and then choosing a possibly different tree decomposition for each part. Query (36) admits two non-redundant and non-trivial relaxed tree decompositions (a tree decomposition is trivial if it consists of only one bag containing all the variables). The first tree decomposition consists of the bags $\{a,b,c\}$ and $\{c,d\}$, while the second has the bags $\{a,b\}$ and $\{b,c,d\}$. The PANDA framework utilizes both tree decompositions simultaneously to solve this query. In particular, for each tuple $(a,b,c,d)$ satisfying the body of query (36), we make sure that this tuple is “captured by” at least one of the two tree decompositions, in the sense that it will be reported by a query over this tree decomposition. We realize this intuition using the following disjunctive Datalog rule:

[TABLE]

In the above rule, the head contains two relations, $U$ and $W$, and they form a solution to the rule iff the following holds: if $(a,b,c,d)$ satisfies the body, then either $(a,b,c)\in U$ or $(b,c,d)\in W$. Using information-theoretic inequalities [9, 8], we can show that PANDA computes a solution $(U,W)$ to the above disjunctive Datalog rule in time $\tilde{O}(N^{1.5})$. In particular, both $|U|$ and $|W|$ are bounded by $N^{1.5}$.

Given such a solution $(U,W)$ to (37) (which is not necessarily unique), it is straightforward to verify that the following also holds, using the distributivity of $\vee$ over $\wedge$ :

[TABLE]

By semijoin-reducing $W$ against $S$ and $T$ (i.e. by replacing $W(b,c,d)$ with $(W(b,c,d)\Join S(b,c))\Join T(c,d)$), and similarly by semijoin-reducing $U$ against $R$ and $S$, we conclude that

[TABLE]
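Because the variables of $S(b,c)$ and $T(c,d)$ are contained in those of $W(b,c,d)$, the joins in the semijoin reduction only filter $W$. A minimal Python sketch of this filtering, with relations as Python sets of tuples (the helper name is ours):

```python
from typing import Any, Sequence, Set, Tuple


def semijoin(w_vars: Sequence[str], W: Set[Tuple[Any, ...]],
             r_vars: Sequence[str], R: Set[Tuple[Any, ...]]) -> Set[Tuple[Any, ...]]:
    """W semijoin R: keep tuples of W whose projection onto r_vars lies in R.
    When r_vars is a subset of w_vars, this coincides with the natural join."""
    idx = [list(w_vars).index(v) for v in r_vars]
    return {w for w in W if tuple(w[i] for i in idx) in R}
```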

Finally, we have a rewrite of the original body:

[TABLE]

The above captures precisely our intuition that every tuple $(a,b,c,d)$ satisfying the body of (36) should be reported by at least one of the two relaxed tree decompositions. By defining intermediate rules, we can compute $Q$ from them:

[TABLE]

$Q_{1}$ and $Q_{2}$ are of the form (14), and thus they each can be answered in $\tilde{O}(N^{1.5})$ -time (since $|U|,|W|\leq N^{1.5}$ ). This implies that $Q$ can be answered in $\tilde{O}(N^{1.5})$ -time overall.∎

The strategy outlined in the above example uses PANDA to evaluate an FAQ-AI query over the Boolean semiring. The runtime of the resulting algorithm is governed by a natural generalization of the submodular FAQ-width defined in (13):

Definition 3.11.

Let $Q$ be an FAQ-AI query (3) over the Boolean semiring. The relaxed submodular FAQ-width of $Q$ is defined by

[TABLE]

(Recall that the set of relaxed tree decompositions $\text{\sf TD}_{F}^{\ell}$ was defined in Definition 3.3.)

Theorem 3.12.

Any FAQ-AI query $Q$ of the form (3) on the Boolean semiring can be answered in time $\tilde{O}(N^{\text{\sf smfw}_{\ell}(Q)}+|Q|)$ .

Proof.

As in the proof of Theorem 3.5, we first assume there are no free variables; the extension to $F\neq\emptyset$ follows straightforwardly from techniques developed in [6, 5] and reviewed in Appendix C. When $F=\emptyset$, the query (3) is written in Datalog as:

[TABLE]

We write $R_{K}$ instead of $R_{K}(\bm{x}_{K})$ and $\theta^{S}_{v}$ instead of $\theta^{S}_{v}(x_{v})$ to avoid clutter. It will be implicit throughout this proof that the subscript of a factor/function indicates its arguments. To answer query (44), the first step is to find one relation $S^{(T,\chi)}_{\chi(t)}$ (over variables $\chi(t)$ ) for every bag $t\in V(T)$ of every relaxed tree decomposition $(T,\chi)\in\text{\sf TD}^{\ell}_{\emptyset}$ such that the relations $S^{(T,\chi)}_{\chi(t)}$ together form a solution to the following equation:

[TABLE]

Note that the right-hand side of (45) is a Boolean tensor decomposition of the left-hand side: in particular, under the Boolean semiring $(\bm{D},\oplus,\otimes,\bm{0},\bm{1})=(\{\text{\sf true},\text{\sf false}\},\vee,\wedge,\text{\sf false},\text{\sf true})$, the left-hand side of (45) can be viewed as an $n$-dimensional tensor where $n=|\mathcal{V}|=|\cup_{K\in\mathcal{E}_{s}}K|$, while the right-hand side is an equivalent sum of products of tensors. The idea of using Boolean tensor decomposition to speed up query evaluation was used in the context of queries with disequalities [4]. Assuming that we can efficiently compute intermediate relations $S^{(T,\chi)}_{\chi(t)}$ satisfying (45), (44) can be answered by answering, for each $(T,\chi)\in\text{\sf TD}_{\emptyset}^{\ell}$, an intermediate query:

[TABLE]

The final answer $Q$ is obtained by the Datalog rule:

[TABLE]

The key point here is that each intermediate query (46) is an FAQ-AI query (3) with $\text{\sf faqw}_{\ell}\leq 1$. This is because $Q^{(T,\chi)}()$ admits a relaxed tree decomposition $(T,\chi)$ where each bag $\chi(t)$ for $t\in V(T)$ is covered by one relation $S^{(T,\chi)}_{\chi(t)}\in\mathcal{E}_{\not\infty}$, hence $\text{\sf faqw}_{\ell}(Q^{(T,\chi)}())\leq\max_{h\in\text{\sf ED}_{\not\infty}\cap\Gamma_{n}}\max_{t\in V(T)}h(\chi(t))\leq 1$. (In fact, $\text{\sf faqw}_{\ell}$ is exactly 1 here, although this is not needed for the proof of Theorem 3.12: by comparing (20) to (6), we can see that for any query $Q$, $\text{\sf faqw}_{\ell}(Q)\geq\text{\sf fhtw}(\mathcal{H}_{\not\infty}:=(\mathcal{V},\mathcal{E}_{\not\infty}))$, and fhtw for any hypergraph is at least $1$ [19].) By Theorem 3.5, each intermediate query (46) can be answered in time $\tilde{O}(M)$ where

[TABLE]

It remains to show how to compute tables $S^{(T,\chi)}_{\chi(t)}$ that form a solution to (45); to do so, we apply distributivity of $\vee$ over $\wedge$ to rewrite the right-hand side of (45) as follows. Let $\mathcal{M}$ be the collection of all maps $\beta:\text{\sf TD}^{\ell}_{\emptyset}\to 2^{\mathcal{V}}$ such that $\beta(T,\chi)=\chi(t)$ for some $t\in V(T)$ ; in other words, $\beta$ selects one bag $\chi(t)$ out of each tree decomposition $(T,\chi)$ . Then, from the distributive law we have

[TABLE]

which means to solve the relational equation (45) we can instead solve the equation

[TABLE]

To solve the above equation, for each $\beta\in\mathcal{M}$ we can find tables $S^{(T,\chi)}_{\beta(T,\chi)}$ that form a solution to the following equation

[TABLE]

To do that, for each $\beta\in\mathcal{M}$ , we compute a solution to the following disjunctive Datalog rule:

[TABLE]
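Enumerating $\mathcal{M}$ is a Cartesian product: a map $\beta$ picks one bag per relaxed tree decomposition, so $|\mathcal{M}|$ is at most the product of the numbers of bags, and there is one disjunctive Datalog rule per $\beta$ (for two TDs with two bags each, four rules). The Python sketch below, which represents each TD simply as a list of bags (our assumption), enumerates the maps and checks the Boolean distributive identity used in the rewriting above on every truth assignment to the bags.

```python
from itertools import product
from typing import Dict, FrozenSet, List, Sequence

Bag = FrozenSet[int]


def beta_maps(tds: Sequence[Sequence[Bag]]) -> List[tuple]:
    """All maps beta that pick one bag from each tree decomposition."""
    return list(product(*tds))


def dnf_equals_cnf(tds: Sequence[Sequence[Bag]], truth: Dict[Bag, bool]) -> bool:
    """Distributive law: OR over TDs of AND over bags == AND over beta of OR over TDs."""
    lhs = any(all(truth[b] for b in td) for td in tds)
    rhs = all(any(truth[b] for b in beta) for beta in product(*tds))
    return lhs == rhs
```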

Once we obtain the relations $W^{(T,\chi)}_{\beta(T,\chi)}$, we can semijoin-reduce them against the input relations $R_{K}$ (i.e. replace $W^{(T,\chi)}_{\beta(T,\chi)}$ with $W^{(T,\chi)}_{\beta(T,\chi)}\Join R_{K}$ for each input relation $R_{K}$ where $K\subseteq\beta(T,\chi)$), in order to obtain relations $S^{(T,\chi)}_{\beta(T,\chi)}$ that solve (50). Once we obtain those $S^{(T,\chi)}_{\beta(T,\chi)}$, we plug them into (46) to obtain an FAQ-AI query of the form (3) for each relaxed tree decomposition $(T,\chi)\in\text{\sf TD}^{\ell}_{\emptyset}$. We use Theorem 3.5 to solve each one of those queries in time $\tilde{O}(M)$, where $M$ was given by (48). This is the step of the algorithm where the additive inequalities $\bigwedge_{S\in\mathcal{E}_{\ell}}\left[\sum_{v\in S}\theta^{S}_{v}\leq 0\right]$ participate in the computation. Once we obtain the solutions to queries (46), we use (47) to obtain the answer to the original FAQ-AI query.

The only step in the above algorithm that we haven’t specified yet is how to evaluate each disjunctive Datalog rule (52). We do so by running the PANDA algorithm, which computes the rule in time bounded by $\tilde{O}(N^{e(\beta)})$ , where

[TABLE]

Maximizing over $\beta\in\mathcal{M}$ , the runtime is bounded by $\tilde{O}(N^{w})$ , where

[TABLE]

The first equality in (57) follows from the minimax lemma in [8]. Our reasoning above also shows that $M$ from (48) is bounded by $N^{\text{\sf smfw}_{\ell}(Q)}$ . ∎

3.3 Relaxed polymatroids

A key step in the proof of Theorem 3.12 is to find the Boolean tensor decomposition (45) of the product over $R_{K}$ . In a non-Boolean semiring, this becomes a tensor decomposition on this semiring:

[TABLE]

In order to compute this tensor decomposition, we can still follow the script of the proof of Theorem 3.12, working on the parameter space of the input factors $R_{K}$ ; however, for the equality in (58) to hold (it is an identity over the value-space of the factors), it suffices to ensure the following property:

For any $\bm{x}_{\mathcal{V}}$ s.t. $\bigotimes_{K\in\mathcal{E}_{s}}R_{K}(\bm{x}_{K})\neq\bm{0}$ , there is exactly one tree decomposition $(T,\chi)\in\text{\sf TD}^{\ell}_{F}$ for which

[TABLE]

while for the other TDs, the left-hand side above is $\bm{0}$ .

Essentially, the property ensures that we do not have to perform inclusion-exclusion (IE) over the tree decompositions in $\text{\sf TD}_{F}^{\ell}$. (IE is problematic for two reasons: (1) IE can blow up the runtime, and (2) in a general semiring there may be no additive inverses, so IE may not even apply.) We do not know how to ensure this property in general. However, we can guarantee it if we settle for a runtime bound defined in terms of a relaxed notion of polymatroids. Since this idea applies to FAQ queries in general, we first present our result on FAQ queries, before specializing it to FAQ-AI.

3.3.1 FAQ over an arbitrary semiring

To explain how we can guarantee the property (59) for an FAQ query over an arbitrary semiring, consider the following example. Suppose that we would like to evaluate the (aggregate) query

[TABLE]

We write $R_{ij}$ instead of $R_{ij}(x_{i},x_{j})$ for short. The factors $R_{ij}$ are functions of two variables $R_{ij}:\text{\sf Dom}(X_{i})\times\text{\sf Dom}(X_{j})\to{\mathbb{R}}$ , and they are represented by ternary relations in a database. Abusing notation we will also use $R_{ij}$ to refer to its support, i.e., the binary relation over $(X_{i},X_{j})$ such that $(x_{i},x_{j})\in R_{ij}$ iff $R_{ij}(x_{i},x_{j})\neq\bm{0}$ .

There are only two non-trivial tree decompositions for the “4-cycle” query (60): one with bags $\{1,2,3\}$ and $\{3,4,1\}$, and the other with bags $\{1,2,4\}$ and $\{2,3,4\}$. (The trivial TD with one bag $\{1,2,3,4\}$ can always be replaced by a non-trivial TD in the considered bounds/algorithms without making them any worse. Similarly, redundant TDs can be replaced by non-redundant ones.) To evaluate the query, we first solve the relational equation (58), but only on the supports; i.e., we would like to find relations $S_{123},S_{341}$, $S_{234}$, and $S_{412}$ such that

[TABLE]

The second $\equiv$ is due to the distributivity of $\vee$ over $\wedge$ . Since the last formula is in CNF, we can solve each term separately by solving $4$ different disjunctive Datalog rules:

[TABLE]

Applying the proof-to-algorithm conversion idea from PANDA [9, 8], the above disjunctive Datalog rules can be solved with the PANDA algorithm. It is beyond the scope of this article to describe the PANDA algorithm in full detail; however, we can describe a solution. Let $N=\max\{|R_{12}|,|R_{23}|,|R_{34}|,|R_{41}|\}$. For each input relation/factor, define its “light” part as follows.

[TABLE]
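The light parts are defined by the display above. Assuming the usual degree-based rule for the 4-cycle (a tuple of $R_{ij}$ is light when its $x_i$-value occurs in at most $\sqrt{N}$ tuples of $R_{ij}$; this threshold and the choice of variable are our assumption), the split can be sketched in Python as follows. The property that makes such splits useful is that there are fewer than $\sqrt{N}$ heavy values, since each one accounts for more than $\sqrt{N}$ of the at most $N$ tuples.

```python
import math
from collections import Counter
from typing import Any, Set, Tuple

Pair = Tuple[Any, Any]


def split_light_heavy(R: Set[Pair], N: int) -> Tuple[Set[Pair], Set[Pair]]:
    """Split R_ij into (light, heavy); hypothetical rule: (x_i, x_j) is light
    iff x_i occurs in at most sqrt(N) tuples of R_ij."""
    degree = Counter(xi for xi, _ in R)
    threshold = math.sqrt(N)
    light = {t for t in R if degree[t[0]] <= threshold}
    return light, R - light
```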

Also, for every $R_{ij}$ , define $R^{h}_{ij}:=R_{ij}\setminus R^{\ell}_{ij}$ . Then, one can verify that the following is a solution to the relational equations (62)-(65):

[TABLE]

The above is not yet a solution to (61). However, we can refine it as follows to obtain such a solution:

[TABLE]

(These extra relations $R_{ij}\Join R_{jk}$ that are joined into $S_{ijk}$ to turn them into a solution to (61) will be referred to as “filters” in the proof of Theorem 3.15 below.) It is straightforward to verify that each $S_{ijk}$ can be computed in $\tilde{O}(N^{1.5})$ time. However, (61) alone is not enough to guarantee (59). Instead, we now need $S_{ijk}$ to satisfy the following stronger condition (where $\overset{+}{\bigvee}$ denotes the exclusive OR):

[TABLE]

Luckily, in this particular example, our previous solution for $S_{ijk}$ from (67) happens to be a solution to (68) as well. Once we have the relations $S_{ijk}$ from (67), we can extend them naturally into factors (so that they are represented by $4$ -ary relations) satisfying (59). In particular, as functions with range ${\mathbb{R}}$ , they are defined by

[TABLE]

Finally the query $Q$ from (60) can be computed by taking the sum of two queries:

[TABLE]

The above sketch does not work for a general FAQ query because the relational solution returned by PANDA is not guaranteed to satisfy (59). (If it were, we would be able to solve $\#\text{\sf CSP}$ queries in submodular width time, but this is unlikely to be possible since the submodular width tightly characterizes the hardness of CSP queries [32].) We can, however, restrict PANDA so that it maintains (59), at the cost of weakening the runtime bound achieved by PANDA. In particular, PANDA’s runtime is upper-bounded by the submodular (FAQ) width, which is a maximum over some set of polymatroids (see Section 2.2). We will now replace these polymatroids with a superset, called $\mathcal{E}$-polymatroids, leading to a larger version of the submodular (FAQ) width called “sharp submodular (FAQ) width”. The latter captures the runtime of our new version of PANDA, called #PANDA.

Definition 3.13 ( $\mathcal{E}$ -polymatroids and $\Gamma_{n|\mathcal{E}}$ ).

Given a collection $\mathcal{E}$ of subsets of $\mathcal{V}$, a set function $h:2^{\mathcal{V}}\to{\mathbb{R}}_{+}$ is said to be an $\mathcal{E}$-polymatroid if it satisfies the following: (i) $h(\emptyset)=0$, (ii) $h(X)\leq h(Y)$ whenever $X\subseteq Y$, and (iii) $h(X\cup Y)+h(X\cap Y)\leq h(X)+h(Y)$ for every pair $X,Y\subseteq\mathcal{V}$ such that $X\cap Y\subseteq S$ for some $S\in\mathcal{E}$. (The requirement “$X\cap Y\subseteq S$ for some $S\in\mathcal{E}$” is the only distinction between $\mathcal{E}$-polymatroids and polymatroids; if we drop it, we get back the original definition of polymatroids. In particular, a $2^{\mathcal{V}}$-polymatroid is a polymatroid as defined in Section 2.1.) For $\mathcal{V}=[n]$, let $\Gamma_{n|\mathcal{E}}$ denote the set of all $\mathcal{E}$-polymatroids on $\mathcal{V}$.
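Since condition (iii) is imposed only on some pairs, every polymatroid is an $\mathcal{E}$-polymatroid, and taking $\mathcal{E}=\{\mathcal{V}\}$ recovers ordinary polymatroids (every $X\cap Y$ lies in $\mathcal{V}$). The brute-force Python checker below (exponential in $n$, for illustration only; names are ours) makes the definition concrete.

```python
from itertools import combinations
from typing import Dict, FrozenSet, Iterable, List

Subset = FrozenSet[int]


def all_subsets(V: Iterable[int]) -> List[Subset]:
    items = sorted(V)
    return [frozenset(c) for r in range(len(items) + 1)
            for c in combinations(items, r)]


def is_E_polymatroid(h: Dict[Subset, float], V: Iterable[int],
                     E: List[Subset], tol: float = 1e-9) -> bool:
    """Check Definition 3.13 by brute force over all pairs X, Y of subsets of V."""
    subs = all_subsets(V)
    if abs(h[frozenset()]) > tol or any(h[X] < -tol for X in subs):
        return False  # (i) and nonnegativity
    for X in subs:
        for Y in subs:
            if X <= Y and h[X] > h[Y] + tol:
                return False  # (ii) monotonicity
            if any(X & Y <= S for S in E) and h[X | Y] + h[X & Y] > h[X] + h[Y] + tol:
                return False  # (iii) submodularity, only when X & Y fits in some S in E
    return True
```

For example, with $\mathcal{V}=\{1,2,3\}$, $h(\{1\})=h(\{2\})=h(\{3\})=h(\{1,2\})=h(\{1,3\})=1$ and $h(\{2,3\})=h(\{1,2,3\})=2$, the checker accepts $\mathcal{E}=\{\{2,3\}\}$ but rejects $\mathcal{E}=\{\{1,2,3\}\}$: submodularity fails for $X=\{1,2\}$, $Y=\{1,3\}$, whose intersection $\{1\}$ is not contained in $\{2,3\}$. So the inclusion $\Gamma_n\subseteq\Gamma_{n|\mathcal{E}}$ can be strict.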

The following definition is a straightforward generalization of smfw from (13), where we replace $\Gamma_{n}$ by the relaxed polymatroids $\Gamma_{n|\mathcal{E}}$ .

Definition 3.14 (#-submodular FAQ-width).

Given an FAQ query (2) whose hypergraph is $\mathcal{H}=(\mathcal{V},\mathcal{E}=\mathcal{E}_{\not\infty}\cup\mathcal{E}_{\infty})$ , its #-submodular FAQ-width, denoted by $\text{\sf\#smfw}(Q)$ , is defined by

[TABLE]

When there are no free variables, i.e., $F=\emptyset$, we define $\text{\sf\#subw}(Q):=\text{\sf\#smfw}(Q)$, mirroring how $\text{\sf faqw}(Q)$ coincides with $\text{\sf fhtw}(Q)$ in this case.

Under the above new width parameter, we can now maintain condition (59), which allows us to solve FAQ queries over any semiring:

Theorem 3.15.

Any FAQ query $Q$ of the form (2) on any semiring can be answered in time $\tilde{O}(N^{\text{\sf\#smfw}(Q)}+|Q|)$ .

The proof of Theorem 3.15 involves an appropriate adaptation of PANDA called #PANDA, to be described below. Appendix D presents an overview of the original PANDA algorithm. Readers unfamiliar with PANDA are recommended to read that appendix first before reading the following proof.

Proof.

The PANDA algorithm [9, 8] takes as input a disjunctive Datalog query of the form

[TABLE]

The above query has an input relation $R_{K}$ for each hyperedge $K\in\mathcal{E}$ in the query’s hypergraph $\mathcal{H}=(\mathcal{V},\mathcal{E})$ . The output to the above query is a collection of tables $G_{B}$ , one for each “goal” (or “target”) $B$ in the collection of goals $\mathcal{B}$ . The output tables $(G_{B})_{B\in\mathcal{B}}$ must satisfy the logical implication in (72): In particular, for each tuple $\bm{x}_{\mathcal{V}}$ that satisfies the conjunction $\bigwedge_{K\in\mathcal{E}}R_{K}(\bm{x}_{K})$ , the disjunction $\bigvee_{B\in\mathcal{B}}G_{B}(\bm{x}_{B})$ must hold. Query (37) is an example of (72). A disjunctive Datalog query (72) can have many valid outputs. The PANDA algorithm computes one such output in time $\tilde{O}(N^{e})$ , where

[TABLE]

(Recall notation from Section 2.2.)

In what follows, we describe a variant of PANDA, called #PANDA, that takes a disjunctive Datalog query (72), and computes the following:

•

A collection of tables $(G_{B})_{B\in\mathcal{B}}$ that form a valid output to query (72), i.e. that satisfy the logical implication in (72).

•

Moreover, associated with each output table $G_{B}$ , #PANDA additionally computes a collection of “filter” tables $\left(F_{K}^{(B)}\right)_{K\in\mathcal{E}}$ , one table $F_{K}^{(B)}$ for each hyperedge $K\in\mathcal{E}$ in the input hypergraph $\mathcal{H}$ . The output tables $G_{B}$ along with the associated filters $\left(F_{K}^{(B)}\right)_{K\in\mathcal{E}}$ satisfy the following condition: For each tuple $\bm{x}_{\mathcal{V}}$ that satisfies the conjunction $\bigwedge_{K\in\mathcal{E}}R_{K}(\bm{x}_{K})$ , there is exactly one target $B\in\mathcal{B}$ where the conjunction $\displaystyle{\bigwedge_{K\in\mathcal{E}}F_{K}^{(B)}(\bm{x}_{K})}$ holds, and for that target $B$ , $G_{B}(\bm{x}_{B})$ holds as well. In particular, the following equivalences hold:

[TABLE]

where $\overset{+}{\bigvee}$ above denotes the exclusive OR. Equations (74) and (75) together imply

[TABLE]

Comparing the above to (72), note that the purpose of the filters $F_{K}^{(B)}$ is to keep the goals $G_{B}(\bm{x}_{B})$ disjoint from one another, which allows us to replace $\vee$ with $\overset{+}{\bigvee}$ and ultimately maintain condition (58). (As we will see later, in #PANDA we start with filters $F_{K}^{(B)}$ that are identical to the corresponding input relations $R_{K}$, and we keep removing tuples from $F_{K}^{(B)}$ while maintaining (74) and (75) throughout the algorithm.)

#PANDA computes the above output tables $(G_{B})_{B\in\mathcal{B}}$ and $\left(\left(F_{K}^{(B)}\right)_{K\in\mathcal{E}}\right)_{B\in\mathcal{B}}$ in time $\tilde{O}(N^{e^{\prime}})$ where

[TABLE]

Now we briefly explain how to modify the PANDA algorithm to obtain #PANDA with the above characteristics. We refer the reader to Appendix D and [9, 8] for more details about PANDA. At a high level, PANDA starts by proving an exact upper bound on $e$ from (73) using a sequence of proof steps, called the proof sequence (see Lemmas 135, D.3, and D.5). PANDA then interprets each step in the proof sequence as a relational operator and uses this sequence of relational operators as a query plan to compute the query in time $\tilde{O}(N^{e})$. One of the proof steps used in PANDA is the decomposition step $h(Y)\rightarrow h(X)+h(Y|X)$ for some $X\subseteq Y\subseteq\mathcal{V}$. The relational operator corresponding to this decomposition step is the “partitioning” operator, which takes an input (or intermediate) table $R_{Y}$ and partitions it into a small number $k=O(\log|R_{Y}|)$ of tables $R_{Y}^{(1)},\ldots,R_{Y}^{(k)}$ based on the degrees of variables in $Y$ with respect to variables in $X\subseteq Y$. In particular, define the degree of $Y$ with respect to $X$ and a tuple $\bm{t}_{X}\in\pi_{X}R_{Y}$ as follows:

[TABLE]

In the partitioning step, we partition tuples $\bm{t}_{X}\in\pi_{X}R_{Y}$ into $k$ buckets based on $\deg_{R_{Y}}(Y|\bm{t}_{X})$ and partition $R_{Y}$ accordingly. Specifically, for each $j\in[k]$ , we define

[TABLE]

After partitioning, PANDA creates $k$ independent branches of the problem, where in the $j$ -th branch, $R_{Y}$ is replaced by both $R_{X}^{(j)}$ and $R_{Y}^{(j)}$ . Note that for each $j\in[k]$ , the following holds:

[TABLE]

The above inequality mirrors the proof step $h(Y)\rightarrow h(X)+h(Y|X)$. In the same way, the entire PANDA algorithm mirrors the proof sequence of the bound in (73), which allows its runtime to be bounded by (73) (see [9, 8] for more details). After each partitioning step, PANDA continues on each of the $k$ branches of the problem independently and may end up computing a different target $G_{B}$, for some $B\in\mathcal{B}$, within each branch.
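The partitioning step can be sketched in Python as follows. This is a minimal illustration under our own assumptions, not PANDA's implementation: relations are lists of tuples, `partition_by_degree` is a hypothetical helper, and bucket $j$ is assumed to collect the tuples $\bm{t}_{X}$ with $2^{j-1}\leq\deg_{R_{Y}}(Y|\bm{t}_{X})<2^{j}$ (the usual powers-of-two buckets). The printed rows let one check that $|R_{X}^{(j)}|$ times the largest degree in bucket $j$ stays within $2|R_{Y}^{(j)}|$.

```python
from collections import defaultdict

def partition_by_degree(R_Y, x_pos):
    """Split a relation R_Y (list of tuples) into degree buckets.

    x_pos lists the positions of the variables X inside each tuple of R_Y.
    Bucket j maps every t_X with 2**(j-1) <= deg(Y | t_X) < 2**j to the
    tuples of R_Y that extend it, so R_X^(j) = keys and R_Y^(j) = values.
    """
    groups = defaultdict(list)                    # t_X -> tuples of R_Y with that projection
    for t in R_Y:
        groups[tuple(t[i] for i in x_pos)].append(t)
    buckets = defaultdict(dict)                   # j -> {t_X: [tuples]}
    for t_x, ts in groups.items():
        buckets[len(ts).bit_length()][t_x] = ts   # bit_length() yields the j above
    return dict(buckets)

# Example: X = first column of a relation with skewed degrees.
R_Y = [(x, y) for x in range(1, 30) for y in range(x % 7 + x)]
for j, group in sorted(partition_by_degree(R_Y, [0]).items()):
    size_Y = sum(len(ts) for ts in group.values())     # |R_Y^(j)|
    max_deg = max(len(ts) for ts in group.values())    # largest degree in bucket j
    # Expect len(group) * max_deg <= 2 * size_Y: the relational analogue of
    # splitting h(Y) into h(X) + h(Y|X).
    print(j, len(group), max_deg, size_Y)
```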

From the proof sequence construction described in [9, 8], we note the following: If the constructed proof sequence that is used to prove the bound on $e$ in (73) contains a decomposition step $h(Y)\rightarrow h(X)+h(Y|X)$ , then the proof of the bound on $e$ must have relied on some submodularity constraint on $h$ of the form $h(X)+h(Y\cup Z)\leq h(Y)+h(X\cup Z)$ for some $Z\subseteq\mathcal{V}$ where $Y\cap Z=\emptyset$ . In particular, such a submodularity can be broken down into the sum of two inequalities:

[TABLE]

which in turn are converted into two proof steps in the proof sequence:

[TABLE]

Moreover, this is the only place in the proof sequence construction [9, 8] where a decomposition step (85) is introduced. However, the new bound (77) used in #PANDA only relies on submodularities $h(X)+h(Y\cup Z)\leq h(Y)+h(X\cup Z)$ where $X\subseteq K$ for some $K\in\mathcal{E}$. (Recall $\Gamma_{n|\mathcal{E}_{\not\infty}}$ from Definition 9.) Hence, in #PANDA, whenever we partition $R_{Y}$ into $R_{Y}^{(1)},\ldots,R_{Y}^{(k)}$ based on the degrees $\deg_{R_{Y}}(Y|\bm{t}_{X})$ of $\bm{t}_{X}\in\pi_{X}R_{Y}$, we know that there is some input relation $R_{K}$ with $X\subseteq K$. We can therefore refine the corresponding filter $F_{K}^{(B)}$ on the $j$-th branch by semijoining it with $R_{X}^{(j)}=\pi_{X}R_{Y}^{(j)}$, i.e., by setting $F_{K}^{(B)}\leftarrow F_{K}^{(B)}\ltimes R_{X}^{(j)}$. This update of the filters maintains (74) and (75). (Initially, the filters $F_{K}^{(B)}$ are identical to the corresponding input relations $R_{K}$, so they trivially satisfy both (74) and (75).)
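A small sketch of this filter refinement, under our own assumptions (relations are lists of tuples, a single filter starts as the input relation $R_{K}$, and `refine_filters` is a hypothetical helper): because the degree buckets partition $\pi_{X}R_{Y}$, semijoining $R_{K}$ with each $R_{X}^{(j)}$ sends every tuple that can still join with $R_{Y}$ to exactly one branch, which is the disjointness that (74) and (75) require.

```python
from collections import Counter

def refine_filters(R_K, k_pos, R_Y, y_pos):
    """Sketch of the update F_K <- F_K ⋉ R_X^(j) for every branch j.

    R_Y is partitioned on the degree of t_X (X = shared variables), and the
    filter for branch j keeps only the tuples of R_K whose X-part lies in R_X^(j).
    k_pos / y_pos are the positions of X inside R_K / R_Y.
    """
    deg = Counter(tuple(t[i] for i in y_pos) for t in R_Y)   # deg_{R_Y}(Y | t_X)
    R_X = {}                                                # j -> R_X^(j)
    for t_x, d in deg.items():
        R_X.setdefault(d.bit_length(), set()).add(t_x)
    # Semijoin of the initial filter R_K with each R_X^(j).
    return {j: [t for t in R_K if tuple(t[i] for i in k_pos) in rx]
            for j, rx in R_X.items()}

R_Y = [(x, y) for x in range(10) for y in range(x)]       # R_Y(x, y) with deg(x) = x
R_K = [(x, z) for x in range(12) for z in range(3)]       # R_K(x, z), so X = {x} ⊆ K
filters = refine_filters(R_K, [0], R_Y, [0])
print({j: len(f) for j, f in sorted(filters.items())})
```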

Now that we have described the #PANDA algorithm satisfying the above properties, we explain how to use it as a blackbox to solve an FAQ query $Q$ of the form (2) in time $\tilde{O}(N^{\text{\sf\#smfw}(Q)}+|Q|)$ . Following the same notation as in the proof of Theorem 3.12, let $\mathcal{M}$ be the collection of all maps $\beta:\text{\sf TD}^{\ell}_{F}\to 2^{\mathcal{V}}$ such that $\beta(T,\chi)=\chi(t)$ for some $t\in V(T)$ ; in other words, $\beta$ selects one bag $\chi(t)$ out of each tree decomposition $(T,\chi)$ . Let ${\mathbf{B}}$ be the collection of images of all $\beta\in\mathcal{M}$ , i.e.

[TABLE]

For each $\mathcal{B}\in{\mathbf{B}}$ , we use #PANDA to solve the following rule (i.e. to produce relations $G_{B}$ and $F_{K}^{(B)}$ that satisfy the equivalence):

[TABLE]

The solutions collectively satisfy the following:

[TABLE]

Let $M=|{\mathbf{B}}|$ and suppose ${\mathbf{B}}=\{\mathcal{B}_{1},\mathcal{B}_{2},\ldots,\mathcal{B}_{M}\}$. By distributing the conjunction $\bigwedge_{\mathcal{B}\in{\mathbf{B}}}$ over $\overset{+}{\bigvee}$, we get

[TABLE]

Using the same diagonalization argument from [9, 8], we can prove the following claim:

Claim 2.

For every $(B_{1},\ldots,B_{M})\in\prod_{i=1}^{M}\mathcal{B}_{i}$ , there must exist a tree decomposition $(\bar{T},\bar{\chi})\in\text{\sf TD}^{\ell}_{F}$ such that for every $t\in V(\bar{T})$ , $\bar{\chi}(t)=B_{j}$ for some $j\in[M]$ .

Assuming Claim 2 is correct, and thanks to (75), we can rewrite the conjunction as

[TABLE]

The right-hand side of (89) is an FAQ query, which we solve by running InsideOut over the tree decomposition $(\bar{T},\bar{\chi})$. We repeat this for every $(B_{1},\ldots,B_{M})\in\prod_{i=1}^{M}\mathcal{B}_{i}$. Finally, because the disjunction over $(B_{1},\ldots,B_{M})\in\prod_{i=1}^{M}\mathcal{B}_{i}$ is an exclusive OR, we can simply add up the corresponding query results using the semiring addition.
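Why disjointness lets us add up branch results can be seen in the counting semiring with a toy sketch (our own example, not the query from the proof): splitting $R$ into heavy and light parts by the degree of $b$ in $S$ gives two branches whose counts add up to the full join count, whereas overlapping branches, as a plain disjunction would allow, overcount.

```python
from collections import Counter

def count_join(R, S):
    """Count the tuples of R(a, b) ⋈ S(b, c) by summing, over R, the degree of b in S."""
    deg_S = Counter(b for b, _ in S)
    return sum(deg_S[b] for _, b in R)

R = [(a, b) for a in range(8) for b in range(a % 5)]
S = [(b, c) for b in range(5) for c in range(b * b)]
heavy = {b for b, d in Counter(b for b, _ in S).items() if d > 4}
# Disjoint filters on R: every tuple of R goes to exactly one branch.
R_heavy = [t for t in R if t[1] in heavy]
R_light = [t for t in R if t[1] not in heavy]
total = count_join(R_heavy, S) + count_join(R_light, S)
print(total, count_join(R, S))   # the two numbers agree
```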

From (77), the total runtime is $\tilde{O}(N^{w}+|Q|)$ , where

[TABLE]

Finally, we include the proof of Claim 2 for completeness, following the corresponding proof in [9, 8]. Fix $(B_{1},\ldots,B_{M})\in\prod_{i=1}^{M}\mathcal{B}_{i}$ and assume, to the contrary, that every tree decomposition $(T,\chi)\in\text{\sf TD}^{\ell}_{F}$ has some bag $\chi(t)$, $t\in V(T)$, with $\chi(t)\notin\{B_{1},\ldots,B_{M}\}$. Let $\bar{\beta}(T,\chi)$ be such a bag. Then $\bar{\beta}\in\mathcal{M}$, so by definition of ${\mathbf{B}}$, $\text{\sf image}(\bar{\beta})=\mathcal{B}_{j}$ for some $j\in[M]$. Since $B_{j}\in\mathcal{B}_{j}$, we get $B_{j}=\bar{\beta}(T,\chi)$ for some $(T,\chi)\in\text{\sf TD}^{\ell}_{F}$. This contradicts the fact that $\bar{\beta}(T,\chi)\notin\{B_{1},\ldots,B_{M}\}$ for every tree decomposition $(T,\chi)\in\text{\sf TD}^{\ell}_{F}$. ∎
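Since the argument above never uses the tree structure, Claim 2 can be checked by brute force on small families of bag sets. The sketch below is only such a check, under our own encoding: each decomposition is represented by its list of bags, and `claim2_holds` is a hypothetical helper that enumerates all maps $\beta$, the collection ${\mathbf{B}}$ of their images, and every choice $(B_{1},\ldots,B_{M})$.

```python
from itertools import product

def claim2_holds(tds):
    """Brute-force check of Claim 2.

    tds is a list of tree decompositions, each given only by its list of bags
    (frozensets of variables); the tree structure plays no role in the claim.
    """
    # The collection B: images of all maps beta that pick one bag per decomposition.
    images = {frozenset(beta) for beta in product(*tds)}
    for choice in product(*images):            # every (B_1, ..., B_M) in prod_i B_i
        chosen = set(choice)
        if not any(set(bags) <= chosen for bags in tds):
            return False                       # no decomposition uses only chosen bags
    return True

# Two tree decompositions of the 4-cycle a-b-c-d-a.
tds = [[frozenset("abc"), frozenset("acd")], [frozenset("abd"), frozenset("bcd")]]
print(claim2_holds(tds))
```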

The following proposition shows that, while $\text{\sf\#smfw}(Q)$ can be larger than $\text{\sf smfw}(Q)$, it is never larger than $\text{\sf faqw}(Q)$ and can be unboundedly smaller for some classes of queries.

Proposition 3.16 (Connecting #smfw to smfw and faqw).

(a)

For any FAQ query $Q$ , the following holds:

[TABLE]

In particular, when $Q$ has no free variables, we have

[TABLE]

(b)

Furthermore, there are classes of queries $Q$ for which the gap between $\text{\sf\#smfw}(Q)$ and $\text{\sf faqw}(Q)$ is unbounded, and so is the gap between $\text{\sf\#subw}(Q)$ and $\text{\sf fhtw}(Q)$ .

Proof.

First we prove part (a). The first inequality in (90) follows directly from the definitions of #smfw and smfw along with the fact that $\Gamma_{n}\subseteq\Gamma_{n|\mathcal{E}_{\not\infty}}$ . To prove the second inequality in (90), we use the following variant of the Modularization Lemma from [8]:

Claim 3 (Variant of the Modularization Lemma [8]).

Given a hypergraph $\mathcal{H}=(\mathcal{V}=[n],\mathcal{E})$ and a set $B\subseteq\mathcal{V}$ , we have

[TABLE]

where ED is given by (5) and $\text{\sf M}_{n}$ denotes the set of all modular functions $h:2^{\mathcal{V}}\rightarrow{\mathbb{R}}_{+}$. (A function $h:2^{\mathcal{V}}\rightarrow{\mathbb{R}}_{+}$ is modular if $h(X)=\sum_{i\in X}h(i),\forall X\subseteq\mathcal{V}$.)

Proof of Claim 3.

The LHS of (92) is lower-bounded by the RHS, since every nonnegative modular function is a polymatroid and hence lies in $\Gamma_{n|\mathcal{E}}$. Next, we prove LHS $\leq$ RHS. Without loss of generality, we assume $B=[k]$ for some $k\in[n]$. Let $h^{*}=\operatorname*{arg\,max}_{h\in\text{\sf ED}\cap\Gamma_{n|\mathcal{E}}}h(B)$. Define a function $\bar{h}:2^{\mathcal{V}}\rightarrow{\mathbb{R}}_{+}$ as follows:

[TABLE]

Obviously $\bar{h}\in\text{\sf M}_{n}$ and $\bar{h}(B)=h^{*}(B)$ . Next, we prove $\bar{h}\in\text{\sf ED}$ by proving that for every $F\subseteq[n]$ where $F\subseteq E$ for some $E\in\mathcal{E}$ , the following holds: $\bar{h}(F)\leq h^{*}(F)$ .

The proof is by induction on $|F|$. The base case $|F|=0$ is trivial. For the inductive step, consider some $F$ with $F\subseteq E$ for some $E\in\mathcal{E}$. Let $j$ be the maximum integer in $F$. Since $|F\cap[j-1]|<|F|$, we have

[TABLE]

The first inequality above follows from the induction hypothesis, and the second follows from the fact that $h^{*}$ is an $\mathcal{E}$-polymatroid (recall Definition 9). Both steps rely on the fact that $F\cap[j-1]\subseteq E$ for some $E\in\mathcal{E}$. Consequently, $\bar{h}\in\text{\sf ED}\cap\text{\sf M}_{n}$. Since $\bar{h}(B)=h^{*}(B)$, this proves Claim 3. ∎
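The construction can be checked numerically. The sketch assumes that $\bar{h}$ is given by the chain-rule increments $\bar{h}(i)=h^{*}([i])-h^{*}([i-1])$ for $i\in[k]$ and $\bar{h}(i)=0$ for $i>k$, which is the form the induction above relies on. It tests a coverage function, which is a polymatroid and therefore in particular an $\mathcal{E}$-polymatroid, so $\bar{h}(F)\leq h^{*}(F)$ can be checked for every $F$, not only for $F$ inside a hyperedge. The helper `modularize` is hypothetical.

```python
from itertools import combinations

def modularize(h, n, k):
    """Assumed construction of h-bar for B = {1, ..., k}: chain-rule increments
    h([i]) - h([i-1]) for i <= k, and 0 for i > k."""
    prefix = lambda i: frozenset(range(1, i + 1))
    return {i: (h(prefix(i)) - h(prefix(i - 1)) if i <= k else 0) for i in range(1, n + 1)}

# A coverage function is a polymatroid (monotone, submodular, h(empty set) = 0).
A = {1: {"x", "y"}, 2: {"y", "z"}, 3: {"z", "w", "x"}, 4: {"v"}}
h = lambda S: len(set().union(*(A[i] for i in S)))
n, k = 4, 3
h_bar = modularize(h, n, k)
print(h_bar)
```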

Now we prove the second inequality in (90):

[TABLE]

The fact that $\max_{h\in\text{\sf ED}_{\not\infty}\cap\text{\sf M}_{n}}h(\chi(t))=\rho^{*}_{\mathcal{E}_{\not\infty}}(\chi(t))$ follows from the two sides being dual linear programs. (Recall the definition of $\rho^{*}$ from Section 2.1.)

Now we prove part (b) of Proposition 3.16. In [8], we constructed a class of graphs/queries where the gap between fhtw and subw is unbounded. We re-use the same construction here and prove that the upper bound on subw proved in [8] is also an upper bound on #subw. The proof of this upper bound differs from the one in [8], though, since here we can only use $\mathcal{E}$-polymatroid properties (recall Definition 9).

Given integers $m$ and $k$, consider a graph $\mathcal{H}=(\mathcal{V},\mathcal{E})$ that is an “$m$-fold $2k$-cycle”: the vertex set $\mathcal{V}:=I_{1}\cup\ldots\cup I_{2k}$ is a disjoint union of $2k$ sets of vertices. Each set $I_{j}$ has $m$ vertices, i.e., $I_{j}:=\{I_{j}^{1},I_{j}^{2},\ldots,I_{j}^{m}\}$. For every $j\in[2k]$, there is no edge between two vertices of $I_{j}$, i.e., $I_{j}$ is an independent set. The edge set $\mathcal{E}$ of the hypergraph is the union of $2k$ complete bipartite graphs $K_{m,m}$:

[TABLE]
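A small Python sketch of this construction, under our own encoding: vertex $(j,r)$ stands for $I_{j}^{r}$, the $K_{m,m}$ blocks are assumed to join $I_{j}$ and $I_{j+1}$ cyclically, and `m_fold_cycle`, `is_path_td` are hypothetical helpers. It also checks that the two tree decompositions used in Cases 1 and 2 below, the path of bags $I_{1}\cup I_{i}\cup I_{i+1}$ and the pair $B_{1},B_{2}$, are valid.

```python
from itertools import product

def m_fold_cycle(m, k):
    """Vertices (j, r) stand for I_j^r; edges form K_{m,m} between I_j and I_{j+1}, cyclically."""
    I = {j: [(j, r) for r in range(1, m + 1)] for j in range(1, 2 * k + 1)}
    edges = {frozenset(p) for j in I for p in product(I[j], I[j % (2 * k) + 1])}
    return I, edges

def is_path_td(bags, vertices, edges):
    """True if the bags, arranged in a path, form a tree decomposition."""
    if not all(any(e <= b for b in bags) for e in edges):      # every edge lies in a bag
        return False
    for v in vertices:                                          # running intersection
        idx = [i for i, b in enumerate(bags) if v in b]
        if not idx or idx != list(range(idx[0], idx[-1] + 1)):
            return False
    return True

def union(I, js):
    return frozenset(v for j in js for v in I[j])

m, k = 2, 3
I, E = m_fold_cycle(m, k)
V = [v for j in I for v in I[j]]
case1 = [union(I, [1, i, i + 1]) for i in range(2, 2 * k)]                          # fan around I_1
case2 = [union(I, range(1, k + 2)), union(I, list(range(k + 1, 2 * k + 1)) + [1])]  # B_1, B_2
print(len(E), is_path_td(case1, V, E), is_path_td(case2, V, E))
```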

Finally, consider an FAQ query $Q$ that has a finite-sized input factor $R_{K}$ for every $K\in\mathcal{E}$, i.e., $\mathcal{E}_{\not\infty}=\mathcal{E}$ and $\mathcal{E}_{\infty}=\emptyset$ (recall notation from Section 2.2). If $Q$ has no free variables, then $\text{\sf faqw}(Q)=\text{\sf fhtw}(Q)$ and $\text{\sf\#smfw}(Q)=\text{\sf\#subw}(Q)$.

We proved in [8] that $\text{\sf fhtw}(Q)\geq 2m$. Next, we prove that $\text{\sf\#subw}(Q)\leq m(2-1/k)$. Let $h$ be any function in $\text{\sf ED}_{\not\infty}\cap\Gamma_{n|\mathcal{E}_{\not\infty}}$, and let $\theta\geq 0$ be a threshold fixed at the end of the proof. We distinguish two cases:

•

Case 1: $h(I_{i})\leq\theta$ for some $i\in[2k]$ . WLOG assume $h(I_{1})\leq\theta$ . Consider the TD

with bags $I_{1}\cup I_{2}\cup I_{3}$, $I_{1}\cup I_{3}\cup I_{4}$, …, $I_{1}\cup I_{2k-1}\cup I_{2k}$, connected in a path.

For bag $B=I_{1}\cup I_{i}\cup I_{i+1}$ , using $\mathcal{E}_{\not\infty}$ -polymatroid properties (Definition 9), we have

[TABLE]

•

Case 2: $h(I_{i})>\theta$ for all $i\in[2k]$ . Consider the TD

with two bags, $B_{1}:=I_{1}\cup I_{2}\cup\cdots\cup I_{k+1}$ and $B_{2}:=I_{k+1}\cup I_{k+2}\cup\cdots\cup I_{2k}\cup I_{1}$.

For convenience, given any vertex $I_{i}^{j}$ , define the vertex set $\mathcal{V}_{i}^{j}$ as follows:

[TABLE]

From $\mathcal{E}_{\not\infty}$ -polymatroid properties, we have

[TABLE]

In a symmetric way, we can also show that $h(B_{2})\leq km-(k-1)\theta$ . By setting $\theta=(1-1/k)m$ , we prove that $\text{\sf\#subw}(Q)\leq m(2-1/k)$ . Since $\text{\sf fhtw}(Q)\geq 2m$ , this proves part (b). ∎
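The choice of $\theta$ balances the two cases. The Case 1 bound is assumed here to be $\theta+m$ ($h(I_{1})\leq\theta$ plus a fractional edge cover of size $m$ for $I_{i}\cup I_{i+1}$), which is the value consistent with $\theta=(1-1/k)m$. The exact-arithmetic sketch below confirms that both bounds then equal $m(2-1/k)$, that any other $\theta$ is worse, and that the gap to $\text{\sf fhtw}(Q)\geq 2m$ is $m/k$, which grows without bound in $m$ for fixed $k$.

```python
from fractions import Fraction

def case_bounds(m, k, theta):
    """Worst-bag bounds from the two cases: theta + m (Case 1, assumed form)
    and k*m - (k-1)*theta (Case 2)."""
    return theta + m, k * m - (k - 1) * theta

def balanced_theta(m, k):
    # theta + m = k*m - (k-1)*theta  <=>  k*theta = (k-1)*m
    return Fraction((k - 1) * m, k)

for m, k in [(1, 2), (4, 2), (10, 5)]:
    theta = balanced_theta(m, k)
    # #subw upper bound versus the fhtw lower bound 2m
    print(m, k, theta, max(case_bounds(m, k, theta)), 2 * m)
```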

Example 3.17.

Consider again the count query $Q$ in (60), which we showed earlier can be computed in time $\tilde{O}(N^{1.5})$. Since $Q$ has no free variables, $\text{\sf faqw}(Q)=\text{\sf fhtw}(Q)=2$ and $\text{\sf\#smfw}(Q)=\text{\sf\#subw}(Q)$. The proof of Proposition 3.16 (the case $m=1$, $k=2$) shows that $\text{\sf\#subw}(Q)\leq 1.5$. Therefore, the #PANDA algorithm from the proof of Theorem 3.15 can compute (60) in time $\tilde{O}(N^{1.5})$. In fact, the $\tilde{O}(N^{1.5})$ algorithm we described earlier for (60) is a specialization of #PANDA. The proof of Proposition 3.16 offers a family of similar examples. ∎

3.3.2 FAQ-AI over an arbitrary semiring

Finally, we put everything together to solve the FAQ-AI problem. The only (very natural) change is to replace tree decompositions by their relaxed versions; the technical details carry over.

Definition 3.18.

Given an FAQ-AI query (3) whose hypergraph is $\mathcal{H}=(\mathcal{V},\mathcal{E}=\mathcal{E}_{s}\cup\mathcal{E}_{\ell}=\mathcal{E}_{\not\infty}\cup\mathcal{E}_{\infty})$ , its relaxed #-submodular FAQ-width, denoted by $\text{\sf\#smfw}_{\ell}(Q)$ , is defined by

[TABLE]

When $F=\emptyset$ , we define $\text{\sf\#subw}_{\ell}(Q):=\text{\sf\#smfw}_{\ell}(Q)$ .

Theorem 3.19.

Any FAQ-AI query $Q$ of the form (3) on any semiring can be computed in time $\tilde{O}(N^{\text{\sf\#smfw}_{\ell}(Q)}+|Q|)$ .

The proof of the above theorem is very similar to that of Theorem 3.15. The key difference is that instead of running InsideOut on individual FAQ queries obtained after applying #PANDA, we now run the InsideOut variant from Theorem 4. The proof is thus omitted.

Example 3.20.

Consider the following count query (which is similar to the counting version of query $Q_{2}$ from Example 1.2):

[TABLE]

Let $N:=\max\{|R|,|S|,|T|\}$. For the above query, $\text{\sf faqw}(Q)=\text{\sf faqw}_{\ell}(Q)=\text{\sf\#smfw}(Q)=2$. Any of the previously known algorithms, including the one from Theorem 4 and the one from Theorem 3.15, would need time $O(N^{2})$ to compute $Q$. We show below that $\text{\sf\#smfw}_{\ell}(Q)\leq 1.5$. As an instance of Theorem 3.19, we also show how to compute the above query in time $\tilde{O}(N^{1.5})$. (Using the same method, we can also solve the counting version of $Q_{2}$ from Example 1.2 in the same time.)

First we prove that for the above query, $\text{\sf\#smfw}_{\ell}(Q)\leq 1.5$ . Here $F=\emptyset$ . We will use two relaxed tree decompositions in $\text{\sf TD}_{F}^{\ell}$ : The first $(T_{1},\chi_{1})$ has two bags $\{a,b,c\}$ and $\{c,d\}$ . The second $(T_{2},\chi_{2})$ has two bags $\{a,b\}$ and $\{b,c,d\}$ . (Both are relaxed TDs because the ligament edge $\bm{1}_{a+b+c+d\leq 0}$ is not contained in any bag; recall Definition 3.3.) Following (93), for each $h\in\text{\sf ED}_{\not\infty}\cap\Gamma_{n|\mathcal{E}_{\not\infty}}$ , we will pick one TD or the other. In particular, given some $h\in\text{\sf ED}_{\not\infty}\cap\Gamma_{n|\mathcal{E}_{\not\infty}}$ :

•

If $h(b)\geq 1/2$ , then $h(bc|b)\leq 1/2$ . We pick $(T_{1},\chi_{1})$ . From $\mathcal{E}_{\not\infty}$ -polymatroid properties (Def. 9), we have

[TABLE]

•

If $h(b)<1/2$ , we pick $(T_{2},\chi_{2})$ .

[TABLE]

This proves that $\text{\sf\#smfw}_{\ell}(Q)\leq 1.5$ .

Finally, as a special case of #PANDA, we explain how to solve the above query in time $\tilde{O}(N^{1.5})$ (where recall $N:=\max\{|R|,|S|,|T|\}$ ). Let

[TABLE]

Now we can write

[TABLE]

Both $U$ and $W$ above have sizes $\leq N^{1.5}$ . Using the algorithm from the proof of Theorem 4, $Q^{\ell}$ can be answered in time $O(N^{1.5}\log N)$ using the relaxed TD $(T_{1},\chi_{1})$ , while $Q^{h}$ can be answered in the same time using $(T_{2},\chi_{2})$ . ∎
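The heavy/light strategy can be sketched in Python. This is a hypothetical illustration: it assumes the query has the form $Q=\sum_{a,b,c,d}R(a,b)S(b,c)T(c,d)\cdot\bm{1}_{a+b+c+d\leq 0}$, which matches the two relaxed TDs above, and it assumes the split is on the degree of $b$ in $S$ with threshold $\sqrt{N}$; the precise definitions of $U$ and $W$ may differ. Light values of $b$ use the bags $\{a,b,c\}$ and $\{c,d\}$, heavy values use $\{a,b\}$ and $\{b,c,d\}$, and the ligament $a+b+c+d\leq 0$ is handled by binary search over sorted lists in place of Chazelle's data structure (enough here because each lookup is one-dimensional).

```python
import math
from bisect import bisect_right
from collections import defaultdict

def count_q(R, S, T):
    """Count (a, b, c, d) with R(a,b), S(b,c), T(c,d) and a + b + c + d <= 0
    (assumed form of the query), using a heavy/light split on b."""
    N = max(len(R), len(S), len(T))
    thr = math.isqrt(N)                          # degree threshold ~ sqrt(N)
    S_by_b, T_by_c, R_by_b = defaultdict(list), defaultdict(list), defaultdict(list)
    for b, c in S:
        S_by_b[b].append(c)
    for c, d in T:
        T_by_c[c].append(d)
    for a, b in R:
        R_by_b[b].append(a)
    for lst in list(T_by_c.values()) + list(R_by_b.values()):
        lst.sort()                               # sorted lists support range counting
    light = {b for b, cs in S_by_b.items() if len(cs) <= thr}
    total = 0
    # Light b: bags {a,b,c} and {c,d}; the {a,b,c} bag has at most |R| * sqrt(N) tuples.
    for a, b in R:
        if b in light:
            for c in S_by_b[b]:
                total += bisect_right(T_by_c.get(c, []), -(a + b + c))   # count d <= -(a+b+c)
    # Heavy b: at most N / (isqrt(N) + 1) such values, so the {b,c,d} bag has
    # O(sqrt(N) * |T|) tuples; bags {a,b} and {b,c,d}.
    for b, cs in S_by_b.items():
        if b not in light:
            for c in cs:
                for d in T_by_c.get(c, []):
                    total += bisect_right(R_by_b.get(b, []), -(b + c + d))   # count a <= -(b+c+d)
    return total

print(count_q([(-1, 0)], [(0, 0)], [(0, 1)]))   # 1: the single tuple has a+b+c+d = 0
```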

4 Applications to relational Machine Learning

Our FAQ-AI formalism and solution are directly applicable to learning a class of machine learning models that includes supervised models (e.g., robust regression, SVM classification) and unsupervised models (e.g., clustering via $k$-means). In this section, we show that the core computation of these optimization problems can be formulated in FAQ-AI over the sum-product semiring.

4.1 Training ML models over databases

A typical machine learning model is learned over a training dataset $\bm{G}$. We consider the common scenario where the input data is a relational database $\bm{I}$, and the training dataset $\bm{G}$ is the result of a feature extraction join query $Q$ over $\bm{I}$ [38, 2, 3, 29, 22]. Each tuple $(\bm{x},y)\in\bm{G}$ consists of a vector of features $\bm{x}$ of length $n$ and a label $y$. We assume that the feature extraction query $Q$ has the hypergraph $\mathcal{H}=(\mathcal{V},\mathcal{E}_{s})$, where $\mathcal{E}_{s}$ is the set of its skeleton hyperedges.

A supervised machine learning model is a function $f_{\bm{\beta}}(\bm{x})$ with parameters $\bm{\beta}$ that is used to predict the label $y$ for unlabeled data. The parameters are obtained by minimizing the objective function:

[TABLE]

where $\mathcal{L}(a,b)$ is a loss function, $\Omega$ is a regularizer, e.g., $\ell_{1}$ or $\ell_{2}$ norm, and the constant $\lambda\in(0,1)$ controls the influence of regularization.

Previous work has shown that for polynomial loss functions, such as square loss $\mathcal{L}(a,b)=(a-b)^{2}$ , the core computation for optimizing the objective $J(\bm{\beta})$ amounts to FAQ evaluation [2]. In many instances, however, the loss function is non-polynomial, either due to the structure of the loss, or the presence of non-polynomial components embedded within the model structure (e.g., ReLU activation function in neural nets) [33].

Examples of commonly used non-polynomial loss functions are: (1) hinge loss, used to learn classification models like linear support vector machines (SVM) [33], or generalized low rank models (glrm) with Boolean principal component analysis (PCA) [42]; (2) Huber loss, used to learn regression models that are robust to outliers [33]; (3) scalene loss, used to learn quantile regression models [42]; (4) epsilon insensitive loss, used to learn SVM regression models [33]; and (5) ordinal hinge loss, used to learn ordinal regression models or ordinal PCA (another glrm) [42].

Any optimization problem with the above non-polynomial loss functions can benefit from our evaluation algorithm for FAQ-AI by reformulating computations in the optimization algorithm as FAQ-AI expressions over the feature extraction join query $Q$ . We next exemplify this reformulation for the following problems:

•

Learning a robust linear regression model using Huber loss, which can be solved with gradient-descent optimization.

•

Learning a linear regression model using the scalene, epsilon insensitive, and ordinal hinge loss functions.

•

Learning a linear support vector machine (SVM) for binary classification using hinge loss, which can be solved with subgradient-based optimization algorithms or with a cutting-plane algorithm for the primal formulation of linear SVM classification.

•

We also consider $k$ -means unsupervised clustering and give an FAQ-AI reformulation of the computation done in an iteration of the algorithm over the dataset $\bm{G}$ .

The advantage of the FAQ-AI reformulation is that the FAQ-AI expressions for the aforementioned optimization problems can be evaluated over relaxed tree decompositions of the feature extraction query $Q$, without explicitly materializing its result $\bm{G}$. Both the size of $\bm{G}$ and the time to compute it are bounded by $\tilde{O}(|\bm{I}|^{\rho^{*}(Q)})$ [35]. Using InsideOut or #PANDA, these optimization problems can be solved in time sublinear in this worst-case size of $\bm{G}$.

4.2 Background: Gradient-based Optimization

In this section, we overview gradient-based optimization algorithms for convex and differentiable objective functions of the form (95). A gradient-based optimization algorithm uses first-order gradient information to optimize $J(\bm{\beta})$. It repeatedly updates the parameters $\bm{\beta}$ by some step size $\alpha$ in the direction of the negative gradient $-\mbox{\boldmath$ \nabla $}J(\bm{\beta})$ until convergence. To guarantee convergence, it is common to use backtracking line search to ensure that the step size $\alpha$ is small enough to decrease the loss at each step. Each update step requires two computations: (1) point evaluation: given $\bm{\theta}$, compute the scalar $J(\bm{\theta})$; and (2) gradient computation: given $\bm{\theta}$, compute the vector $\mbox{\boldmath$ \nabla $}J(\bm{\theta})$.

There exist several variants of gradient descent, e.g., batch gradient descent and stochastic gradient descent, as well as many different algorithms for choosing a valid step size [33]. In this work, we consider the batch gradient descent (BGD) algorithm with the Armijo backtracking line search condition, as depicted in Algorithm 1. A common choice for the step size is a function inversely proportional to the number of iterations of the algorithm, for instance $\alpha=\frac{1}{\lambda t}$ at iteration $t$, where $\lambda$ is the regularization parameter from (95) [40].
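A generic sketch of batch gradient descent with Armijo backtracking, assuming $J$ and its gradient are given as callables (in the setting of this paper, each call would be answered by FAQ-AI expressions rather than a scan over $\bm{G}$). The function name and the parameters `alpha0`, `shrink`, and `c` are our own; this is not Algorithm 1 verbatim.

```python
def bgd_armijo(J, grad, beta, alpha0=1.0, shrink=0.5, c=1e-4, tol=1e-10, max_iter=1000):
    """Batch gradient descent with Armijo backtracking line search (generic sketch)."""
    for _ in range(max_iter):
        g = grad(beta)                       # gradient computation
        g2 = sum(gi * gi for gi in g)
        if g2 < tol:                         # (near-)stationary point reached
            break
        f0, alpha = J(beta), alpha0          # point evaluation
        while True:
            cand = [b - alpha * gi for b, gi in zip(beta, g)]    # step along -gradient
            # Armijo sufficient-decrease condition (with a floor on alpha as a safeguard)
            if J(cand) <= f0 - c * alpha * g2 or alpha < 1e-12:
                break
            alpha *= shrink
        beta = cand
    return beta

J = lambda b: (b[0] - 3) ** 2 + 2 * (b[1] + 1) ** 2
grad = lambda b: [2 * (b[0] - 3), 4 * (b[1] + 1)]
print(bgd_armijo(J, grad, [0.0, 0.0]))   # close to [3, -1]
```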

4.3 Robust linear regression with Huber loss

A linear regression model is a linear function $f_{\bm{\beta}}(\bm{x})=\bm{\beta}^{\top}\bm{x}=\sum_{i\in[n]}\beta_{i}x_{i}$ with features $\bm{x}=(x_{1}=1,x_{2},\ldots,x_{n})$ and parameters $\bm{\beta}=(\beta_{1},\ldots,\beta_{n})$ . For a given feature vector $\bm{x}$ , the model is used to estimate the (continuous) label $y\in{\mathbb{R}}$ . We learn the model parameters by minimizing the objective $J(\bm{\beta})$ with the Huber loss function, which is defined as:

[TABLE]

Huber loss is equivalent to the square loss when $|a-b|\leq 1$ and to the absolute loss otherwise. (Without loss of generality, we use a simplified Huber loss. In general, the threshold between the absolute and square loss is given by a constant $\delta$, and the absolute loss is $\frac{\delta}{2}|a-b|-\frac{\delta^{2}}{2}$.) In contrast to the absolute loss, Huber loss is differentiable at all points. It is also more robust to outliers than the square loss.

To learn the parameters, we use batch gradient-descent optimization, which repeatedly updates the parameters in the direction of the negative gradient $-\mbox{\boldmath$ \nabla $}J(\bm{\beta})$ until convergence. We provide details on gradient-based optimization in Section 4.2. In this section, we focus on the core computation of the algorithm, which is the repeated computation of the objective $J(\bm{\beta})$ and its gradient $\mbox{\boldmath$ \nabla $}J(\bm{\beta})$.

The gradient $\mbox{\boldmath$ \nabla $}J(\bm{\beta})$ is the vector of partial derivatives with respect to the parameters $(\beta_{j})_{j\in[n]}$. (Note that the derivative of $\bm{1}_{\theta(x)\geq 0}$ with respect to $x$, for any function $\theta$ of $x$, is $0$ wherever it is defined.) The objective function $J(\bm{\beta})$ (with $\ell_{2}$ regularization) and its partial derivative with respect to $\beta_{j}$ are:

[TABLE]

Our observation is that we can compute $J(\bm{\beta})$ and $\frac{\partial J(\bm{\beta})}{\partial\beta_{j}}$ without materializing $\bm{G}$ , by reformulating their data-dependent computation as a few FAQ-AI expressions. We explain the details next.

4.3.1 Reformulating the objective $J(\bm{\beta})$ with Huber loss into FAQ-AI expressions

We show that the objective $J(\bm{\beta})$ from (97) can be reformulated into $O(n^{2})$ FAQ-AI expressions of the form (3).

First, we consider the case $|y-f_{\bm{\beta}}(\bm{x})|\leq 1$, i.e., the square loss term of $J(\bm{\beta})$. For ease of notation, let $c_{1}(y,\bm{x})$ denote the condition $|y-f_{\bm{\beta}}(\bm{x})|\leq 1$.

[TABLE]

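Each summation over the training dataset $\bm{G}$ in the final reformulation above can be expressed as one FAQ-AI query with two ligament hyperedges, one for each of the two additive inequalities $y-\bm{\beta}^{\top}\bm{x}\leq 1$ and $\bm{\beta}^{\top}\bm{x}-y\leq 1$ that make up $c_{1}(y,\bm{x})$. For instance, the first summation over $\bm{G}$ is equivalent to the following FAQ-AI expression: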

[TABLE]

The absolute loss function for the case $|y-f_{\bm{\beta}}(\bm{x})|>1$ can be reformulated similarly:

[TABLE]

All of these terms can be reformulated as $O(n)$ FAQ-AI expressions of the form (3).

Overall, the objective $J(\bm{\beta})$ with Huber loss for learning robust linear regression models can be computed with $O(n^{2})$ FAQ-AI expressions, and without materializing the training dataset $\bm{G}$ . Section 4.3.2 shows that the same holds for $\frac{\partial J(\bm{\beta})}{\partial\beta_{j}}$ .
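To make the decomposition concrete, the sketch below assumes the common normalization $\mathcal{L}(a,b)=\frac{1}{2}(a-b)^{2}$ for $|a-b|\leq 1$ and $|a-b|-\frac{1}{2}$ otherwise, ignores the (data-independent) regularizer, and writes $r=y-\bm{\beta}^{\top}\bm{x}$. The objective then depends on the data only through $\sum y^{2}$, $\sum yx_{i}$, and $\sum x_{i}x_{j}$ restricted to $|r|\leq 1$, plus counts and sums of $y$ and $x_{i}$ restricted to $r>1$ or to $r<-1$. Each such sum would be one FAQ-AI expression; here they are computed over a materialized $\bm{G}$ purely to check the algebra, and the helper names are hypothetical.

```python
def huber(r):
    # Assumed simplified Huber loss with threshold 1 (1/2-normalized square part)
    return 0.5 * r * r if abs(r) <= 1 else abs(r) - 0.5

def aggregates(G, beta):
    """Data-dependent sums of the reformulated objective, computed over G."""
    n = len(beta)
    A = {"yy": 0.0, "yx": [0.0] * n, "xx": [[0.0] * n for _ in range(n)],
         "hi": [0, 0.0, [0.0] * n], "lo": [0, 0.0, [0.0] * n]}   # [count, sum y, sums x_i]
    for x, y in G:
        r = y - sum(b * xi for b, xi in zip(beta, x))
        if abs(r) <= 1:                # condition c_1: two additive inequalities
            A["yy"] += y * y
            for i in range(n):
                A["yx"][i] += y * x[i]
                for j in range(n):
                    A["xx"][i][j] += x[i] * x[j]
        else:                          # one additive inequality: r > 1 or r < -1
            part = A["hi"] if r > 1 else A["lo"]
            part[0] += 1
            part[1] += y
            for i in range(n):
                part[2][i] += x[i]
    return A

def objective(A, beta):
    """Huber objective rebuilt from the aggregates only."""
    n = len(beta)
    sq = (A["yy"] - 2 * sum(beta[i] * A["yx"][i] for i in range(n))
          + sum(beta[i] * beta[j] * A["xx"][i][j] for i in range(n) for j in range(n)))
    (c_hi, y_hi, x_hi), (c_lo, y_lo, x_lo) = A["hi"], A["lo"]
    bx_hi = sum(b * v for b, v in zip(beta, x_hi))
    bx_lo = sum(b * v for b, v in zip(beta, x_lo))
    # 1/2 sum_{|r|<=1} r^2 + sum_{r>1} (r - 1/2) + sum_{r<-1} (-r - 1/2)
    return 0.5 * sq + (y_hi - bx_hi - 0.5 * c_hi) + (bx_lo - y_lo - 0.5 * c_lo)
```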

4.3.2 Reformulating the gradient $\frac{\partial J(\bm{\beta})}{\partial\beta_{j}}$ with Huber loss into FAQ-AI expressions

We rewrite the first of the three summations in $\frac{\partial J(\bm{\beta})}{\partial\beta_{j}}$ from (98) as follows:

[TABLE]

The four terms can be expressed as $O(n)$ FAQ-AI expressions of the form (3). For instance, the first part of the expression is equivalent to the following FAQ-AI query:

[TABLE]

The other two summations in $\frac{\partial J(\bm{\beta})}{\partial\beta_{j}}$ both aggregate over $x_{j}$ and have one inequality that defines a ligament in $\mathcal{E}_{\ell}$ . They can be expressed as FAQ-AI expressions. Overall, the gradient $\mbox{\boldmath$ \nabla $}J(\bm{\beta})$ can be expressed as $O(n^{2})$ FAQ-AI expressions.
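A similar check works for the gradient. Assuming the $\frac{1}{2}$-normalized Huber loss with threshold 1 and no regularizer, $\mathcal{L}'(r)$ equals $r$ for $|r|\leq 1$ and $\mathrm{sign}(r)$ otherwise, so $\frac{\partial J}{\partial\beta_{j}}=-\sum_{|r|\leq 1}yx_{j}+\sum_{i}\beta_{i}\sum_{|r|\leq 1}x_{i}x_{j}-\sum_{r>1}x_{j}+\sum_{r<-1}x_{j}$: the first part needs $O(n^{2})$ sums under two inequalities, the last two need $O(n)$ sums under one inequality each. The sketch (hypothetical helper names, materialized $\bm{G}$ for illustration) compares this form with a central finite difference.

```python
def huber(r):
    # Assumed simplified Huber loss with threshold 1
    return 0.5 * r * r if abs(r) <= 1 else abs(r) - 0.5

def gradient_from_aggregates(G, beta):
    """Huber gradient assembled from sums restricted by the conditions on r."""
    n = len(beta)
    yx, xx = [0.0] * n, [[0.0] * n for _ in range(n)]
    x_hi, x_lo = [0.0] * n, [0.0] * n
    for x, y in G:
        r = y - sum(b * xi for b, xi in zip(beta, x))
        if abs(r) <= 1:                          # -1 <= y - beta^T x <= 1
            for j in range(n):
                yx[j] += y * x[j]
                for i in range(n):
                    xx[i][j] += x[i] * x[j]
        elif r > 1:                              # y - beta^T x > 1
            for j in range(n):
                x_hi[j] += x[j]
        else:                                    # beta^T x - y > 1
            for j in range(n):
                x_lo[j] += x[j]
    return [-yx[j] + sum(beta[i] * xx[i][j] for i in range(n)) - x_hi[j] + x_lo[j]
            for j in range(n)]
```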

Definition 4.1 ( $Q_{\ell}$ : The ligament extension of $Q$ ).

Given an FAQ query $Q$ with hypergraph $\mathcal{H}=(\mathcal{V},\mathcal{E})$ , define the ligament extension of $Q$ , denoted by $Q_{\ell}$ , to be an FAQ-AI query with hypergraph $\mathcal{H}_{\ell}=(\mathcal{V},\mathcal{E}_{s}\cup\mathcal{E}_{\ell})$ whose set of skeleton edges $\mathcal{E}_{s}$ is identical to $\mathcal{E}$ and whose set of ligament edges $\mathcal{E}_{\ell}$ contains a single ligament edge $\mathcal{V}$ , i.e. $\mathcal{E}_{s}=\mathcal{E}$ and $\mathcal{E}_{\ell}=\{\mathcal{V}\}$ .

Theorem 4.2.

Let $\bm{I}$ be an input database, $N$ the size of the largest relation in $\bm{I}$, and $Q$ a feature extraction query. For any robust linear regression model $\bm{\beta}^{\top}\bm{x}$, the objective $J(\bm{\beta})$ and gradient $\mbox{\boldmath$ \nabla $}J(\bm{\beta})$ with Huber loss can be computed in time $\tilde{O}(N^{\text{\sf\#smfw}_{\ell}(Q_{\ell})})$ with #PANDA and in time $O(N^{\text{\sf faqw}_{\ell}(Q_{\ell})}\log N)$ with InsideOut, where $Q_{\ell}$ is the ligament extension of $Q$ (Def. 4.1).

Proof.

Let $n$ be the number of variables in $Q$ . We show in Sections 4.3.1 and 4.3.2 that we can rewrite objective $J(\bm{\beta})$ and the gradient $\mbox{\boldmath$ \nabla $}J(\bm{\beta})$ into $O(n^{2})$ FAQ-AI expressions with at most $|\mathcal{E}_{\ell}|=2$ ligament hyperedges. The overall runtime bound for computing $J(\bm{\beta})$ and $\mbox{\boldmath$ \nabla $}J(\bm{\beta})$ with #PANDA follows from Theorem 3.19, which states that #PANDA can compute each FAQ-AI expression in time $\tilde{O}(N^{\text{\sf\#smfw}_{\ell}(Q_{\ell})})$ .

The overall runtime bound for computing $J(\bm{\beta})$ and $\mbox{\boldmath$ \nabla $}J(\bm{\beta})$ with InsideOut follows from Theorem 4, which states that InsideOut can compute each FAQ-AI expression in time $O(N^{\text{\sf faqw}_{\ell}(Q_{\ell})}\log N)$ . ∎

4.4 Further non-polynomial loss functions

In this section, we overview the following non-polynomial loss functions: (1) epsilon insensitive loss; (2) ordinal hinge loss; and (3) scalene loss. For each function, we define the loss function $\mathcal{L}$ , the corresponding objective function $J(\bm{\beta})$ , and the partial (sub)derivative $\frac{\partial J(\bm{\beta})}{\partial\beta_{j}}$ which is used in (sub)gradient-based optimization algorithms. (Recall notation from Section 4.1.) In the derivations for the objective $J(\bm{\beta})$ , we will focus on the loss function and ignore the regularizer for better readability.

As in the previous section, the objective and (sub)derivative can be reformulated into several FAQ-AI expressions of the form (3). Instead of writing out the expressions explicitly, we annotate those terms that can be reformulated. The actual reformulation should be clear from the examples in the previous sections.

Epsilon insensitive loss

The epsilon insensitive loss function [33] is defined as:

[TABLE]

This loss function is used to learn SVM regression models. We consider a linear regression model $f_{\bm{\beta}}(\bm{x})=\bm{\beta}^{\top}\bm{x}=\sum_{i\in[n]}\beta_{i}x_{i}$ . The objective function and the corresponding partial subderivative with respect to $\beta_{j}$ are given by:

[TABLE]

The objective can thus be reformulated into $O(1)$ FAQ-AI queries, while the gradient can be reformulated into $O(n)$ queries: one for each $\beta_{j}$ for $j\in[n]$ .
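A minimal sketch, assuming the standard form $\mathcal{L}(a,b)=\max(0,|a-b|-\epsilon)$ and no regularizer: with $r=y-\bm{\beta}^{\top}\bm{x}$, a subgradient with respect to $\beta_{j}$ is $-\sum_{r>\epsilon}x_{j}+\sum_{r<-\epsilon}x_{j}$, where each sum is restricted by a single additive inequality. The helper names are hypothetical, and the check uses a point where no residual lies at $\pm\epsilon$, so the loss is differentiable there.

```python
def eps_insensitive(r, eps):
    return max(0.0, abs(r) - eps)

def eps_subgradient(G, beta, eps):
    """-sum_{r > eps} x_j + sum_{r < -eps} x_j, with r = y - beta^T x."""
    n = len(beta)
    g = [0.0] * n
    for x, y in G:
        r = y - sum(b * xi for b, xi in zip(beta, x))
        if r > eps:            # additive inequality  y - beta^T x > eps
            for j in range(n):
                g[j] -= x[j]
        elif r < -eps:         # additive inequality  beta^T x - y > eps
            for j in range(n):
                g[j] += x[j]
    return g
```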

Ordinal hinge loss

The ordinal hinge loss [42] is defined as:

[TABLE]

The loss function is used to learn ordinal regression models or ordinal PCA [42]. A linear ordinal regression model is the linear function $f_{\bm{\beta}}(\bm{x})=\bm{\beta}^{\top}\bm{x}$ which predicts an ordinal label $y\in[d]$ . The objective function and the partial subderivative with respect to $\beta_{j}$ are given by:

[TABLE]

The objective and partial subderivative can thus be reformulated as $O(d\cdot n)$ FAQ-AI expressions.

Scalene loss

The scalene loss function [42] is defined as:

[TABLE]

where $\alpha\in(0,1)$ is a constant.

The loss function is used to learn quantile regression models. We again consider a linear regression model $f_{\bm{\beta}}(\bm{x})=\bm{\beta}^{\top}\bm{x}$ . The objective function and the partial subderivative with respect to $\beta_{j}$ are given by:

[TABLE]

The objective and partial subderivative can thus be reformulated as $O(n)$ FAQ-AI expressions.
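For the scalene loss, a minimal sketch assuming the pinball form $\alpha\max(r,0)+(1-\alpha)\max(-r,0)$ with $r=y-\bm{\beta}^{\top}\bm{x}$ (our assumption about the displayed definition): a subgradient is $-\alpha\sum_{r>0}x_{j}+(1-\alpha)\sum_{r<0}x_{j}$. With only an intercept, the subgradient changes sign around the empirical $\alpha$-quantile of $y$, which is why this loss yields quantile regression. The function names are hypothetical.

```python
def scalene(r, alpha):
    return alpha * max(r, 0.0) + (1 - alpha) * max(-r, 0.0)

def scalene_subgradient(G, beta, alpha):
    """-alpha * sum_{r>0} x_j + (1 - alpha) * sum_{r<0} x_j, with r = y - beta^T x."""
    n = len(beta)
    g = [0.0] * n
    for x, y in G:
        r = y - sum(b * xi for b, xi in zip(beta, x))
        if r > 0:          # additive inequality  y - beta^T x > 0
            for j in range(n):
                g[j] -= alpha * x[j]
        elif r < 0:        # additive inequality  beta^T x - y > 0
            for j in range(n):
                g[j] += (1 - alpha) * x[j]
    return g

# Intercept-only model on y = 0..99 with alpha = 0.9: the 0.9-quantile is near 90.
G = [((1.0,), float(y)) for y in range(100)]
print(scalene_subgradient(G, [50.5], 0.9), scalene_subgradient(G, [95.5], 0.9))
```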

Overall, we can reformulate the (sub)gradients under each one of the loss functions discussed in this section as FAQ-AI queries that are ligament extensions of the feature extraction query $Q$ as per Def. 4.1.

4.5 Linear support vector machines

A linear SVM classification model is used for binary classification problems where the label $y\in\{\pm 1\}$ . For the features $\bm{x}=(x_{1}=1,x_{2},\ldots,x_{n})$ , the model learns the parameters $\bm{\beta}=(\beta_{1},\ldots,\beta_{n})$ of a linear discriminant function $f_{\bm{\beta}}(\bm{x})=\bm{\beta}^{\top}\bm{x}$ such that $f_{\bm{\beta}}(\bm{x})$ separates the data points in $\bm{G}$ into positive and negative classes with a maximum margin. The parameters can be learned by minimizing the objective function (95) with the hinge loss function:

[TABLE]

The hinge loss is not differentiable at points where $y\cdot f_{\bm{\beta}}(\bm{x})=1$, so standard gradient descent does not directly apply. We next discuss two alternative approaches for solving this optimization problem.

The first approach is based on the observation that the loss function is convex, so the objective admits subgradients, which generalize the standard notion of gradient. The optimization problem can then be solved with subgradient-based updates. Pegasos is a well-known algorithm that follows this approach [40].

The alternative approach is to solve the primal formulation of the problem, which avoids the non-differentiable objective by turning it into a constrained optimization problem with slack variables. Joachims proposed a cutting-plane algorithm that solves this optimization problem efficiently [25].

For both approaches, the number of iterations of the optimization algorithm is independent of the size $|\bm{G}|$ of the training dataset $\bm{G}$ [40, 25]. Since each iteration takes $O(|\bm{G}|)$ time and the number of iterations is $O(1)$ with respect to $|\bm{G}|$, the overall time complexity is $O(|\bm{G}|)$.

Although the two approaches solve the same problem, each has been influential in its own right. We therefore consider both approaches and show that, by reformulating their computation as FAQ-AI, we can solve them asymptotically faster than it takes to materialize the training dataset $\bm{G}$, i.e., in time sublinear in $|\bm{G}|$.

4.5.1 Background on Subgradient Descent

If the objective function $J(\bm{\beta})$ is convex but not differentiable, the gradient $\nabla J(\bm{\beta})$ is not defined everywhere. Such objective functions do, however, admit subgradients, which can be used in subgradient-based optimization algorithms. Algorithm 1 naturally captures the batch subgradient-descent algorithm if the parameters are updated using a subgradient instead of the gradient.

A popular application for subgradient-descent optimization algorithms is the learning of linear SVM models. One such algorithm is the Pegasos algorithm [40], which showed that subgradient methods can learn the parameters of the model significantly faster than other approaches, including Joachims’ cutting plane algorithm [25].
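The batch subgradient-descent loop that Algorithm 1 captures can be sketched in Python for the $\ell_2$-regularized hinge-loss objective. This is a minimal sketch under stated assumptions: the objective is taken to be $\frac{1}{|\bm{G}|}\sum_{(\bm{x},y)\in\bm{G}}\max(0,1-y\,\bm{\beta}^{\top}\bm{x})+\frac{\lambda}{2}\|\bm{\beta}\|_2^2$, the step size $\eta_t=1/(\lambda t)$ is borrowed from Pegasos [40] (with the full batch and without Pegasos' optional projection step), and the names `lam` and `iters` are hypothetical. Because a subgradient step need not decrease the objective, the sketch returns the best iterate seen.

```python
import numpy as np

def objective(beta, X, y, lam):
    """Mean hinge loss plus (lam/2)*||beta||^2 (assumed normalization)."""
    return np.maximum(0.0, 1.0 - y * (X @ beta)).mean() + 0.5 * lam * beta @ beta

def subgradient(beta, X, y, lam):
    """A subgradient of `objective`: rows with y * beta^T x <= 1 contribute -y * x."""
    active = y * (X @ beta) <= 1.0
    g = -(y[active][:, None] * X[active]).sum(axis=0) / len(y)
    return g + lam * beta

def batch_subgradient_descent(X, y, lam=0.1, iters=200):
    beta = np.zeros(X.shape[1])
    best, best_obj = beta.copy(), objective(beta, X, y, lam)
    for t in range(1, iters + 1):
        eta = 1.0 / (lam * t)                      # Pegasos-style step size
        beta = beta - eta * subgradient(beta, X, y, lam)
        obj = objective(beta, X, y, lam)
        if obj < best_obj:                         # keep the best iterate
            best, best_obj = beta.copy(), obj
    return best, best_obj

# Tiny separable example; the first feature is the constant x_1 = 1.
X = np.array([[1.0, 2.0], [1.0, -2.0], [1.0, 3.0], [1.0, -3.0]])
y = np.array([1.0, -1.0, 1.0, -1.0])
beta, obj = batch_subgradient_descent(X, y)
```

Only the two guarded sums inside `subgradient` and `objective` touch the data; these are the quantities that the FAQ-AI reformulation computes without materializing $\bm{G}$.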

4.5.2 Subgradient-based optimization for linear SVM classification

We first use subgradient-based optimization to compute the parameters of the SVM model; see Section 4.5.1 for background. The core of the optimization is the repeated computation of the objective and of the partial derivatives with respect to $(\beta_{j})_{j\in[n]}$. The objective $J(\bm{\beta})$ (with $\ell_{2}$ regularization) and the partial derivative $\frac{\partial J(\bm{\beta})}{\partial\beta_{j}}$ are:

[TABLE]

Both $J(\bm{\beta})$ and $\frac{\partial J(\bm{\beta})}{\partial\beta_{j}}$ can be reformulated as FAQ-AI expressions and computed without materializing $\bm{G}$ . We first rewrite the objective:

[TABLE]

For example, the sum $\sum_{(\bm{x},y)\in\bm{G}}\bm{1}_{y=1}\cdot\bm{1}_{\bm{\beta}^{\top}\bm{x}\leq 1}$ in the above can be expressed as an FAQ-AI query of the form (3) as follows:

[TABLE]

$\frac{\partial J(\bm{\beta})}{\partial\beta_{j}}$ can also be rewritten into two FAQ-AI expressions:

[TABLE]
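As a sanity check of this rewriting, the Python sketch below computes the hinge-loss sum $\sum_{(\bm{x},y)\in\bm{G}}\max(0,1-y\,\bm{\beta}^{\top}\bm{x})$ and one of its subgradients from a handful of guarded aggregates: a count and per-feature sums restricted to $y=1\wedge\bm{\beta}^{\top}\bm{x}\leq 1$, and the same restricted to $y=-1\wedge\bm{\beta}^{\top}\bm{x}\geq -1$. It operates on a materialized $\bm{G}$ (hypothetical arrays `X`, `y`) purely for illustration and leaves out the regularization term, which does not depend on the data.

```python
import numpy as np

def hinge_direct(beta, X, y):
    """Reference: sum of max(0, 1 - y * beta^T x) over all rows."""
    return np.maximum(0.0, 1.0 - y * (X @ beta)).sum()

def hinge_via_aggregates(beta, X, y):
    s = X @ beta
    pos = (y == 1) & (s <= 1.0)    # 1_{y=1} * 1_{beta^T x <= 1}
    neg = (y == -1) & (s >= -1.0)  # 1_{y=-1} * 1_{beta^T x >= -1}
    # One count and n feature sums per guard: O(n) aggregates in total.
    cnt_pos, sum_pos = pos.sum(), X[pos].sum(axis=0)
    cnt_neg, sum_neg = neg.sum(), X[neg].sum(axis=0)
    objective = (cnt_pos - beta @ sum_pos) + (cnt_neg + beta @ sum_neg)
    subgradient = sum_neg - sum_pos   # -y * x summed over the active rows
    return objective, subgradient

rng = np.random.default_rng(1)
X = rng.normal(size=(40, 4))
y = rng.choice([-1.0, 1.0], size=40)
beta = rng.normal(size=4)
obj, sub = hinge_via_aggregates(beta, X, y)
```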

Theorem 4.3.

Let $\bm{I}$ be an input database where $N$ is the size of the largest relation in $\bm{I}$, and let $Q$ be a feature extraction query. For any linear SVM classification model $\bm{\beta}^{\top}\bm{x}$, the objective $J(\bm{\beta})$ and gradient $\nabla J(\bm{\beta})$ with hinge loss can be computed in time $\tilde{O}(N^{\text{\sf\#smfw}_{\ell}(Q_{\ell})})$ with #PANDA and in time $O(N^{\text{\sf faqw}_{\ell}(Q_{\ell})}\log N)$ with InsideOut, where $Q_{\ell}$ is the ligament extension of $Q$ (Def. 4.1).

Proof.

Let $n$ be the number of variables in $Q$. We showed above that $J(\bm{\beta})$ and $\nabla J(\bm{\beta})$ can be rewritten into $O(n)$ FAQ-AI expressions, each with a single ligament hyperedge (i.e., $|\mathcal{E}_{\ell}|=1$). The overall runtime bound for computing $J(\bm{\beta})$ and $\nabla J(\bm{\beta})$ with #PANDA follows from Theorem 3.19, which states that #PANDA can compute each FAQ-AI query in time $\tilde{O}(N^{\text{\sf\#smfw}_{\ell}(Q_{\ell})})$. The runtime for computing $J(\bm{\beta})$ and $\nabla J(\bm{\beta})$ with InsideOut follows from Theorem 4: it is $O(N^{\text{\sf faqw}_{\ell}(Q_{\ell})}\cdot\log N)$ for an FAQ-AI query $Q$. ∎

4.5.3 Cutting-plane algorithm for linear SVM classification in primal space

An alternative to learning a linear SVM via subgradient-based optimization is to pose the problem as a constrained optimization problem. The formulation equivalent to minimizing the objective (101) is the primal formulation of linear SVM [33]:

[TABLE]

where $\xi_{\bm{x},y}$ are slack variables and $C$ is the regularization parameter.

The optimization problem solves for the hyperplane $f_{\bm{\beta}}(\bm{x})$ that classifies the data points $(\bm{x},y)\in\bm{G}$ into two classes, so that the margin between the hyperplane and the nearest data point for each class is maximized. For each $(\bm{x},y)\in\bm{G}$ , the slack variable $\xi_{\bm{x},y}$ encodes how much the point violates the margin of the hyperplane.

Joachims’ cutting-plane algorithm solves (105) in time linear in the size of the training dataset [25]. The algorithm solves the following structural SVM classification formulation, which is equivalent to (105):

[TABLE]

This formulation has $2^{|\bm{G}|}$ constraints, one for each possible subset $\bm{T}\subseteq\bm{G}$ , and a single slack variable $\xi$ that is shared by all constraints.

Algorithm 2 presents Joachims’ cutting-plane algorithm for solving (106). It iteratively constructs a working set $\mathcal{W}$ of constraints, which is a subset of all constraints in (106). In each round $t$, it first computes the optimal values $\bm{\beta}^{(t)}$ and $\xi^{(t)}$ over the current working set $\mathcal{W}$. Then, it identifies the constraint $\bm{T}^{(t)}$ that is most violated by the current $\bm{\beta}^{(t)}$ and adds this constraint to $\mathcal{W}$. It continues until $\bm{T}^{(t)}$ is violated by at most $\epsilon$. Joachims showed that Algorithm 2 finds an $\epsilon$-approximate solution to (106) after a number of iterations that is $O(1)$ with respect to $|\bm{G}|$ [25]. Hence $|\mathcal{W}|$, and thus the number of constraints of the inner optimization problem, is bounded by a number independent of $|\bm{G}|$.

Next, we consider the inner optimization problem at line 5. Although $|\mathcal{W}|$ is small, the number $n$ of variables can still be large, which makes it expensive to solve this problem directly as a quadratic program: this can take up to $O(n^{3})$ time [33]. Its Wolfe dual, on the other hand, is a quadratic program with a single constraint and only $|\mathcal{W}|$ variables, a constant number that is independent of $n$. Let $\bm{x}_{\bm{T}}=\sum_{(\bm{x},y)\in\bm{T}}y\bm{x}$. We next present the derived Wolfe dual.

Wolfe dual for the optimization problem at line 5 of Algorithm 2

We consider the inner optimization problem at line 5 of Algorithm 2 and show how to derive the Wolfe dual (109) from the structural SVM classification formulation (106). Let $\bm{x}_{\bm{T}}=\sum_{(\bm{x},y)\in\bm{T}}y\bm{x}$. The inner optimization problem at line 5 of Algorithm 2 is of the form:

[TABLE]

The Lagrangian function of this optimization problem is:

[TABLE]

where $\bm{\alpha}=(\alpha_{\bm{T}})_{\bm{T}\in\mathcal{W}}$ and $\gamma$ are Lagrange multipliers.

Since the Lagrangian is convex and continuously differentiable, we can define the Wolfe dual as the following optimization problem:

[TABLE]

The optimality condition for $\bm{\beta}$ is $\bm{\beta}=\sum_{\bm{T}\in\mathcal{W}}\alpha_{\bm{T}}\bm{x}_{\bm{T}}$. We use this equality to rewrite the above dual formulation and obtain the following optimization problem:

[TABLE]

where $\bm{\alpha}=(\alpha_{\bm{T}})_{\bm{T}\in\mathcal{W}}$ is the vector of dual variables, one per constraint in $\mathcal{W}$.

Theorem 4.4.

Let $\bm{I}$ be an input database where $N$ is the size of the largest relation in $\bm{I}$, and let $Q$ be a feature extraction query. A linear SVM classification model can be learned over the training dataset $Q(\bm{I})$ with Joachims’ cutting-plane algorithm in time $\tilde{O}(N^{\text{\sf\#smfw}_{\ell}(Q_{\ell})})$ with #PANDA and in time $O(N^{\text{\sf faqw}_{\ell}(Q_{\ell})}\log N)$ with InsideOut, where $Q_{\ell}$ is the ligament extension of $Q$ (Def. 4.1).

Proof.

Recall that in each iteration $t$ of Algorithm 2, we add one set $\bm{T}^{(t)}$ to $\mathcal{W}$, and $\bm{T}^{(t)}$ is associated with a coefficient vector $\bm{\beta}^{(t)}$. Our main observation is that we do not have to materialize the set $\bm{T}^{(t)}$, since it is completely determined by the data and the coefficient vector $\bm{\beta}^{(t)}$. Thus, instead of storing $\bm{T}^{(t)}$, we can simply store $\bm{\beta}^{(t)}$ and reformulate the data-dependent term $\bm{x}_{T^{(t)}}$ in (109) as a computation over $\bm{G}$:

[TABLE]

The vector $\bm{x}_{T^{(t)}}$ has size $n$. For each $j\in[n]$, we can compute the $j$’th component of $\bm{x}_{T^{(t)}}$ as the sum of the following two FAQ-AI expressions, which are of the form (3):

[TABLE]

$Q_{1}$ and $Q_{2}$ each have a single ligament hyperedge (i.e., $|\mathcal{E}_{\ell}|=1$). Theorem 3.19 states that #PANDA computes $Q_{i}$ for $i\in[2]$ in time $\tilde{O}(N^{\text{\sf\#smfw}_{\ell}(Q_{i})})$. Consequently, the optimization problem at line 5 of Algorithm 2 can be computed in time $\tilde{O}(N^{\text{\sf\#smfw}_{\ell}(Q_{i})})$, which determines the runtime of Algorithm 2.

Using InsideOut, the runtime of Algorithm 2 follows from Theorem 4: This is $O(N^{\text{\sf faqw}_{\ell}(Q_{i})}\log N)$ for $Q_{i}$ . ∎
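The observation that $\bm{T}^{(t)}$ need not be materialized can be illustrated with the Python sketch below. It assumes, following Joachims' formulation [25], that the most violated constraint for $\bm{\beta}$ selects exactly the points with $y\,\bm{\beta}^{\top}\bm{x}<1$; the function and variable names are hypothetical. Storing only the vectors $\bm{\beta}^{(t)}$, it recomputes each $\bm{x}_{\bm{T}^{(t)}}$ as a difference of two guarded sums, analogous to $Q_1$ and $Q_2$, and then the inner products $\bm{x}_{\bm{T}}^{\top}\bm{x}_{\bm{T}'}$, which is how the data enters the quadratic term once $\bm{\beta}=\sum_{\bm{T}}\alpha_{\bm{T}}\bm{x}_{\bm{T}}$ is substituted (assuming a $\frac{1}{2}\|\bm{\beta}\|_2^2$ regularizer).

```python
import numpy as np

def x_T(beta, X, y):
    """x_T = sum of y * x over T = {(x, y) : y * beta^T x < 1}, without building T."""
    s = X @ beta
    q1 = X[(y == 1) & (s < 1.0)].sum(axis=0)    # guarded sum for the y = +1 rows
    q2 = X[(y == -1) & (s > -1.0)].sum(axis=0)  # guarded sum for the y = -1 rows
    return q1 - q2

def dual_gram(betas, X, y):
    """Inner products x_T^T x_T' for a working set stored as coefficient vectors."""
    xs = np.array([x_T(b, X, y) for b in betas])
    return xs @ xs.T

rng = np.random.default_rng(2)
X = rng.normal(size=(30, 3))
y = rng.choice([-1.0, 1.0], size=30)
working_set = [rng.normal(size=3) for _ in range(4)]   # hypothetical beta^(1), ..., beta^(4)
K = dual_gram(working_set, X, y)
```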

4.6 $k$ -means clustering

We next consider $k$ -means clustering, which is a popular unsupervised machine learning algorithm.

An unsupervised machine learning model is computed over a dataset $\bm{G}\subseteq{\mathbb{R}}^{n}$, for which each tuple $\bm{x}\in\bm{G}$ is a vector of features without a label. A clustering task divides $\bm{G}$ into $k$ clusters of “similar” data points with respect to the $\ell_{2}$ norm: $\bm{G}=\cup_{i=1}^{k}\bm{G}_{i}$, where $k$ is a given fixed positive integer. Each cluster $\bm{G}_{i}$ is represented by a cluster mean $\bm{\mu}_{i}\in\mathbb{R}^{n}$. One of the most ubiquitous clustering methods, Lloyd’s $k$-means clustering algorithm (also known as the $k$-means method), involves the optimization problem (1) with respect to the partition $(\bm{G}_{i})_{i\in[k]}$ and the $k$ means $(\bm{\mu}_{i})_{i\in[k]}$. Other norms or distance measures can be used; e.g., if we replace the $\ell_{2}$-norm with the $\ell_{1}$-norm, we obtain the $k$-median problem. The subsequent development considers the $\ell_{2}$-norm.

Lloyd’s algorithm can be viewed as a special instantiation of the Expectation-Maximization (EM) algorithm. It alternates between two update steps until convergence. First, it updates the cluster assignments $(\bm{G}_{i})_{i\in[k]}$:

[TABLE]

and then it updates the corresponding $k$ means $(\bm{\mu}_{i})_{i\in[k]}$:

[TABLE]

Our observation is that we can reformulate both update steps (110) and (111) as FAQ-AI expressions without explicitly computing the partition $(\bm{G}_{i})_{i\in[k]}$. For a given set of $k$ means $(\bm{\mu}_{j})_{j\in[k]}$, let $c_{ij}(\bm{x})$ be the following function:

[TABLE]

where $\mu_{j,l}$ is the $l$’th component of the mean vector $\bm{\mu}_{j}$. A data point $\bm{x}\in\bm{G}$ is closest to the center $\bm{\mu}_{i}$ if and only if $c_{ij}(\bm{x})\leq 0$ holds for all $j\in[k]$. We use these inequalities to reformulate the mean vector $\bm{\mu}_{i}$ as $O(n)$ FAQ-AI expressions. First, we express $|\bm{G}_{i}|$ as:

[TABLE]

Then, for each $l\in[n]$ , the sum $\sum_{\bm{x}\in\bm{G}_{i}}x_{l}$ can be reformulated in FAQ-AI as follows (similarly to (4)):

[TABLE]

Each component $\mu_{i,l}$, for $l\in[n]$, is the quotient $Q_{il}/Q_{i}$.

Overall, the mean vector $\bm{\mu}_{i}$ can be computed with $O(n)$ FAQ-AI expressions of the form (3).
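One Lloyd iteration computed this way can be sketched in Python. The sketch assumes that $c_{ij}(\bm{x})$ is the difference of squared distances $\|\bm{x}-\bm{\mu}_i\|_2^2-\|\bm{x}-\bm{\mu}_j\|_2^2$. Its quadratic terms cancel, leaving $\sum_{l\in[n]}2x_l(\mu_{j,l}-\mu_{i,l})+\|\bm{\mu}_i\|_2^2-\|\bm{\mu}_j\|_2^2$, an additive inequality in the features. The data are materialized here only for illustration: the guarded count plays the role of $Q_i$ and the guarded per-feature sums play the role of the $Q_{il}$. Keeping the previous mean for an empty cluster is our own choice, not from the text. With the non-strict inequalities, a point that is equidistant from two means is counted in both clusters.

```python
import numpy as np

def c(x, mu_i, mu_j):
    """Linear form equal to ||x - mu_i||^2 - ||x - mu_j||^2 (assumed c_ij)."""
    return 2.0 * x @ (mu_j - mu_i) + mu_i @ mu_i - mu_j @ mu_j

def lloyd_step(G, mu):
    """One Lloyd update from guarded counts and sums, without building a partition."""
    k = len(mu)
    new_mu = mu.copy()
    for i in range(k):
        closest = np.ones(len(G), dtype=bool)
        for j in range(k):                       # conjunction of k additive inequalities
            closest &= c(G, mu[i], mu[j]) <= 0.0
        q_i = closest.sum()                      # |G_i|, the role of Q_i
        if q_i > 0:
            new_mu[i] = G[closest].sum(axis=0) / q_i   # Q_il / Q_i for every l
    return new_mu

G = np.array([[0.0, 0.0], [0.0, 1.0], [10.0, 10.0], [10.0, 11.0]])
mu = np.array([[0.0, 0.0], [10.0, 10.0]])
new_mu = lloyd_step(G, mu)
```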

Theorem 4.5.

Let $\bm{I}$ be an input database where $N$ is the size of the largest relation in $\bm{I}$, and let $Q$ be a feature extraction query where $n$ is the number of its variables. Each iteration of Lloyd’s $k$-means algorithm can be computed in time $\tilde{O}(N^{\text{\sf\#smfw}_{\ell}(Q_{\ell})})$ with #PANDA and in time $O(N^{\text{\sf faqw}_{\ell}(Q_{\ell})}\log^{k-1}N)$ with InsideOut, where $Q_{\ell}$ is the ligament extension of $Q$ (Def. 4.1).

Proof.

We have shown above that each mean vector $\bm{\mu}_{j}$, $j\in[k]$, can be computed with $O(n)$ FAQ-AI expressions of the form (3), where each query has $|\mathcal{E}_{\ell}|=k$ ligament hyperedges. For #PANDA, the overall runtime to update all $k$ means follows from Theorem 3.19, which states that #PANDA can compute each FAQ-AI expression of the form (3) in time $\tilde{O}(N^{\text{\sf\#smfw}_{\ell}(Q_{\ell})})$. Using InsideOut, the runtime follows from Theorem 4: any FAQ-AI query $Q$ of the form (3) can be computed in time $O(N^{\text{\sf faqw}_{\ell}(Q_{\ell})}\log^{k-1}N)$. ∎

5 Conclusion

We presented a theoretical and algorithmic framework for solving a special class of functional aggregate queries that arises naturally within many in-database machine learning problems and captures a variety of database queries, including inequality joins. In this query class, called FAQ-AI, some of the input factors are additive inequalities over input variables. We showed that FAQ-AI queries can be solved more efficiently than general FAQ queries by relaxing the notion of tree decompositions, which leads to relaxed versions of commonly used width parameters.

While FAQ queries over the Boolean semiring are solvable within the tighter bound of submodular width [32, 9], such a bound is not known to be achievable over arbitrary semirings, including count queries. Therefore, we first introduced a counting analog of the submodular width, denoted #subw, by relaxing the notion of polymatroids, and showed how to meet this bound for FAQ queries over any semiring. We then turned our attention back to the special case of FAQ-AI and showed how to strengthen the bound further in this case.

We showed how to use our framework to solve several common machine learning problems over relational data asymptotically faster than both out-of-database and previously known in-database machine learning solutions. These problems include $k$ -means clustering, support vector machines, and regression over a variety of non-polynomial loss functions.

One interesting open problem is to prove a hardness result for count queries with unbounded #subw. On the one hand, this would show the tightness of our positive result for solving FAQ queries over arbitrary semirings within the #subw bound. On the other hand, this would mirror the previously known dichotomy result for query classes over the Boolean semiring based on the submodular width [32].

Another remaining problem is to measure the gap between the submodular width and its counting version #subw. More precisely, is there a class of queries where the submodular width is unboundedly smaller than #subw?

Marx [32] showed a class of queries where the submodular width is bounded while the fractional hypertree width is unbounded. Proposition 3.16 showed a class of queries where the gap between #subw and the fractional hypertree width is unbounded (but #subw is also unbounded). It remains open to show whether there exists a query class where #subw is bounded and the fractional hypertree width is unbounded.

While the FAQ-AI framework can be used to optimize machine learning problems over several non-polynomial loss functions, including those presented in Sections 4.3 and 4.4, other classes of loss functions are not representable as FAQ-AI queries and do not yet benefit from this framework. These classes include, for example, the logistic and exponential losses commonly used for classification problems. It would be interesting to see whether such loss functions could eventually be optimized in the same way in the in-database machine learning setting.

Acknowledgments

This project has received funding from the European Union’s Horizon 2020 research and innovation programme under grant agreement No 682588. LN gratefully acknowledges support from NSF grants CAREER DMS-1351362 and CNS-1409303, Adobe Research and Toyota Research, and a Margaret and Herman Sokol Faculty Award. BM was supported in part by a Google Research Award, and NSF grants CCF-1830711, CCF-1824303, and CCF-1733873.

Appendix A Recovering Two Existing Results

In this section, we review two prior results concerned with the evaluation of queries with inequalities: the evaluation of Core XPath queries over XML documents via relational encoding in the pre/post plane, and exact inference for IQ queries with inequality joins over probabilistic databases. Our main observation is that their linearithmic complexity is due to the same structural property behind relaxed tree decompositions: such queries trivially admit a relaxed tree decomposition, where each bag corresponds to one relation in the query and the ligament edges, i.e., the inequality joins, are covered by neighboring bags.

A.1 Core XPath Queries

We consider the problem of evaluating Core XPath queries over XML documents. An XML document is represented as a rooted tree whose nodes follow the document order. Core XPath queries define traversals of such trees using two constructs: (1) a context node that is the starting point of the traversal; and (2) a tree of location steps with one distinguished branch that selects nodes, while all other branches condition this selection. Given a context node $v$, a location step selects the set of nodes in the tree that are accessible from $v$ via the step’s axis. This set of nodes provides the context nodes for the next step, which is evaluated for each such node in turn. The result of the location step is the set of nodes accessible from any of its input context nodes, sorted in document order.

The preorder rank $pre(v)$ of a node $v$ is the index of $v$ in the list of all nodes in the tree that are visited in the (depth-first, left-to-right) preorder traversal of the tree; this order is the document order. Similarly, the postorder rank $post(v)$ of $v$ is its index in the list of all nodes in the tree that are visited in the (depth-first, left-to-right) postorder tree traversal. We can use the pre/post-order ranks of nodes to define the main axes descendant, ancestor, following, and preceding [20]. Given two nodes $v$ and $v^{\prime}$ in the tree, the four axes are defined using the pre/post two-dimensional plane:

•

$v^{\prime}$ is a descendant of $v$ or equivalently $v$ is an ancestor of $v^{\prime}$

[TABLE]

•

$v^{\prime}$ follows $v$ or equivalently $v$ precedes $v^{\prime}$

[TABLE]

The remaining axes parent, child, following-sibling, and preceding-sibling are restrictions of the four main axes, where we also use the parent information $par$ for each node:

•

$v^{\prime}$ is a child of $v$ or equivalently $v$ is a parent of $v^{\prime}$

[TABLE]

•

$v^{\prime}$ is a following sibling of $v$ or equivalently $v$ is a preceding sibling of $v^{\prime}$

[TABLE]
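The pre/post encoding and the axis conditions above can be checked on a small tree with the Python sketch below. The class `Node`, the function names, and the example document are hypothetical; ranks are 0-based indices, and the root's parent is represented by `None`. The conditions used are the standard pre/post characterizations: $v'$ is a descendant of $v$ iff $pre(v)<pre(v')$ and $post(v')<post(v)$, and $v'$ follows $v$ iff $pre(v)<pre(v')$ and $post(v)<post(v')$; child and following-sibling additionally compare parent ranks.

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    tag: str
    children: list = field(default_factory=list)

def encode(root):
    """One row {pre, post, par, tag} per node, via a depth-first, left-to-right traversal."""
    rows, counter = [], {"pre": 0, "post": 0}
    def visit(node, parent_pre):
        row = {"pre": counter["pre"], "par": parent_pre, "tag": node.tag}
        counter["pre"] += 1
        rows.append(row)
        for child in node.children:
            visit(child, row["pre"])
        row["post"] = counter["post"]   # assigned once all descendants are finished
        counter["post"] += 1
    visit(root, None)
    return rows

# Axis conditions on the pre/post plane; each asks whether w lies on the axis from v.
def descendant(v, w):
    return v["pre"] < w["pre"] and w["post"] < v["post"]

def following(v, w):
    return v["pre"] < w["pre"] and v["post"] < w["post"]

def child(v, w):
    return descendant(v, w) and w["par"] == v["pre"]

def following_sibling(v, w):
    return following(v, w) and w["par"] == v["par"]

doc = Node("a", [Node("b", [Node("d"), Node("e")]), Node("c", [Node("f")])])
rows = encode(doc)
by_tag = {r["tag"]: r for r in rows}

def select(v_tag, axis):
    """Tags of all nodes reachable from the node tagged v_tag via `axis`."""
    return {w["tag"] for w in rows if axis(by_tag[v_tag], w)}
```

For this document, `select("b", following)` yields `{"c", "f"}`: the nodes after `b`'s subtree in document order.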

We follow the standard approach to reformulate XPath evaluation in the relational domain [20]. We represent the document by a factor $\bm{G}$ in the Boolean semiring with schema $(pre,post,par,tag)$ . For each node in the tree there is one tuple in $\bm{G}$ with $pre$ and $post$ ranks, label $tag$ , and preorder rank $par$ of the parent node. A query with $n$ location steps is mapped to an FAQ-AI expression $Q$ that is a join of $n+1$ copies of $\bm{G}$ where the join conditions are the inequalities encoding the axes of the $n$ steps. The first copy $\bm{G}_{0}$ is for the initial context node(s). The axis of the $i$ -th step is translated into the conjunction of inequalities between pre/post rank variables of the copies $\bm{G}_{i-1}$ and $\bm{G}_{i}$ . The query $Q$ has one free variable: This is the preorder rank variable from the copy of $\bm{G}$ corresponding to the location step that selects the result nodes.

Example A.1.

The Core XPath query

[TABLE]

selects all $b$ -labeled nodes following $a$ -labeled nodes that are descendants of the given context node $v$ and that have at least one $c$ -labeled descendant node. The steps in the above textual representation of the query are separated by /. The brackets [ ] delimit a condition on the selection of the $a$ -labeled nodes. We can reformulate this query in FAQ-AI over the Boolean semiring as follows:

[TABLE]

The hypergraph of a relational encoding of a Core XPath query has one skeleton hyperedge for each copy of the document factor and one ligament edge for each pair of inequalities over two of these copies. Any two skeleton hyperedges have at most one node (i.e., query variable) in common, which expresses the parent/child or sibling relationship between their corresponding steps. This hypergraph admits a trivial relaxed tree decomposition, which mirrors the tree structure of the query. In particular, the decomposition has one bag per copy of the document factor, consisting of the variables of that copy. Each ligament edge represents a pair of inequalities over variables of two neighboring bags. The running intersection property holds since the equalities are by construction only over variables from neighboring bags.

It is known that the time complexity of answering a Core XPath query $Q$ with $n$ location steps over an XML document $\bm{G}$ is $O(n\cdot|\bm{G}|)$ (Theorem 8.5 [18], assuming the document factor is sorted). Using our FAQ-AI reformulation of Core XPath queries and the trivial relaxed tree decomposition, we can show a linearithmic time complexity result.

Proposition A.2.

For any Core XPath query $Q$ with $n$ location steps and XML document $\bm{G}$ , the query answer can be computed in time $O(n\cdot|\bm{G}|\cdot\log|\bm{G}|)$ .

Proof.

Let $\varphi$ be the FAQ-AI reformulation of $Q$ and $F$ the factor representing the XML document $\bm{G}$ . There is a one-to-one correspondence between the trivial relaxed tree decomposition and the XPath query, with one bag per location step. Let $n$ be the number of location steps in $Q$ , or equivalently the number of bags in the tree decomposition. We consider this trivial tree decomposition and choose its root as the bag corresponding to the location step that selects the answer node set. Our evaluation algorithm proceeds in a bottom-up left-to-right traversal of the tree decomposition and eliminates one bag at a time.

We index the bags and their corresponding factors in this traversal order. The first factor to eliminate is then denoted by $F_{1}$ while the last factor, which corresponds to the location step selecting the answer node set, is denoted by $F_{n}$ .

We initially create factors $S_{j}$ that are copies of factors $F_{j}$ corresponding to leaf bags in the tree. Consider now two factors $S_{j}$ and $F_{i}$ corresponding to a leaf bag and its parent bag, respectively. Let $\phi_{i,j}$ be the conjunction of inequalities defining the axis relationship between the location steps corresponding to these bags. We then compute a new factor $S_{i}$ that consists of those tuples in $F_{i}$ that join with some tuple in $S_{j}$. This is expressed in FAQ-AI over the Boolean semiring:

[TABLE]

The conjunction $\phi_{i,j}$ only has two inequalities on variables between the two bags. Computing $S_{i}$ takes time $O(|F|\log|F|)$ following the algorithm from the proof of Theorem 4. We can sort both $F_{i}$ and $S_{j}$ in ascending order on the preorder column and in descending order on the postorder column. For each tuple $t$ in $F_{i}$, the tuples in $S_{j}$ that join with $t$ form a contiguous range in $S_{j}$. To decide whether $t$ is in $S_{i}$, it suffices to check that this range is nonempty. There are $n$ such steps and $|F|=|F_{i}|=|\bm{G}|$, with an overall time complexity of $O(n\cdot|\bm{G}|\log|\bm{G}|)$. ∎
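One way to realize the semijoin that computes $S_i$ in $O(|F|\log|F|)$ time is sketched below in Python. Instead of the contiguous-range argument in the proof, this variant sorts $S_j$ on the preorder column, precomputes suffix minima of the postorder column, and answers each tuple of $F_i$ with one binary search. Flipping the sign of a column maps the descendant test onto the other pre/post quadrants, such as the following axis. Tuples are hypothetical `(pre, post)` pairs.

```python
import bisect
import math

def semijoin(F, S, pre_greater=True, post_less=True):
    """Tuples t in F with some s in S such that
    (s.pre > t.pre if pre_greater else s.pre < t.pre) and
    (s.post < t.post if post_less else s.post > t.post)."""
    ps = 1 if pre_greater else -1
    qs = 1 if post_less else -1
    # After the sign flips, every case reads: s.pre' > t.pre' and s.post' < t.post'.
    pts = sorted((ps * pre, qs * post) for pre, post in S)
    pres = [p for p, _ in pts]
    suffix_min = [math.inf] * (len(pts) + 1)
    for k in range(len(pts) - 1, -1, -1):
        suffix_min[k] = min(pts[k][1], suffix_min[k + 1])
    result = []
    for pre, post in F:
        k = bisect.bisect_right(pres, ps * pre)   # first s with s.pre' > t.pre'
        if suffix_min[k] < qs * post:              # smallest post' among those s
            result.append((pre, post))
    return result

# (pre, post) ranks of the tree a(b(d, e), c(f)).
F = [(0, 5), (1, 2), (2, 0), (3, 1), (4, 4), (5, 3)]
has_d_below = semijoin(F, [(2, 0)])                                   # ancestors of d
follows_b = semijoin(F, [(1, 2)], pre_greater=False, post_less=True)  # nodes following b
```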

A.2 Probabilistic Queries with Inequalities

The problem of query evaluation in probabilistic databases is #P-hard for general queries and probabilistic database formalisms [41]. Extensive prior work has focused on charting the tractability frontier of this problem, with positive results for several classes of queries on so-called tuple-independent probabilistic databases. We discuss here one such class of queries with inequality joins called IQ [36].

A tuple-independent probabilistic database is a database where each tuple $t$ is associated with a Boolean random variable $v(t)$ that is independent of the other tuples in the database. This is the database formalism of choice for studies on query tractability since inference is hard already for trivial queries on more expressive probabilistic database formalisms [41].

FAQ factors naturally capture tuple-independent probabilistic databases: A tuple-independent probabilistic relation $R$ is a factor that maps each tuple $t$ in $R$ to the probability that the associated random variable $v(t)$ is true.

We next define the class IQ of inequality queries and later show how to recover the linearithmic time complexity for their inference.

Definition A.3 (adapted from Definitions 3.1, 3.2 [36]).

Let $\mathcal{H}=(\mathcal{V}=[n],\mathcal{E}_{s}\cup\mathcal{E}_{\ell})$ be a hypergraph, where $\mathcal{E}_{s}$ and $\mathcal{E}_{\ell}$ are disjoint, $\mathcal{E}_{s}$ consists of pairwise disjoint sets, $\mathcal{E}_{\ell}$ consists of sets $\{i,j\}$ for which there is a vector $c_{i,j}\in\{[1,-1]^{\text{T}},[-1,1]^{\text{T}}\}$, and $\forall F\in\mathcal{E}_{s}:|(\bigcup_{I\in\mathcal{E}_{\ell}}I)\cap F|\leq 1$. An IQ query has the form

[TABLE]

where $(R_{F})_{F\in\mathcal{E}_{s}}$ are distinct factors. ∎

The edges (i.e., binary hyperedges) in $\mathcal{E}_{\ell}$ correspond to inequalities between query variables. These inequalities are restricted so that each hyperedge in $\mathcal{E}_{s}$ contains at most one node (query variable) that occurs in inequalities. Inequalities on variables of the same factor are not in $\mathcal{E}_{\ell}$; they can be computed trivially in a preprocessing step.

The inequalities may only have the form $X_{i}\leq X_{j}$ or $X_{j}\leq X_{i}$ . They induce an inequality graph where $X_{i}$ is a parent of $X_{j}$ if $X_{i}\leq X_{j}$ . This graph can be minimized by removing edges corresponding to redundant inequalities implied by other inequalities [23]. Each graph node thus corresponds to precisely one factor. We categorize the IQ queries based on the structural complexity of their inequality graphs into (forests of) paths, trees, and graphs.
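Removing redundant inequalities amounts to a transitive reduction of the (acyclic) inequality graph. The Python sketch below is a naive version suitable for small graphs, offered as an illustration rather than the algorithm of [23]. The graph representation, a dict mapping each variable $X_i$ to the set of variables $X_j$ with $X_i\leq X_j$, is hypothetical.

```python
def reachable(graph, src):
    """Nodes reachable from src by a path with at least one edge."""
    seen, stack = set(), [src]
    while stack:
        u = stack.pop()
        for w in graph.get(u, ()):
            if w not in seen:
                seen.add(w)
                stack.append(w)
    return seen

def transitive_reduction(graph):
    """Drop edge u -> v when another child w of u already reaches v (graph must be acyclic)."""
    reduced = {}
    for u, kids in graph.items():
        reduced[u] = {v for v in kids
                      if not any(v in reachable(graph, w) for w in kids if w != v)}
    return reduced

# X1 <= X2, X2 <= X3, and the redundant X1 <= X3.
ineq = {"X1": {"X2", "X3"}, "X2": {"X3"}}
minimal = transitive_reduction(ineq)
```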

Example A.4.

Consider the following IQ queries:

[TABLE]

The inequalities form a path in $Q_{1}$ and a tree in $Q_{2}$ .

The probability of a query over a probabilistic database $\bm{I}$ is the probability of its lineage [41]. The lineage is a propositional formula over the random variables associated with the input tuples. It is equivalent to the disjunction of all possible derivations of the query answer from the input tuples.

Example A.5.

Consider the factors $R$ , $S$ , $T$ , where $r_{i}$ , $s_{j}$ , $t_{k}$ denote the variables associated with the tuples in these factors and for a random variable $a$ , $p_{a}$ denotes the probability that $a=\text{true}$ :

[TABLE]

The lineage of $Q_{1}$ and $Q_{2}$ over these factors is:

[TABLE]

Prior work (Theorem 4.7 [36]) showed that the probability of an IQ query $Q$ with an inequality tree with $k$ nodes over a tuple-independent probabilistic database of size $N$ can be computed in time $O(2^{k}\cdot N\log N)$ using a construction of the query lineage in an Ordered Binary Decision Diagram (OBDD). We show next that a variant of the algorithm in the proof of Lemma 3.1, adapted from counting to weighted counting, i.e., probability computation, can compute the probability in time $O(N\log N)$ , thus shaving off an exponential factor in the number of inequalities.

We first explain this result using two examples, which draw on a crucial observation made in prior work [36]. The lineage of IQ queries has a chain structure: for each factor, there is an order on its random variables that defines a chain of logical implications between their cofactors in the lineage, in which the cofactor of the first variable implies the cofactor of the second variable, which implies the cofactor of the third variable, and so on.

Example A.6.

We continue Example A.5. The lineage of $Q_{1}$ and $Q_{2}$ is arranged so that the chain structure becomes apparent. This structure allows for an equivalent rewriting of the lineage [36], as shown next for the lineage $\phi_{r_{1}}$ of $Q_{1}$ (for a random variable $a$ , $\overline{a}$ denotes its negation):

[TABLE]

In disjunctive normal form, the lineage of $Q_{1}$ may have size cubic in the size of the database. The factorization of the lineage in Example A.5 lowers the size to quadratic. The above rewriting further reduces the size to linear. The rewritten form can be read directly from the input factors following the structure of the inequality tree.

Since the above expressions are disjunctions of two mutually exclusive formulas, their probabilities are the sums of the probabilities of their respective two formulas. Their probabilities can be computed in one bottom-up right-to-left pass: first for $\phi_{t_{k}}$ in decreasing order of $k$, then for $\phi_{s_{j}}$ in decreasing order of $j$, and finally for $\phi_{r_{i}}$ in decreasing order of $i$. We extend the probability function $p$ from input random variables to formulas over these variables. The probability of $Q_{1}$’s lineage, which is also the probability of $Q_{1}$, is ($\forall i,j,k\in[3]$):

[TABLE]

Since there are no variables $r_{4}$ , $s_{4}$ , and $t_{4}$ , we use $p(\phi_{r_{4}})=p(\phi_{s_{4}})=p(\phi_{t_{4}})=0$ . This computation corresponds to a decomposition of $\phi_{r_{1}}$ that can be captured by a linear-size OBDD [36].

The probability of the lineage $\psi_{r_{1}}$ of $Q_{2}$ is computed similarly ( $\forall i,j,k\in[3]$ ):

[TABLE]

This computation would correspond to a decomposition of $\psi_{r_{1}}$ that can be captured by an OBDD with several nodes for a random variable from $S$ and $T$ ; in general, such an OBDD would have a size linear in $N$ but with an additional exponential factor in the size of the inequality tree due to the inability to represent succinctly the products of lineage over $T$ and of lineage over $S$ [36]. (OBDDs with AND nodes can capture such products without this exponential factor, though in this article we do not use them.)∎

Proposition A.7.

Given a tuple-independent probabilistic database ${\bm{I}}$ of size $N$ and an IQ query $Q$ with a forest of inequality trees, we can compute the probability of $Q$ over ${\bm{I}}$ in time $O(N\log N)$ .

Proof.

We next present the inference algorithm for a given IQ query $Q$ with an inequality tree. It uses a minor variant of the algorithm from the proof of Lemma 3.1 to compute a functional aggregate query with additive inequalities over two factors.

We first reduce the input database ${\bm{I}}$ to a simplified database of unary and nullary factors that is constructed by aggregating away all query variables that do not contribute to inequalities.

Let us partition $\mathcal{E}_{s}$ into the set $\mathcal{E}_{1}$ of hyperedges that contain a query variable involved in inequalities and the set $\mathcal{E}_{2}$ of all other hyperedges.

We reduce each factor $(R_{F})_{F\in\mathcal{E}_{1}}$ with a query variable $X_{i}$ occurring in inequalities to a unary factor $S_{\{i\}}$ by aggregating away all other query variables. For an $X_{i}$ -value $x_{i}$ , $S_{\{i\}}(x_{i})$ gives the probability of the disjunction of the independent random variables associated with the tuples in $R_{F}$ that have the $X_{i}$ -value $x_{i}$ :

[TABLE]

We also reduce all factors $(R_{F})_{F\in\mathcal{E}_{2}}$ with no query variable occurring in inequalities to one nullary factor $S_{\emptyset}$ by aggregating away all query variables. $S_{\emptyset}()$ gives the probability of the conjunction of all factors without query variables in inequalities:

[TABLE]

This simplification reduces the set $\mathcal{E}_{s}$ of hyperedges to a new set $\mathcal{E}_{u}$ of unary edges, one per query variable in the inequalities, and one nullary edge: $\mathcal{E}_{u}=\{\emptyset\}\cup\bigcup_{\{i,j\}\in\mathcal{E}_{\ell}}\{\{i\},\{j\}\}$ . The simplification does not affect the inference problem: The probability of $Q$ is the same as the probability of the query $Q^{\prime}$ over $\mathcal{E}_{u}\cup\mathcal{E}_{\ell}$ :

[TABLE]

The hypergraph of $Q^{\prime}$ trivially admits the relaxed tree decomposition whose structure is that of the inequality tree of $Q^{\prime}$ (and of $Q$ ): The skeleton edges are $\mathcal{E}_{u}$ and the ligament edges are $\mathcal{E}_{\ell}$ .

The inference algorithm traverses the inequality tree bottom-up and eliminates one level of query variables at a time. For a variable $X_{p}$ with children $X_{c_{1}},\ldots,X_{c_{k}}$ , it computes recursively the factor

[TABLE]

We use $\text{lub}_{i}(x_{p})$ to find the value in $S_{c_{i}}$ that is the least upper bound of $x_{p}$, and $\text{lsub}_{p}(x_{p})$ to find the value in $Q_{p}$ that is the least strict upper bound of $x_{p}$, i.e., the next value in ascending order. The definition of $Q_{p}$ is recursive: it first computes the probability for $x_{p}$ and then for the values preceding it. If $X_{p}$ has no children, i.e., $k=0$, the product over the factors $S_{c_{i}}$ is empty and thus equals one.
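Both lookups are plain searches over sorted values. The sketch below assumes each factor's values are kept in an ascending Python list (sorted once, in $O(N\log N)$ time), so each lookup costs $O(\log N)$ using the standard `bisect` module. The function names mirror $\text{lub}$ and $\text{lsub}$ but are our own, and returning `None` when no upper bound exists is an assumption.

```python
from bisect import bisect_left, bisect_right


def lub(sorted_values, x):
    """Smallest value >= x in an ascending list, or None if there is none."""
    k = bisect_left(sorted_values, x)
    return sorted_values[k] if k < len(sorted_values) else None


def lsub(sorted_values, x):
    """Smallest value > x (the next value in ascending order), or None."""
    k = bisect_right(sorted_values, x)
    return sorted_values[k] if k < len(sorted_values) else None


values = sorted([7, 3, 1, 3])           # sort once: [1, 3, 3, 7]
print(lub(values, 3), lsub(values, 3))  # 3 7
```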

The probability of $Q$ is then the product of $S_{\emptyset}$ and the probability of the first tuple in the factor of the root variable. If $Q$ has a forest of inequality trees, then the subqueries for the trees would be disconnected and thus correspond to independent random variables. The probability of $Q$ is then the product of the probabilities of the independent subqueries. ∎

The case of inequality graphs can be reduced to that of inequality trees by variable elimination. Eliminating a variable $X_{i}$ means iterating over the values in its domain and, for each value, substituting it for $X_{i}$ in the query. The inequality graph of each resulting residual query has neither a node for $X_{i}$ nor any of its edges. If $k$ variables must be eliminated to obtain an inequality tree, the time to compute the query probability increases by at most a multiplicative factor equal to the product of the sizes of the factors containing these $k$ variables.

Appendix B Omitted Details about Tree Decompositions

Here we prove Proposition 2.4, which is re-stated below.

Proposition B.1 (Re-statement of Proposition 2.4).

For every tree decomposition $(T,\chi)$ of a query $Q$ , there exists a non-redundant tree decomposition $(T^{\prime},\chi^{\prime})$ of $Q$ that satisfies

[TABLE]

Moreover, if $(T,\chi)$ is $F$ -connex, then $(T^{\prime},\chi^{\prime})$ can be chosen to be $F$ -connex as well.

Proof.

Given a redundant tree decomposition $(T,\chi)$, by Definition 2.3 there must exist $t_{1}\neq t_{2}\in V(T)$ where $\chi(t_{1})\subseteq\chi(t_{2})$. We claim that $t_{1}$ and $t_{2}$ can be chosen to be adjacent in the tree $T$. If $t_{1}$ and $t_{2}$ from Definition 2.3 are already adjacent, we are done. Otherwise, consider the node $t_{1}^{\prime}$ that is adjacent to $t_{1}$ on the path from $t_{1}$ to $t_{2}$ in the tree $T$. By the running intersection property, $\chi(t_{1})\cap\chi(t_{2})\subseteq\chi(t_{1}^{\prime})$, and since $\chi(t_{1})\cap\chi(t_{2})=\chi(t_{1})$, we have $\chi(t_{1})\subseteq\chi(t_{1}^{\prime})$. Therefore, if we replace $t_{2}$ with $t_{1}^{\prime}$, we obtain two adjacent nodes $t_{1}$ and $t_{2}$ satisfying $\chi(t_{1})\subseteq\chi(t_{2})$.

Now we modify the tree decomposition $(T,\chi)$ by removing $t_{1}$ from $T$ and connecting all the neighbors of $t_{1}$ (other than $t_{2}$ ) directly to $t_{2}$ . It is straightforward to verify that this modification results in a valid tree decomposition $(T^{\prime},\chi^{\prime})$ . Moreover this modification maintains the $F$ -connex property of the original tree decomposition, if it was $F$ -connex in the first place. If the new tree decomposition $(T^{\prime},\chi^{\prime})$ is non-redundant, we are done. Otherwise, we inductively repeat the above process by finding a new adjacent pair $t_{1}\neq t_{2}$ satisfying $\chi(t_{1})\subseteq\chi(t_{2})$ . (This induction is over the number of bags in the tree decomposition since each time we are dropping one bag.) ∎
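The procedure in this proof can be sketched in Python as follows. We assume a tree decomposition is given as an adjacency map of an undirected tree together with a map from each node $t$ to its bag $\chi(t)$; the function name `make_non_redundant` is hypothetical. Each iteration removes one bag, mirroring the induction on the number of bags.

```python
def make_non_redundant(adj, bags):
    """Sketch of the reduction in the proof of Proposition B.1.

    adj  -- dict: node -> set of neighbouring nodes (an undirected tree)
    bags -- dict: node -> set of variables chi(node)
    Returns new (adj, bags) in which no bag is contained in another bag.
    """
    adj = {t: set(ns) for t, ns in adj.items()}
    bags = {t: set(b) for t, b in bags.items()}
    while True:
        # Look for adjacent t1 != t2 with chi(t1) a subset of chi(t2); by the
        # argument above such a pair exists whenever the decomposition is redundant.
        pair = next(((t1, t2) for t1 in adj for t2 in adj[t1]
                     if bags[t1] <= bags[t2]), None)
        if pair is None:
            return adj, bags
        t1, t2 = pair
        # Remove t1 and attach its other neighbours directly to t2.
        for nb in adj.pop(t1):
            adj[nb].discard(t1)
            if nb != t2:
                adj[nb].add(t2)
                adj[t2].add(nb)
        del bags[t1]


# Usage: path a - b - c where chi(a) is contained in chi(b).
adj, bags = make_non_redundant({"a": {"b"}, "b": {"a", "c"}, "c": {"b"}},
                               {"a": {1}, "b": {1, 2}, "c": {2, 3}})
```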

Appendix C The InsideOut Algorithm

In this section, we provide a proof sketch for Theorem 2.5. We refer the reader to [6] and its extended version [5] for more details. The proof also sheds light on many technical details omitted from the proofs of Theorems 3.5 and 3.12, including how to generalize theorems from the case of no free variables, $F=\emptyset$, to the case of an arbitrary set of free variables.

Theorem C.1 (Re-statement of Theorem 2.5).

InsideOut answers query (2) in time $O(N^{\text{\sf faqw}(Q)}\cdot\log N+|Q|)$.

Proof.

Let $Q$ be an FAQ-query of the form (2) with hypergraph $\mathcal{H}=(\mathcal{V}=[n],\mathcal{E})$ and free variables $F\subseteq\mathcal{V}$ . Let $w:=\text{\sf faqw}(Q)$ . By definition of $\text{\sf faqw}(Q)$ from (11), there must exist an $F$ -connex tree decomposition $(T,\chi)\in\text{\sf TD}_{F}$ where all bags $t\in V(T)$ satisfy

[TABLE]

Moreover by Proposition 2.4, the above tree decomposition $(T,\chi)$ can be assumed to be non-redundant. By Definition 2.2, there must exist a (possibly empty) subset $V^{\prime}\subseteq V(T)$ that forms a connected subtree of $T$ and satisfies $\bigcup_{t\in V^{\prime}}\chi(t)=F$ . Fix a root $r$ of the tree decomposition $(T,\chi)$ to be:

•

either an arbitrary node from $V^{\prime}$ if $V^{\prime}$ is not empty,

•

or an arbitrary node from $V(T)$ if $V^{\prime}$ is empty.

Based on the above choice of the root $r$ , the following holds:

Claim 4.

If $V^{\prime}\neq V(T)$ , then there must exist a leaf node $t_{1}\in V(T)\setminus V^{\prime}$ .

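If $V^{\prime}$ is empty, then the above claim holds trivially. Otherwise, the root $r$ belongs to the connected subtree $V^{\prime}\neq V(T)$. Since $V^{\prime}$ is connected and contains the root, every descendant of a node outside $V^{\prime}$ also lies outside $V^{\prime}$; hence any node of $V(T)\setminus V^{\prime}$ has a leaf descendant (possibly itself) in $V(T)\setminus V^{\prime}$.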

We distinguish two cases:

Case 1: $F\neq\mathcal{V}$ . In this case, $V^{\prime}\neq V(T)$ (since $F=\bigcup_{t\in V^{\prime}}\chi(t)$ and $\mathcal{V}=\bigcup_{t\in V(T)}\chi(t)$ ). By Claim 4, let $t_{1}$ be a leaf node from $V(T)\setminus V^{\prime}$ , and let $t_{2}$ be the parent of $t_{1}$ . Let $L=\chi(t_{1})$ , $U=\chi(t_{2})$ , and $M=L\setminus U$ . Because the tree decomposition $(T,\chi)$ is non-redundant (thanks to Proposition 2.4), we have $M\neq\emptyset$ .

Claim 5.

For any $K\in\mathcal{E}$ with $K\cap M\neq\emptyset$ , we must have $K\subseteq L$ .

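The above claim follows from the definition of a tree decomposition in Section 2.1. Since $t_{1}$ is a leaf whose only neighbor $t_{2}$ contains no variable of $M$, the running intersection property implies that the variables of $M$ occur in no bag other than $t_{1}$. The hyperedge $K$ must be contained in some bag, and that bag contains a variable of $M$, so it must be $t_{1}$; hence $K\subseteq L$.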

To rewrite query (2), we use the notion of indicator projection from Definition 3.6 along with its property (25). Query (2) can be written as:

[TABLE]

The last equality above holds because of the distributive property of semirings. We define the product inside the inner sum $\bigoplus_{\bm{x}_{M}}$ to be a query $\Phi_{t_{1}}(\bm{x}_{L})$ , which is associated with the bag $t_{1}$ . Note that by Claim 5, all factors $R_{K}(\bm{x}_{K})$ and $\pi_{K,L}(\bm{x}_{K\cap L})$ in this product involve only variables from $\bm{x}_{L}$ .

Query $\Phi_{t_{1}}$ can be computed with the help of worst-case optimal join algorithms [34, 35, 43]. In particular, for every $K\in\mathcal{E}_{\not\infty}$ where $K\cap L\neq\emptyset$ , define $\overline{\pi}_{K,L}$ to be the support of the factor $\pi_{K,L}$ , i.e.

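$$\overline{\pi}_{K,L}:=\left\{\bm{x}_{K\cap L}\;\middle|\;\pi_{K,L}(\bm{x}_{K\cap L})\neq\bm{0}\right\}.$$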

$\overline{\pi}_{K,L}$ can be viewed as a relation over variables $\bm{x}_{K\cap L}$ . Solving the FAQ-query $\Phi_{t_{1}}$ can be reduced to solving the join query $\overline{\Phi}_{t_{1}}$ defined as follows:

[TABLE]

This is because once we solve the join query $\overline{\Phi}_{t_{1}}$ , the FAQ-query $\Phi_{t_{1}}$ can be computed as follows:

[TABLE]

where $\overline{\Phi}_{t_{1}}$ above denotes the output of the join query $\overline{\Phi}_{t_{1}}$. The join query $\overline{\Phi}_{t_{1}}$ can be computed using a worst-case optimal join algorithm in time $O(N^{\rho^{*}_{\mathcal{E}_{\not\infty}}(L)}\cdot\log N)$, which is $O(N^{w}\cdot\log N)$ by (117).

Once we have computed $\Phi_{t_{1}}$ , we use it to compute $\Psi_{t_{1}}$ defined as follows:

[TABLE]

The above can be computed by sorting tuples $\bm{x}_{L}$ that satisfy $\Phi_{t_{1}}(\bm{x}_{L})\neq\bm{0}$ lexicographically based on $(\bm{x}_{L\cap U},\bm{x}_{M})$ so that tuples $\bm{x}_{L}$ sharing the same $\bm{x}_{L\cap U}$ -prefix become consecutive. Then for each distinct $\bm{x}_{L\cap U}$ -prefix, we aggregate away $\Phi_{t_{1}}(\bm{x}_{L})$ over all tuples $\bm{x}_{L}$ sharing that prefix.
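A Python sketch of this sort-then-group step, assuming $\Phi_{t_{1}}$ is stored as a dictionary from tuples $\bm{x}_{L}$ to their non-zero semiring values and that the positions of the $L\cap U$ variables inside $\bm{x}_{L}$ are known. The semiring addition $\oplus$ is a parameter (ordinary `+` by default), and `aggregate_away` is a hypothetical name. Sorting by the $\bm{x}_{L\cap U}$-prefix alone already makes equal prefixes consecutive.

```python
from functools import reduce
from itertools import groupby
from operator import add


def aggregate_away(phi, shared_pos, oplus=add):
    """Sketch of Psi(x_{L∩U}) = ⊕_{x_M} Phi(x_L).

    phi        -- dict mapping tuples x_L to non-zero semiring values
    shared_pos -- indices of the L∩U variables inside each tuple x_L
    oplus      -- semiring addition (ordinary + by default)
    """
    def prefix(item):
        x_l, _ = item
        return tuple(x_l[i] for i in shared_pos)

    # Sorting makes all tuples with the same x_{L∩U}-prefix consecutive.
    items = sorted(phi.items(), key=prefix)
    # Fold each run of equal prefixes with the semiring addition.
    return {key: reduce(oplus, (v for _, v in run))
            for key, run in groupby(items, key=prefix)}


# Usage: sum-product semiring, L = (A, B), L∩U = (A,).
psi = aggregate_away({(1, "a"): 2, (1, "b"): 3, (2, "a"): 5}, shared_pos=[0])
```

Passing `oplus=max` instead gives the max-product variant of the same aggregation.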

Finally, expression (121) can be rewritten as:

[TABLE]

The above is an FAQ-query of the same form as (2). It admits an $F$-connex tree decomposition that results from the original $F$-connex tree decomposition $(T,\chi)$ by removing the leaf bag $t_{1}$. In particular, the newly added hyperedge $L\cap U$ (corresponding to $\Psi_{t_{1}}(\bm{x}_{L\cap U})$) is contained in $\chi(t_{2})$, and all other properties of $F$-connex tree decompositions continue to hold after the removal of $t_{1}$. Moreover, since $M\neq\emptyset$, the new query (124) has strictly fewer variables than the original query (2). In particular, the new query only involves the variables $\bm{x}_{\mathcal{V}\setminus M}$, while the original query involves $\bm{x}_{\mathcal{V}}$. (We say that the variables $\bm{x}_{M}$ have been eliminated from the original query, hence the term “variable elimination”.) By induction on the number of variables, we can solve the original query (2) in the claimed time of $O(N^{w}\cdot\log N+|Q|)$. (In the base case, we have an FAQ-query with no variables, where the theorem holds trivially.)

Case 2: $F=\mathcal{V}$. Let $t_{1}$ be an arbitrary leaf node and $t_{2}$ its parent. Let $L$, $U$, and $M$ be defined as before. Claim 5 continues to hold. In this case, query (2) can be written as:

[TABLE]

Just like in the previous case, we use a worst-case optimal join algorithm to compute $\Phi_{t_{1}}$ above in time $O(N^{\rho^{*}_{\mathcal{E}_{\not\infty}}(L)}\cdot\log N)=O(N^{w}\cdot\log N)$. Once we do, we compute its indicator projection:

[TABLE]

Now (127) can be written as:

[TABLE]

Note that thanks to the indicator projection $\Psi_{t_{1}}$ that is included in the new query $Q^{\prime}$ above, the following holds: For every tuple $\bm{x}_{\mathcal{V}\setminus M}$ that satisfies $Q^{\prime}(\bm{x}_{\mathcal{V}\setminus M})\neq\bm{0}$ , there must exist at least one tuple $\bm{x}_{M}$ that satisfies $Q((\bm{x}_{\mathcal{V}\setminus M},\bm{x}_{M}))\neq\bm{0}$ . This in turn implies that:

[TABLE]

By induction on the number of variables, we solve the new query $Q^{\prime}$ (which has neither the bag $t_{1}$ nor the variables $\bm{x}_{M}$) in time $O(N^{w}\cdot\log N+|Q^{\prime}|)$, which is $O(N^{w}\cdot\log N+|Q|)$ thanks to (131). Finally, we compute the original query $Q$ using the expression

[TABLE]

In particular, the above expression can be computed in $O(|Q|)$ time as follows. First, we index the tuples $\bm{x}_{L}=(\bm{x}_{L\cap U},\bm{x}_{M})$ satisfying $\Phi_{t_{1}}(\bm{x}_{L})\neq\bm{0}$ so that, for a given $\bm{x}_{L\cap U}$, we can enumerate with constant delay all tuples $\bm{x}_{M}$ where $\Phi_{t_{1}}((\bm{x}_{L\cap U},\bm{x}_{M}))\neq\bm{0}$. After that, we iterate over the tuples $\bm{x}_{\mathcal{V}\setminus M}$ satisfying $Q^{\prime}(\bm{x}_{\mathcal{V}\setminus M})\neq\bm{0}$, extract the $\bm{x}_{L\cap U}$-part of each such $\bm{x}_{\mathcal{V}\setminus M}$-tuple, and then use the index of $\Phi_{t_{1}}$ to enumerate the $\bm{x}_{M}$-tuples corresponding to $\bm{x}_{L\cap U}$. ∎
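The indexing and enumeration steps can be sketched in Python as below. We assume tuples are plain Python tuples and that each output tuple is laid out as its $\bm{x}_{\mathcal{V}\setminus M}$-part followed by its $\bm{x}_{M}$-part; the sketch enumerates only the tuples on which both $Q^{\prime}$ and $\Phi_{t_{1}}$ are non-zero and omits combining the semiring values. Function names are hypothetical. A hash map gives expected constant-time lookups, and since the indicator projection guarantees at least one match per tuple of $Q^{\prime}$, the delay between outputs stays constant in this sketch.

```python
from collections import defaultdict


def build_index(phi_support, shared_pos, m_pos):
    """Group the x_M-parts of Phi_{t1}'s support by their x_{L∩U}-part."""
    index = defaultdict(list)
    for x_l in phi_support:
        key = tuple(x_l[i] for i in shared_pos)
        index[key].append(tuple(x_l[i] for i in m_pos))
    return index


def enumerate_support(q_prime_support, shared_pos_in_rest, index):
    """Yield every tuple (x_{V\\M}, x_M) with Q' non-zero on x_{V\\M} and
    Phi_{t1} non-zero on (x_{L∩U}, x_M)."""
    for x_rest in q_prime_support:
        key = tuple(x_rest[i] for i in shared_pos_in_rest)
        for x_m in index.get(key, ()):
            yield x_rest + x_m


# L = (B, C) with L∩U = (B,), M = (C,); V \ M = (A, B).
idx = build_index([(1, "a"), (1, "b"), (2, "c")], shared_pos=[0], m_pos=[1])
print(list(enumerate_support([(10, 1), (20, 2)], [1], idx)))
# [(10, 1, 'a'), (10, 1, 'b'), (20, 2, 'c')]
```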

Appendix D The PANDA Algorithm

In this section, we give an overview of the PANDA algorithm developed in [9] and its extended version [8]. The aim is to fill in technical details omitted from the proof of Theorem 3.15, which introduces a variant of PANDA called #PANDA.

Following notation from Section 1.2, the input to the PANDA algorithm is as follows:

•

A multi-hypergraph $\mathcal{H}=(\mathcal{V}=[n],\mathcal{E})$ (see the definition in Section 1.2).

•

A relation $R_{K}$ associated with each hyperedge $K\in\mathcal{E}$ . The arity of $R_{K}$ is $|K|$ .

•

A disjunctive Datalog query of the form

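$$\bigvee_{B\in\mathcal{B}}G_{B}(\bm{x}_{B})\;\leftarrow\;\bigwedge_{K\in\mathcal{E}}R_{K}(\bm{x}_{K})\qquad(133)$$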

where $\mathcal{B}\subseteq 2^{\mathcal{V}}$ .

The output of PANDA is a collection of tables $(G_{B})_{B\in\mathcal{B}}$ that form a solution to the disjunctive Datalog query (133) (which can have many solutions). In particular, the tables $(G_{B})_{B\in\mathcal{B}}$ must satisfy the following condition:

Each tuple $\bm{x}_{\mathcal{V}}$ that satisfies the conjunction $\bigwedge_{K\in\mathcal{E}}R_{K}(\bm{x}_{K})$ must also satisfy the disjunction $\bigvee_{B\in\mathcal{B}}G_{B}(\bm{x}_{B})$ .
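To make this output condition concrete, here is a brute-force Python check of whether given tables $(G_{B})_{B\in\mathcal{B}}$ satisfy it. It enumerates every assignment over a small explicit domain, so it is exponential and meant only for tiny examples; variables are numbered from $0$ rather than from $1$, and the function names are hypothetical. This is not how PANDA computes a solution.

```python
from itertools import product


def proj(x, S):
    """Project the full assignment x onto the variable indices in S."""
    return tuple(x[i] for i in S)


def is_solution(domain, body, heads):
    """Check that every x_V satisfying all body atoms R_K(x_K) also
    satisfies at least one head atom G_B(x_B).

    domain -- candidate values for every variable
    body   -- list of (K, R_K): K a tuple of variable indices, R_K a set of tuples
    heads  -- list of (B, G_B) in the same format
    """
    n = 1 + max(v for K, _ in body + heads for v in K)
    for x in product(domain, repeat=n):
        in_body = all(proj(x, K) in R for K, R in body)
        if in_body and not any(proj(x, B) in G for B, G in heads):
            return False  # a body tuple is covered by no head table
    return True


# Triangle body R(x0,x1), S(x1,x2), T(x0,x2) with heads G_{01}, G_{12}.
body = [((0, 1), {(1, 1), (1, 2)}), ((1, 2), {(1, 1), (2, 1)}), ((0, 2), {(1, 1)})]
print(is_solution([1, 2], body, [((0, 1), {(1, 1)}), ((1, 2), {(2, 1)})]))  # True
```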

Following notation from Sections 2.1 and 2.2, the runtime of PANDA is $\tilde{O}(N^{e})$ , where

[TABLE]

(Recall that $\tilde{O}$ hides a polylogarithmic factor in $N$ .) We start with some preliminaries. The following lemma shows how to convert the expression (134) into a linear program.

Lemma D.1 ([9, 8]).

There exists a non-negative vector $\bm{\lambda}:=(\lambda_{B})_{B\in\mathcal{B}}$ satisfying $\left\|\bm{\lambda}\right\|_{1}=1$ and

[TABLE]

Note that the right-hand side of (135) is a linear program: its variables are $\left(h(S)\right)_{S\subseteq\mathcal{V}}$, its objective function is $\sum_{B\in\mathcal{B}}\lambda_{B}\cdot h(B)$, and its constraints are $h\in\Gamma_{n}$ and $h\in\text{\sf ED}_{\not\infty}$, which are all linear. (Recall the definitions of $\Gamma_{n}$ and $\text{\sf ED}_{\not\infty}$ from Section 2.1 and (9), respectively.) Our next step is to reduce solving this linear program to finding a Shannon inequality, defined below.

Definition D.2 (Shannon inequality).

Given real constants $(\alpha_{S})_{S\subseteq\mathcal{V}}$ (where each $\alpha_{S}$ could be either positive, negative, or zero), the linear inequality $\sum_{S\subseteq\mathcal{V}}\alpha_{S}\cdot h(S)\geq 0$ is called a Shannon inequality if it holds for all $h\in\Gamma_{n}$ .

Let OPT be the optimal objective value of the linear program on the right-hand side of (135):

[TABLE]

By linear programming duality, the following lemma was proved in [8].

Lemma D.3 ([9, 8]).

There exists a non-negative vector $\bm{C}=(C_{K})_{K\in\text{\sf ED}_{\not\infty}}$ satisfying the following conditions:

•

The inequality

[TABLE]

is a Shannon inequality.

•

[TABLE]

Shannon-flow inequalities [8] form a special class of Shannon inequalities that subsumes inequality (137). They enjoy certain properties that the PANDA algorithm relies on. Given $X\subset Y\subseteq\mathcal{V}$, let $h(Y|X)$ denote

$$h(Y|X):=h(Y)-h(X).$$

Definition D.4 (Shannon-flow inequality [8]).

Given $\mathcal{B}\subseteq 2^{\mathcal{V}}$ , let $\bm{\lambda}=(\lambda_{B})_{B\in\mathcal{B}}$ be a non-negative vector. Let $\bm{\delta}=(\delta_{Y|X})_{X\subset Y\subseteq\mathcal{V}}$ be another non-negative vector. A Shannon-flow inequality is a Shannon inequality that has the following form:

[TABLE]

Note that (137) is a special case of (140) where $\delta_{K|\emptyset}=C_{K}$ for $K\in\text{\sf ED}_{\not\infty}$ and $\delta_{Y|X}=0$ otherwise.

Lemma D.5 (Proof sequence construction [9, 8]).

Every Shannon-flow inequality (140) admits a proof of the following form. Start from the right-hand side of (140) and apply a sequence of proof steps, each of which replaces one or more terms with one or more terms whose sum is no larger, until we end up with the left-hand side of (140) (which proves that the left-hand side is at most the right-hand side). Each proof step in the sequence has one of the following forms:

[TABLE]

Each proof step in (141)-(144) is interpreted as replacing the term(s) on the left-hand side of the step with the term(s) on the right-hand side. Note that for each step in (141)-(144) and each $h\in\Gamma_{n}$, the right-hand side of the step is guaranteed to be no larger than the left-hand side. For example, consider the submodularity step (143), where we replace $h(X|X\cap Y)$ with $h(X\cup Y|Y)$. Because $h\in\Gamma_{n}$, it must satisfy the inequality $h(X\cup Y)+h(X\cap Y)\leq h(X)+h(Y)$ (recall the definition of $\Gamma_{n}$ in Section 2.1). This inequality can be rearranged into $h(X\cup Y|Y)\leq h(X|X\cap Y)$. Similarly, consider the monotonicity step (144), where we replace $h(Y)$ with $h(X)$ for some $X\subset Y$. Since $h\in\Gamma_{n}$, it must satisfy $h(X)\leq h(Y)$ whenever $X\subset Y$.
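As a numerical sanity check of these two rearrangements (not part of PANDA itself), the Python sketch below builds an entropy function, a standard example of a polymatroid in $\Gamma_{n}$, from a random distribution over $n$ binary variables and tests the submodularity and monotonicity steps on all pairs of subsets. Conditional terms use the convention $h(Y|X)=h(Y)-h(X)$; the function names and the choice of binary variables are our own assumptions.

```python
import math
import random
from itertools import combinations, product


def entropy_function(n, seed=0):
    """h(S) = Shannon entropy (bits) of the marginal on S of a random joint
    distribution over n binary variables. Entropy functions lie in Gamma_n."""
    rng = random.Random(seed)
    weights = {x: rng.random() for x in product((0, 1), repeat=n)}
    total = sum(weights.values())

    def h(S):
        marginal = {}
        for x, w in weights.items():
            key = tuple(x[i] for i in sorted(S))
            marginal[key] = marginal.get(key, 0.0) + w / total
        return -sum(q * math.log2(q) for q in marginal.values() if q > 0)

    return h


def cond(h, Y, X):
    """h(Y|X) := h(Y) - h(X)."""
    return h(Y) - h(X)


def check_steps(h, n, eps=1e-9):
    """Check the submodularity step h(X∪Y|Y) <= h(X|X∩Y) and the
    monotonicity step h(X) <= h(Y) for X ⊆ Y, over all subsets of {0..n-1}."""
    subsets = [frozenset(c) for r in range(n + 1) for c in combinations(range(n), r)]
    for X in subsets:
        for Y in subsets:
            if cond(h, X | Y, Y) > cond(h, X, X & Y) + eps:
                return False
            if X <= Y and h(X) > h(Y) + eps:
                return False
    return True


print(check_steps(entropy_function(3), 3))  # True
```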

The PANDA algorithm starts from the target runtime bound $N^{e}$, where $e$ is given by (134), computes a corresponding Shannon-flow inequality (137) from Lemma D.3 (where $\left\|\bm{\lambda}\right\|_{1}=1$ thanks to Lemma D.1), and then uses Lemma D.5 to construct a proof sequence $s=(s_{1},\ldots,s_{\ell})$ of $\ell$ proof steps for this inequality. The algorithm then mimics the process of using this proof sequence to prove inequality (137). In particular, it starts from the right-hand side of (137), associating each entropy term $h(K)$ with the corresponding input relation $R_{K}$. It then applies the proof steps one by one: each time a proof step $s_{i}$ replaces some entropy terms on the right-hand side of (137) with new entropy terms, the algorithm takes the relations associated with the old terms, applies some relational operator to them to produce new relations, and associates the new relations with the new entropy terms. At the end of the proof sequence, the right-hand side of (137) has been completely transformed into the left-hand side, completing the proof. At that point, PANDA has computed relations $G_{B}$ associated with the entropy terms $h(B)$ on the left-hand side of (137). These relations $G_{B}$ form a solution to the input disjunctive Datalog rule (133). Moreover, the algorithm ensures that every relational operator performed while mimicking the proof sequence takes time within the target runtime bound of $N^{e}$.

Before formally describing the invariants maintained by the algorithm, we need some notation.

Definition D.6 (Degrees in a relation).

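Given a relation $R_{Y}$ and a set $X\subset Y$, the degree of $Y$ w.r.t. a tuple $\bm{t}_{X}\in\pi_{X}R_{Y}$ and w.r.t. $X$ are defined as follows:

$$\deg_{R_{Y}}(Y|\bm{t}_{X}):=\left|\left\{\bm{t}_{Y}\in R_{Y}\mid\pi_{X}\bm{t}_{Y}=\bm{t}_{X}\right\}\right|,\qquad\deg_{R_{Y}}(Y|X):=\max_{\bm{t}_{X}\in\pi_{X}R_{Y}}\deg_{R_{Y}}(Y|\bm{t}_{X}).$$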

As a special case, we have $\deg_{R_{Y}}(Y|\emptyset)=|R_{Y}|$ .
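Reading the degree of $\bm{t}_{X}$ as the number of tuples of $R_{Y}$ that project onto $\bm{t}_{X}$, and $\deg_{R_{Y}}(Y|X)$ as the maximum of these counts, both quantities can be computed in one pass. The Python sketch below assumes relations are sets of tuples with an explicit list of variable names; `degrees` is a hypothetical helper. With $X=\emptyset$ it returns $|R_{Y}|$, matching the special case above.

```python
from collections import Counter


def degrees(R_Y, Y, X):
    """Return ({t_X: deg_{R_Y}(Y|t_X)}, deg_{R_Y}(Y|X)).

    R_Y -- set of tuples over the variable list Y
    X   -- list of variables, a subset of Y
    """
    pos = [Y.index(v) for v in X]
    # Each tuple of R_Y contributes one to the degree of its X-projection.
    per_tuple = Counter(tuple(t[i] for i in pos) for t in R_Y)
    return dict(per_tuple), max(per_tuple.values(), default=0)


R = {(1, "a"), (1, "b"), (2, "a")}
per, d = degrees(R, ["A", "B"], ["A"])
print(sorted(per.items()), d)  # [((1,), 2), ((2,), 1)] 2
```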

Although the PANDA algorithm starts from a Shannon-flow inequality of the special form (137), after applying a decomposition proof step $h(K)\rightarrow h(X)+h(K|X)$ (for some $X\subset K$) that replaces some term $h(K)$ on the right-hand side of (137) with new terms $h(X)+h(K|X)$, the resulting inequality no longer has the special form (137). Instead, it has the more general form of a Shannon-flow inequality (140). Therefore, in general, the PANDA algorithm maintains a Shannon-flow inequality (140).

The PANDA algorithm maintains the following invariants:

(I1)

Every term $h(Y|X)$ on the right-hand side of (140) is associated with a relation $R_{Z}$ satisfying $Y\setminus X\subseteq Z\subseteq Y$ . The relation $R_{Z}$ is called the guard of the term $h(Y|X)$ . (Note that if $X=\emptyset$ , then $Z=Y$ .)

(I2)

The guards satisfy the following:

[TABLE]

where $N$ is the input size and $e$ is given by (134). For convenience, we define $n_{Y|X}:=\log_{2}\left[\deg_{R_{Z}}(Z|X\cap Z)\right]$ where $R_{Z}$ is the guard of $h(Y|X)$ .

(I3)

Every guard $R_{Z}$ satisfies

[TABLE]

Initially, the above invariants are satisfied. In particular, inequality (140) at the beginning is just (137), where each $h(K)$ is guarded by $R_{K}$, thus satisfying invariant (I1). Moreover, (147) is satisfied as follows:

[TABLE]

Also (148) is satisfied because initially each input relation $R_{K}$ satisfies $|R_{K}|\leq N$ . (It is straightforward to verify that $e$ defined by (134) is at least $1$ .)

Next we describe how PANDA handles each type of proof step (141)-(144) while maintaining the above invariants and ensuring that all operations are performed in time $N^{e}$.

Case 1: Submodularity step $h(X|X\cap Y)\rightarrow h(X\cup Y|Y)$ for some $X,Y\subseteq\mathcal{V}$ . Let $R_{Z}$ be the guard of the term $h(X|X\cap Y)$ . We can directly use $R_{Z}$ as a guard of the new term $h(X\cup Y|Y)$ thus satisfying invariant (I1). Since both terms share the same guard, we have $n_{X\cup Y|Y}=n_{X|X\cap Y}$ hence the left-hand side of (147) remains unchanged and invariant (147) remains satisfied. Invariant (148) remains satisfied as well.

Case 2: Monotonicity step $h(Y)\rightarrow h(X)$ for $X\subset Y$ . Let $R_{Y}$ be the guard of $h(Y)$ . We use $R_{X}:=\pi_{X}R_{Y}$ as a guard of the new term $h(X)$ . We have $n_{X|\emptyset}=\log_{2}|R_{X}|\leq\log_{2}|R_{Y}|=n_{Y|\emptyset}$ hence the left-hand side of (147) does not increase and invariant (147) remains satisfied. Invariant (148) remains satisfied because $|R_{X}|\leq|R_{Y}|\leq N^{e}$ . Moreover since $|R_{Y}|\leq N^{e}$ by invariant (148), the projection $\pi_{X}R_{Y}$ can be computed in our target runtime bound of $N^{e}$ .

Case 3: Decomposition step $h(Y)\rightarrow h(X)+h(Y|X)$ for $X\subset Y$. Let $R_{Y}$ be the guard of $h(Y)$. In this case, we partition $R_{Y}$ into a small number $k=O(\log|R_{Y}|)$ of relations $R_{Y}^{(1)},\ldots,R_{Y}^{(k)}$, branch the execution of the algorithm into $k$ different branches where $R_{Y}$ is replaced with $R_{Y}^{(j)}$ on the $j$-th branch for $j\in[k]$, continue the algorithm on each branch separately, and combine the outputs at the very end. Because of the logarithmic number of branches created at each decomposition step, the runtime of the algorithm blows up from the ideal bound of $N^{e}$ to $\tilde{O}(N^{e})$, where $\tilde{O}$ hides a polylogarithmic factor in $N$.

In particular, we partition tuples $\bm{t}_{X}\in\pi_{X}R_{Y}$ into $k$ buckets based on $\deg_{R_{Y}}(Y|\bm{t}_{X})$ and partition $R_{Y}$ accordingly. Specifically, for each $j\in[k]$ , we define

[TABLE]

After partitioning, PANDA creates $k$ independent branches of the problem, where in the $j$ -th branch, $R_{Y}$ is replaced by both $R_{X}^{(j)}$ and $R_{Y}^{(j)}$ . Note that for each $j\in[k]$ , the size of $R_{X}^{(j)}$ is at most $|R_{Y}|/2^{j-1}$ therefore:

[TABLE]

Moreover if we partition each $R_{Y}^{(j)}$ further into two parts, we can get rid of the division by $2$ in (154) at the cost of doubling the number of branches.

Now on the $j$ -th branch, we replace the term $h(Y)$ with the two terms $h(X)$ and $h(Y|X)$ , which are guarded by $R_{X}^{(j)}$ and $R_{Y}^{(j)}$ respectively. By taking the $\log$ of both sides of (154) (and ignoring the division by 2), we have $n_{Y|\emptyset}\geq n_{X|\emptyset}+n_{Y|X}$ hence the left-hand side of (147) does not increase and invariant (147) remains satisfied. Moreover, because $\max(|R_{X}^{(j)}|,|R_{Y}^{(j)}|)\leq|R_{Y}|\leq N^{e}$ , invariant (148) remains satisfied as well. Finally because $|R_{Y}|\leq N^{e}$ thanks to invariant (148), the above partitioning of $R_{Y}$ can be performed in time $N^{e}$ as needed.
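A Python sketch of this bucketing, under the assumption (consistent with the bound $|R_{X}^{(j)}|\leq|R_{Y}|/2^{j-1}$ stated above) that bucket $j$ collects the tuples $\bm{t}_{X}$ whose degree lies in $[2^{j-1},2^{j})$; the exact bucket boundaries used by PANDA may differ, and `partition_by_degree` is a hypothetical name. Since degrees are at most $|R_{Y}|$, at most $\lfloor\log_{2}|R_{Y}|\rfloor+1$ buckets are non-empty.

```python
from collections import Counter, defaultdict


def partition_by_degree(R_Y, Y, X):
    """Split R_Y into buckets j = 1, 2, ... so that every t_X in bucket j has
    deg_{R_Y}(Y | t_X) in [2^(j-1), 2^j). Returns (R_X parts, R_Y parts)."""
    pos = [Y.index(v) for v in X]

    def proj(t):
        return tuple(t[i] for i in pos)

    deg = Counter(proj(t) for t in R_Y)
    R_X_parts, R_Y_parts = defaultdict(set), defaultdict(set)
    for t in R_Y:
        # For d >= 1, d lies in [2^(j-1), 2^j) exactly when d.bit_length() == j.
        j = deg[proj(t)].bit_length()
        R_X_parts[j].add(proj(t))
        R_Y_parts[j].add(t)
    return dict(R_X_parts), dict(R_Y_parts)


R = {(1, "a"), (1, "b"), (1, "c"), (2, "a"), (3, "a"), (3, "b")}
RX, RY = partition_by_degree(R, ["A", "B"], ["A"])
# Each t_X in bucket j has at least 2^(j-1) tuples, so |R_X^(j)| * 2^(j-1) <= |R_Y|.
assert all(len(RX[j]) * 2 ** (j - 1) <= len(R) for j in RX)
```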

Case 4: Composition step $h(X)+h(Y|X)\rightarrow h(Y)$ for $X\subset Y$. Let $R_{X}$ be the guard of $h(X)$ and $R_{Z}$ be the guard of $h(Y|X)$. Recall from (I1) that this implies $Y\setminus X\subseteq Z\subseteq Y$. In this case, we compute the join $R_{Y}:=R_{X}\Join R_{Z}$ by going over the tuples in $R_{X}$, projecting each of them onto $X\cap Z$, and finding the matching tuples in $R_{Z}$ (which can be done efficiently by indexing $R_{Z}$ on $X\cap Z$). The output size of this join satisfies

[TABLE]

Moreover the join can be computed in time proportional to $|R_{Y}|$ . We use the join result $R_{Y}$ as a guard for the new term $h(Y)$ that results from applying the proof step. From (155), we have $n_{Y|\emptyset}\leq n_{X|\emptyset}+n_{Y|X}$ hence the left-hand side of (147) does not increase and invariant (147) remains satisfied.
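A Python sketch of this join, assuming relations are sets of tuples over explicit variable lists; `compose` is a hypothetical name, and a hash index stands in for the "proper indexing" of $R_{Z}$. Each tuple of $R_{X}$ probes the index with its $(X\cap Z)$-part and meets at most $\deg_{R_{Z}}(Z|X\cap Z)$ matches, so in this sketch the output size is at most $|R_{X}|\cdot\deg_{R_{Z}}(Z|X\cap Z)$, with linear extra work for building and probing the index.

```python
from collections import defaultdict


def compose(R_X, X, R_Z, Z):
    """Join R_Y := R_X ⋈ R_Z for a composition step with Y \\ X ⊆ Z ⊆ Y.

    R_X, R_Z -- sets of tuples over the variable lists X and Z
    Returns (R_Y, Y), where Y lists X followed by the variables of Z not in X.
    """
    common = [v for v in Z if v in X]      # X ∩ Z
    extra = [v for v in Z if v not in X]   # Z \ X, which equals Y \ X here
    z_common = [Z.index(v) for v in common]
    z_extra = [Z.index(v) for v in extra]
    x_common = [X.index(v) for v in common]

    # Index R_Z on its (X ∩ Z)-part.
    index = defaultdict(list)
    for t in R_Z:
        index[tuple(t[i] for i in z_common)].append(tuple(t[i] for i in z_extra))

    # Probe the index once per tuple of R_X.
    R_Y = set()
    for t in R_X:
        for rest in index.get(tuple(t[i] for i in x_common), ()):
            R_Y.add(t + rest)
    return R_Y, X + extra


R_Y, Y = compose({(1, 2), (1, 3)}, ["A", "B"], {(2, 5), (2, 6), (4, 7)}, ["B", "C"])
print(sorted(R_Y), Y)  # [(1, 2, 5), (1, 2, 6)] ['A', 'B', 'C']
```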

It remains to verify that the above step maintains invariant (148) and can also be performed in the desired time of $N^{e}$ . As it turns out, neither is true: Both the size of the new relation $R_{Y}$ and the time it takes to compute it can exceed $N^{e}$ . In order to enforce these invariants, some new technical ideas are needed that are beyond the scope of this short introduction to PANDA. We refer the reader to [8] for a detailed explanation of how to handle this last case properly without violating the invariants of the algorithm.

Bibliography

[1] Abo Khamis, M., Curtin, R. R., Moseley, B., Ngo, H. Q., Nguyen, X., Olteanu, D., and Schleich, M. On functional aggregate queries with additive inequalities. In PODS (2019), pp. 414–431.
[2] Abo Khamis, M., Ngo, H. Q., Nguyen, X., Olteanu, D., and Schleich, M. In-database learning with sparse tensors. In PODS (2018), pp. 325–340.
[3] Abo Khamis, M., Ngo, H. Q., Nguyen, X., Olteanu, D., and Schleich, M. Learning models over relational data using sparse tensors and functional dependencies. ACM Trans. Database Syst. (2020).
[4] Abo Khamis, M., Ngo, H. Q., Olteanu, D., and Suciu, D. Boolean tensor decomposition for conjunctive queries with negation. In ICDT (2019), pp. 21:1–21:19.
[5] Abo Khamis, M., Ngo, H. Q., and Rudra, A. FAQ: questions asked frequently. CoRR abs/1504.04044 (2015).
[6] Abo Khamis, M., Ngo, H. Q., and Rudra, A. FAQ: questions asked frequently. In PODS (2016), pp. 13–28.
[7] Abo Khamis, M., Ngo, H. Q., and Rudra, A. Juggling functions inside a database. SIGMOD Rec. 46, 1 (2017), 6–13.
[8] Abo Khamis, M., Ngo, H. Q., and Suciu, D. What do Shannon-type inequalities, submodular width, and disjunctive Datalog have to do with one another? CoRR abs/1612.02503 (2016).
