A note on self-improving sorting with hidden partitions

Siu-Wing Cheng; Man-Kwun Chiu; Kai Jin

arXiv:1902.00219·cs.CG·February 4, 2019

TL;DR

This paper introduces an optimal self-improving sorting algorithm that adapts to hidden partitions in data, achieving expected time complexity based on the entropy of the sorted output, thus improving efficiency for certain data distributions.

Contribution

It presents a novel algorithm for self-improving sorting with hidden partitions, achieving optimal expected time proportional to the entropy of the output ranks.

Findings

01

Algorithm runs in expected time O(H(\pi(I)) + n)

02

Achieves optimality based on entropy of output ranks

03

Effective for data with hidden partition structures

Abstract

We study self-improving sorting with hidden partitions. Our result is an optimal algorithm which runs in expected time O(H(\pi(I)) + n), where I is the given input containing n elements to be sorted, \pi(I) is the output, namely the ranks of all elements in I, and H(\pi(I)) denotes the entropy of the output.

Equations

t^{\mathbf{q}}_{i}=\left\{\begin{array}[]{ll}O(n_{k}+\log(1/q_{i})),&q_{i}>0;\\ O(n_{k}\cdot\log n),&q_{i}=0.\end{array}\right.

\sum_{\mathbf{q}}\Pr(\mathbf{q})\cdot\sum_{i}p_{i}t^{\mathbf{q}}_{i}=\sum_{i}p_{i}\sum_{\mathbf{q}}\Pr(\mathbf{q})\,t^{\mathbf{q}}_{i}

\displaystyle=\sum_{i}p_{i}\sum_{\mathbf{q}:q_{i}>0}\Pr(\mathbf{q})O\big{(}n_{k}+\log(1/q_{i})\big{)}+\sum_{i}p_{i}\sum_{\mathbf{q}:q_{i}=0}\Pr(\mathbf{q})O\big{(}n_{k}\log n\big{)}

\hbox{The second term}=O\big{(}n_{k}\log n\sum_{i}p_{i}(1-p_{i})^{T}\big{)}\leq O\big{(}n_{k}\log nW^{*}/(T+1)\big{)}=O(n_{k}).

\displaystyle=\sum_{i}p_{i}\sum_{\mathbf{q}:q_{i}>0}\Pr(\mathbf{q})O(n_{k})+\sum_{i}p_{i}\sum_{\mathbf{q}:q_{i}>0}\Pr(\mathbf{q})O\big{(}\log(1/q_{i})\big{)}

\displaystyle\leq O(n_{k})+\sum_{i}p_{i}\sum_{j=1}^{T}\Pr(q_{i}=\frac{j}{T})O\big{(}\log(\frac{T}{j})\big{)}

\begin{gathered}\sum_{i}p_{i}\sum_{j=1}^{T}\Pr(q_{i}=\frac{j}{T})O\big{(}\log(\frac{T}{j})\big{)}\\ =\sum_{i}p_{i}\sum_{1\leq j\leq p_{i}T/2}\Pr(q_{i}=\frac{j}{T})O\big{(}\log(\frac{T}{j})\big{)}+\sum_{i}p_{i}\sum_{p_{i}T/2<j\leq T}\Pr(q_{i}=\frac{j}{T})O\big{(}\log(\frac{T}{j})\big{)}\\ \leq\sum_{i}p_{i}\sum_{1\leq j\leq p_{i}T/2}\Pr(q_{i}=\frac{j}{T})O\big{(}\log T\big{)}+\sum_{i}p_{i}\sum_{p_{i}T/2<j\leq T}\Pr(q_{i}=\frac{j}{T})O\big{(}\log(\frac{2}{p_{i}})\big{)}\end{gathered}

\hbox{The second term}\leq\sum_{i}p_{i}O\big{(}\log(2/p_{i})\big{)}=O(1+H(\mathsf{po}_{k})).

Taxonomy

Topics: Algorithms and Data Compression · DNA and Biological Computing · Computability, Logic, AI Algorithms

Full text

† HKUST, Hong Kong. ‡ Freie Universität Berlin, Germany

Copyright: Siu-Wing Cheng, Man-Kwun Chiu, and Kai Jin

Event: Asian Association for Algorithms and Computation 2019 (AAAC 2019), April 19-21, 2019, Seoul, South Korea

A note on self-improving sorting with hidden partitions

Siu-Wing Cheng†, Man-Kwun Chiu‡, and Kai Jin†

Key words and phrases: Self-improving algorithm

1991 Mathematics Subject Classification: Theory of computation

1. Introduction.

The sorting problem under a so-called “self-improving computational model” was studied in [1]. In this model, the input instances $I_{1},I_{2},\ldots$ are generated as follows. An instance $I$ contains $n$ elements $x_{1}^{I},\ldots,x_{n}^{I}$, and its $i$-th element $x_{i}^{I}$ ($1\leq i\leq n$) is generated according to a distribution $\mathcal{D}_{i}$. The $n$ distributions $\mathcal{D}_{1},\ldots,\mathcal{D}_{n}$ are fixed but are not given. The goal is to compute and output $\pi(I)$, the ranks of the $n$ elements in $I$.

Let $H(\pi(I))$ denote the entropy of the output $\pi(I)$. The authors of [1] showed how to design a learning phase that learns the distributions and builds some data structures by analyzing several instances, so that for a given $I$ in the operation phase, $\pi(I)$ can be computed in $O(H(\pi(I))+n)$ expected time, which matches the information-theoretic lower bound.

In this paper we study a more general setting that allows some dependency among the $n$ elements. We assume that the $n$ elements are partitioned into $g$ groups (each element belongs to exactly one group), that in the $k$-th group ($1\leq k\leq g$) there is a variable $z_{k}$ generated according to a fixed distribution $\mathcal{D}_{k}$, and that each element in this group is a function of $z_{k}$. Note that neither the partition nor the $g$ distributions $\mathcal{D}_{1},\ldots,\mathcal{D}_{g}$ are given.

However, we need to impose some constraints on these functions of $z_{k}$ . Assume that the $k$ -th group contains $n_{k}$ elements $x_{1},\ldots,x_{n_{k}}$ and moreover $x_{1}=f_{1}(z_{k}),\ldots,x_{n_{k}}=f_{n_{k}}(z_{k})$ . We assume that each function $f_{i}()$ can have at most $\mu$ extremal points and every pair of functions $f_{i}()$ and $f_{j}()$ can have at most $\sigma$ intersections, where $\mu$ and $\sigma$ are known constants.
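The input model above can be illustrated with a small Python sketch. The two groups, the functions $f_{i}$, and the uniform distribution used for $z_{k}$ are hypothetical choices made only for illustration; the note itself leaves all of them unknown to the algorithm.

```python
import random

# Hypothetical hidden partition: two groups, each driven by one latent z_k.
# These functions are illustrative assumptions, not taken from the note.
GROUPS = [
    [lambda z: z, lambda z: 2.0 * z + 1.0],   # group 1: two lines
    [lambda z: z * z, lambda z: 3.0 - z],     # group 2: a parabola and a line
]

def draw_instance(rng):
    """Draw one instance: sample z_k per group, then evaluate every f_i(z_k)."""
    instance = []
    for functions in GROUPS:
        z = rng.uniform(-1.0, 1.0)  # stand-in for the unknown distribution D_k
        instance.extend(f(z) for f in functions)
    return instance

def ranks(instance):
    """Return pi(I) as 0-based ranks of the elements of the instance."""
    order = sorted(range(len(instance)), key=lambda i: instance[i])
    pi = [0] * len(instance)
    for rank, index in enumerate(order):
        pi[index] = rank
    return pi
```

Here `ranks` is a plain comparison sort and serves only to define the output $\pi(I)$; it is not the self-improving sorter.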

Under such constraints, our result is the following.

Theorem 1.1.

In the operation phase, we can compute $\pi(I)$ in $O(H(\pi(I))+n)$ expected time.

1.1. Technique overview

Learning phase overview.

We learn the hidden partition using a constant number of instances. We also construct the $V$-list in the same way as in [1]. Precisely, take $\lambda=\lceil\log n\rceil$ instances, merge all the $\lambda\cdot n$ elements of these instances into one big list, and sort it in increasing order; denote the result by $y_{1},\ldots,y_{\lambda n}$. Assign $V_{r}=y_{r\cdot\lambda}$ $(1\leq r\leq n)$, $V_{0}=-\infty$, and $V_{n+1}=+\infty$. We call $V_{r}$ the predecessor of $x_{i}$ if $x_{i}\in[V_{r},V_{r+1})$. For the $k$-th group $(1\leq k\leq g)$, the predecessors of the $n_{k}$ elements in this group, together with the order among these elements, are denoted by $\mathsf{po}_{k}$; its entropy is denoted by $H(\mathsf{po}_{k})$. Finally, let $n^{\prime}=\max_{k}n_{k}$; we sample $T=n^{\prime}(n(\mu+1)+n^{\prime}\sigma)\log n$ instances to learn the distribution of $\mathsf{po}_{k}$.
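The $V$-list construction can be sketched in Python as follows. The function names are hypothetical, and the number of training instances passed in plays the role of $\lambda$; the caller is assumed to supply $\lceil\log n\rceil$ of them.

```python
import bisect
import math

def build_v_list(instances):
    """Build [V_0, ..., V_{n+1}] from lam training instances of n elements each."""
    n = len(instances[0])
    lam = len(instances)
    merged = sorted(x for inst in instances for x in inst)  # y_1 .. y_{lam*n}
    # V_r = y_{r*lam} for 1 <= r <= n (1-based), with infinite sentinels.
    return [-math.inf] + [merged[r * lam - 1] for r in range(1, n + 1)] + [math.inf]

def predecessor_index(v, x):
    """Return r such that x lies in [V_r, V_{r+1}), for finite x."""
    return bisect.bisect_right(v, x) - 1
```

For example, with the two instances `[5, 1, 9, 3]` and `[2, 8, 4, 6]`, every second element of the merged sorted list becomes a $V_{r}$, so the finite entries of the list are 2, 4, 6, 9.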

Operation phase.

First, we compute $\mathsf{po}_{k}$ for each $k$ $(1\leq k\leq g)$. Second, for each $k$, let $\sigma_{k}$ denote the list of the $n_{k}$ elements of the $k$-th group in sorted order; we find all $r$ such that $\sigma_{k}\cap[V_{r},V_{r+1})$ is nonempty and put the sublist $\sigma_{k}\cap[V_{r},V_{r+1})$ into $S_{r}$ (so $S_{r}$ is a set of sublists). Third, we use a merge sort to merge all the sublists in $S_{r}$ into one sorted list $s_{r}$. Finally, by concatenating $s_{0},\ldots,s_{n}$, we obtain the sorted list of all elements.
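Steps two to four of the operation phase can be sketched in Python as below. This is a simplified illustration under stated assumptions: the sorted group lists $\sigma_{k}$ are taken as given, and the interval index $r$ of each element is recomputed by binary search, whereas in the note it is already available from $\mathsf{po}_{k}$. The sketch therefore shows the bucketing and merging logic, not the claimed running time.

```python
import bisect
import heapq
import math

def merge_groups(sorted_groups, v):
    """Bucket each sorted group list by V-interval, merge per bucket, concatenate."""
    buckets = [[] for _ in range(len(v) - 1)]  # S_0 .. S_n
    for sigma in sorted_groups:                # each sigma_k is already sorted
        current_r, run = None, []
        for x in sigma:
            r = bisect.bisect_right(v, x) - 1  # x lies in [V_r, V_{r+1})
            if r != current_r:
                if run:
                    buckets[current_r].append(run)
                current_r, run = r, []
            run.append(x)
        if run:
            buckets[current_r].append(run)
    result = []
    for sublists in buckets:                   # s_r = merge of the sublists in S_r
        result.extend(heapq.merge(*sublists))
    return result
```

Because each $\sigma_{k}$ is sorted, its elements falling into one interval form a contiguous run, so every group contributes at most one sublist to each $S_{r}$.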

1.2. Running time analysis of the operation phase.

We need the following three crucial lemmas.

Lemma 1.2.

For each $k$ $(1\leq k\leq g)$, we can compute $\mathsf{po}_{k}$ in $O(H(\mathsf{po}_{k})+n_{k})$ expected time.

Lemma 1.3.

$\sum_{k}H(\mathsf{po}_{k})=H(\pi(I))+O(n)$ .

Lemma 1.4.

With high probability over the construction of the $V$-list, for each $r$ the expected size of $S_{r}$ (i.e., the number of sublists in $S_{r}$) is a constant.

By Lemma 1.2, the first step runs in $O\left(\sum_{k}(H(\mathsf{po}_{k})+n_{k})\right)=O\left(\sum_{k}H(\mathsf{po}_{k})\right)+O(n)$ expected time, which is $O(H(\pi(I))+n)$ by Lemma 1.3. The second and last steps cost $O(n)$ time. The third step takes $O(n)$ expected time by Lemma 1.4. Thus we get Theorem 1.1.

Lemma 1.3 follows from Lemma 2.3 of [1] because we can compute $(\mathsf{po}_{1},\ldots,\mathsf{po}_{g})$ in $O(n)$ comparisons given $\pi(I)$ . Lemma 1.4 is the same as Lemma 6 in [2]. Lemma 1.2 is proved below.

2. Learning phase I – compute the hidden partition in $\mu^{4}$ rounds

Assume we want to determine whether $x_{1}$ and $x_{2}$ are in the same group.

Recall that each function has at most $\mu$ extremal points. We take $m=\mu^{4}$ samples of $(x_{1},x_{2})$. Denote the values by $(x_{1,1},x_{2,1}),\ldots,(x_{1,m},x_{2,m})$. Without loss of generality, assume that $x_{1,1}\leq x_{1,2}\leq\ldots\leq x_{1,m}$ (otherwise we sort the samples by their first coordinate).

Moreover, for any sequence of numbers $(A_{1},\ldots,A_{m})$ of length $m$, we define $D(A_{1},\ldots,A_{m})$ as the minimum number $d$ such that $(A_{1},\ldots,A_{m})$ can be partitioned into $d$ monotonic subsequences. A subsequence is monotonic if it is either increasing or decreasing.

We can prove that

• If $x_{1}$ and $x_{2}$ are in the same group, $D(x_{2,1},\ldots,x_{2,m})\leq 2\mu+1$;

• If $x_{1}$ and $x_{2}$ are in different groups, $D(x_{2,1},\ldots,x_{2,m})=\Omega(\mu^{2})$.

Therefore,

• If $D(x_{2,1},\ldots,x_{2,m})\leq 2\mu+1$, with high probability $(x_{1},x_{2})$ are in the same group.

• If $D(x_{2,1},\ldots,x_{2,m})>2\mu+1$, it is definitely true that $(x_{1},x_{2})$ are in different groups.

As a consequence, we can learn the hidden partition easily by calling function $D$.

Moreover, since $\mu$ is a constant, so is $m$; hence computing $D$ takes only constant time.
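The threshold test of this section can be sketched in Python as follows. The note does not say how $D$ is computed, so the exhaustive search below is an assumption; it is reasonable only because $m$ is a constant. Treating ties as compatible with both directions (weak monotonicity) is also an assumption.

```python
def min_monotone_parts(seq):
    """Smallest d such that seq splits into d weakly monotone subsequences.
    Exhaustive backtracking; intended for short (constant-length) inputs only."""
    def fits(d):
        parts = []  # each part is [last_value, direction]; direction: None, +1, -1

        def place(i):
            if i == len(seq):
                return True
            x = seq[i]
            for part in parts:
                last, direction = part
                if direction is None:
                    new_dir = None if x == last else (1 if x > last else -1)
                elif (direction == 1 and x >= last) or (direction == -1 and x <= last):
                    new_dir = direction
                else:
                    continue  # x would break the monotonicity of this part
                part[0], part[1] = x, new_dir
                if place(i + 1):
                    return True
                part[0], part[1] = last, direction  # undo and try the next part
            if len(parts) < d:
                parts.append([x, None])
                if place(i + 1):
                    return True
                parts.pop()
            return False

        return place(0)

    d = 0
    while not fits(d):
        d += 1
    return d

def same_group_test(pairs, mu):
    """Threshold test from the text on m sampled pairs (x1, x2)."""
    x2_in_x1_order = [x2 for _, x2 in sorted(pairs)]
    return min_monotone_parts(x2_in_x1_order) <= 2 * mu + 1
```

For instance, if $x_{1}=z$ and $x_{2}=z^{2}$ are sampled at $z=-2,\ldots,2$, the $x_{2}$ values in $x_{1}$ order are 4, 1, 0, 1, 4, which split into one decreasing and one increasing subsequence.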

3. Learning phase II – learn the distribution of $\mathsf{po}_{k}$

We need to introduce some notation here.

For convenience, assume that $x_{1},\ldots,x_{n_{k}}$ are in the $k$ -th group.

First, we draw $n_{k}$ curves $y=f_{1}(z),\ldots,y=f_{n_{k}}(z)$. Moreover, for each $r$ $(1\leq r\leq n)$, we draw a horizontal line $y=V_{r}$. Let $\mathcal{A}$ denote the arrangement of these $n+n_{k}$ curves.

For each intersection in $\mathcal{A}$, we draw a vertical line, as shown in Figure 1. According to our assumption on the functions, there are fewer than $W=n_{k}n(\mu+1)+n_{k}^{2}\sigma$ such intersections. The vertical lines divide the plane into at most $W$ slabs. Notice that $\mathsf{po}_{k}$ remains the same when $z_{k}$ is restricted to any fixed slab, although it may coincide for different slabs. Thus there are at most $W$ possible (different) values of $\mathsf{po}_{k}$, denoted by $r_{1},\ldots,r_{W^{*}}$. Moreover, let $p_{i}$ be the probability that $\mathsf{po}_{k}$ is identical to $r_{i}$. Note that $W^{*},p_{i},r_{i}$ are all unknown and we do not build $\mathcal{A}$ explicitly. Recall that the entropy $H(\mathsf{po}_{k})$ is defined as $\sum_{i}p_{i}\log(1/p_{i})$.

In the learning phase, we take $T\geq W\log n$ instances to sample the results of $\mathsf{po}_{k}$ and count their frequencies. For $1\leq i\leq W^{*}$, denote by $\chi_{i}$ the number of times that $r_{i}$ is sampled. Let $q_{i}=\chi_{i}/T$. (Note that $\chi_{i}$ might be zero for some $r_{i}$; such an $r_{i}$ is unknown to us. The other $r_{i}$’s are known.)
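The frequency count can be sketched in a few lines of Python. Representing each sampled result of $\mathsf{po}_{k}$ as a hashable tuple is an assumption made for illustration.

```python
from collections import Counter

def empirical_distribution(samples):
    """Map each observed outcome r_i to q_i = chi_i / T, where T = len(samples)."""
    counts = Counter(samples)  # chi_i for every outcome seen at least once
    total = len(samples)
    return {outcome: count / total for outcome, count in counts.items()}
```

Outcomes that never occur among the $T$ samples are simply absent from the returned dictionary, which corresponds to the case $q_{i}=0$.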

3.1. Store all the sampled results of $\mathsf{po}_{k}$ in a trie

We encode every known result of $\mathsf{po}_{k}$ by a vector $(b_{1},\ldots,b_{n_{k}})$ (similar to the Lehmer code).

Definition 3.1.

Given a known result of $\mathsf{po}_{k}$, $b_{1}$ is defined as the predecessor of $x_{1}$ among $V_{0},\ldots,V_{n}$; $b_{2}$ is defined as the predecessor of $x_{2}$ among $V_{0},\ldots,V_{n},x_{1}$; and so on; finally, $b_{n_{k}}$ is defined as the predecessor of $x_{n_{k}}$ among $V_{0},\ldots,V_{n},x_{1},\ldots,x_{n_{k}-1}$.

Four examples are given in Figure 2 (a). The bottom of the columns shows the vectors.

We store the vectors of all sampled results of $\mathsf{po}_{k}$ into a trie as shown in Figure 2 (b). Moreover, we assign every node in this trie a weight: A leaf labeled by $r_{i}$ has weight $q_{i}$ , and the weight of an internal node equals the total weight of its sons; so the root has weight 1.
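The encoding of Definition 3.1 and the weighted trie can be sketched in Python as follows. The label format `('V', r)` or `('x', i)`, the dictionary-based nodes, and the use of integer counts in place of weights (the weight of a node is its count divided by $T$) are all assumptions for illustration. The element values are assumed to be distinct.

```python
import bisect
import math

def encode(values, v):
    """Vector (b_1, ..., b_m): b_j is the predecessor of x_j among
    V_0..V_n, x_1..x_{j-1}. Labels are ('V', r) or ('x', i) with 0-based i."""
    keys = list(v[:-1])  # V_0 .. V_n in sorted order
    labels = [('V', r) for r in range(len(keys))]
    code = []
    for j, x in enumerate(values):
        pos = bisect.bisect_right(keys, x)  # index of the first key greater than x
        code.append(labels[pos - 1])        # V_0 = -inf guarantees pos >= 1
        keys.insert(pos, x)
        labels.insert(pos, ('x', j))
    return tuple(code)

def build_trie(codes):
    """Trie over sampled codes; node['count'] / len(codes) is the node weight."""
    root = {'count': 0, 'children': {}}
    for code in codes:
        node = root
        node['count'] += 1
        for symbol in code:
            node = node['children'].setdefault(symbol, {'count': 0, 'children': {}})
            node['count'] += 1
    return root
```

With this representation the root count equals $T$, so the root has weight 1, and the count of an internal node is the sum of the counts of its children.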

4. Operation phase Step 1 – compute $\mathsf{po}_{k}$

First, let us consider an ideal case where $q\equiv p$, i.e., $q_{i}=p_{i}$ for every $1\leq i\leq W^{*}$.

Assume we are given the values of $(x_{1},\ldots,x_{n_{k}})$ and we want to determine $\mathsf{po}_{k}$. Equivalently, we want to determine the vector corresponding to $\mathsf{po}_{k}$. Similarly to what Fredman did in [3], using $(x_{1},\ldots,x_{n_{k}})$, we can compute $b_{1},\ldots,b_{n_{k}}$ step by step. When $\mathsf{po}_{k}=r_{i}$, this process corresponds to a path in the trie from the root to the leaf labeled with $r_{i}$.

By a standard technique (see Section 3.2, paragraph 1 of [1]), if we are currently at a node with weight $w_{j}$ and in the next round we proceed to a son with weight $w_{k}$, the time for choosing the son in this step is $O(1+\log(w_{j}/w_{k}))$. Therefore, if $\mathsf{po}_{k}=r_{i}$, it takes $O(n_{k}+\log(1/q_{i}))$ time to reach the leaf labeled with $r_{i}$.
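The total search time follows by telescoping along the path. As a worked step, write $u_{0},u_{1},\ldots,u_{n_{k}}$ for the nodes on the path from the root $u_{0}$ to the leaf $u_{n_{k}}$ labeled with $r_{i}$, and $w(u)$ for the weight of node $u$ (this notation is introduced here only for the derivation). Since $w(u_{0})=1$ and $w(u_{n_{k}})=q_{i}$,

$$\sum_{j=1}^{n_{k}}O\Big(1+\log\frac{w(u_{j-1})}{w(u_{j})}\Big)=O\Big(n_{k}+\log\frac{w(u_{0})}{w(u_{n_{k}})}\Big)=O\big(n_{k}+\log(1/q_{i})\big).$$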

Furthermore, since the probability that “$\mathsf{po}_{k}=r_{i}$” is $p_{i}$, the expected time for computing $\mathsf{po}_{k}$ is $O(\sum_{i}p_{i}(n_{k}+\log(1/q_{i})))=O(n_{k}+\sum_{i}p_{i}\log(1/q_{i}))=O(n_{k}+H(\mathsf{po}_{k}))$ when $q\equiv p$.
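A small numeric illustration of the quantity $\sum_{i}p_{i}\log(1/q_{i})$ is sketched below in Python. The function names are hypothetical, and base-2 logarithms are an assumption that only affects constant factors.

```python
import math

def entropy(p):
    """H(p) = sum_i p_i * log2(1 / p_i), skipping zero-probability outcomes."""
    return sum(pi * math.log2(1.0 / pi) for pi in p if pi > 0)

def expected_search_term(p, q):
    """sum_i p_i * log2(1 / q_i), the logarithmic part of the expected search time.
    Requires q_i > 0 wherever p_i > 0."""
    return sum(pi * math.log2(1.0 / qi) for pi, qi in zip(p, q) if pi > 0)
```

With $p=(0.5,0.25,0.25)$ the entropy is 1.5 bits, and `expected_search_term(p, p)` returns the same value; with the mismatched estimate $q=(0.25,0.5,0.25)$ the term grows to 1.75.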

Next, we show that even if $q\neq p$ , the expected running time is still $O(n_{k}+H(\mathsf{po}_{k}))$ .

4.1. The proof of Lemma 1.2

Denote $\mathbf{q}=(q_{1},\ldots,q_{W^{*}})$. Let $t^{\mathbf{q}}_{i}$ be the time for computing $\mathsf{po}_{k}$ when $\mathsf{po}_{k}=r_{i}$ and our sampling result is some fixed $\mathbf{q}$. As in the case above, for $q_{i}>0$, we compute $\mathsf{po}_{k}$ in time $O(n_{k}+\log(1/q_{i}))$ when $\mathsf{po}_{k}=r_{i}$; for $q_{i}=0$, however, the search in the trie finds no result, so we use a trivial method to compute $\mathsf{po}_{k}$, which costs $O(n_{k}\cdot\log n)$ time. Therefore,

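$$t^{\mathbf{q}}_{i}=\left\{\begin{array}{ll}O(n_{k}+\log(1/q_{i})),&q_{i}>0;\\ O(n_{k}\cdot\log n),&q_{i}=0.\end{array}\right.$$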

Thus the expected running time for computing $\mathsf{po}_{k}$ in operation phase is given by

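$$\begin{aligned}\sum_{\mathbf{q}}\Pr(\mathbf{q})\cdot\sum_{i}p_{i}t^{\mathbf{q}}_{i}&=\sum_{i}p_{i}\sum_{\mathbf{q}}\Pr(\mathbf{q})\,t^{\mathbf{q}}_{i}\\ &=\sum_{i}p_{i}\sum_{\mathbf{q}:q_{i}>0}\Pr(\mathbf{q})O\big(n_{k}+\log(1/q_{i})\big)+\sum_{i}p_{i}\sum_{\mathbf{q}:q_{i}=0}\Pr(\mathbf{q})O\big(n_{k}\log n\big).\end{aligned}$$

$$\hbox{The second term}=O\big(n_{k}\log n\sum_{i}p_{i}(1-p_{i})^{T}\big)\leq O\big(n_{k}\log n\,W^{*}/(T+1)\big)=O(n_{k}).$$

$$\begin{aligned}\hbox{The first term}&=\sum_{i}p_{i}\sum_{\mathbf{q}:q_{i}>0}\Pr(\mathbf{q})O(n_{k})+\sum_{i}p_{i}\sum_{\mathbf{q}:q_{i}>0}\Pr(\mathbf{q})O\big(\log(1/q_{i})\big)\\ &\leq O(n_{k})+\sum_{i}p_{i}\sum_{j=1}^{T}\Pr(q_{i}=\tfrac{j}{T})O\big(\log(\tfrac{T}{j})\big).\end{aligned}$$

Moreover,

$$\begin{gathered}\sum_{i}p_{i}\sum_{j=1}^{T}\Pr(q_{i}=\tfrac{j}{T})O\big(\log(\tfrac{T}{j})\big)\\ =\sum_{i}p_{i}\sum_{1\leq j\leq p_{i}T/2}\Pr(q_{i}=\tfrac{j}{T})O\big(\log(\tfrac{T}{j})\big)+\sum_{i}p_{i}\sum_{p_{i}T/2<j\leq T}\Pr(q_{i}=\tfrac{j}{T})O\big(\log(\tfrac{T}{j})\big)\\ \leq\sum_{i}p_{i}\sum_{1\leq j\leq p_{i}T/2}\Pr(q_{i}=\tfrac{j}{T})O\big(\log T\big)+\sum_{i}p_{i}\sum_{p_{i}T/2<j\leq T}\Pr(q_{i}=\tfrac{j}{T})O\big(\log(\tfrac{2}{p_{i}})\big).\end{gathered}$$

In the last expression,

$$\hbox{The second term}\leq\sum_{i}p_{i}O\big(\log(2/p_{i})\big)=O(1+H(\mathsf{po}_{k})).$$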

To bound the first term, we need to bound $\sum_{1\leq j\leq p_{i}T/2}\Pr(q_{i}=\frac{j}{T})<\Pr(q_{i}\leq p_{i}/2)$, for which we apply the Chernoff bound. Note that the expectation of $q_{i}$ is $p_{i}$, so $\Pr(q_{i}\leq p_{i}/2)\leq e^{-p_{i}T/8}\leq\frac{8}{p_{i}T}$. Hence $\hbox{the first term}\leq\sum_{i}p_{i}\frac{8}{p_{i}T}O(\log T)=O(W^{*}\log T/T)=O(1)$.

To sum up, the expected running time is $O(n_{k}+H(\mathsf{po}_{k}))$, which proves Lemma 1.2.
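Two elementary inequalities are used without proof in this argument; a short derivation of each is sketched here. For the $q_{i}=0$ case, $\Pr(q_{i}=0)=(1-p_{i})^{T}$ because the $T$ samples are independent, and the function $g(p)=p(1-p)^{T}$ on $[0,1]$ satisfies

$$g^{\prime}(p)=(1-p)^{T-1}\big(1-(T+1)p\big)=0\iff p=\frac{1}{T+1},\qquad\hbox{so}\qquad p(1-p)^{T}\leq\frac{1}{T+1}\Big(\frac{T}{T+1}\Big)^{T}\leq\frac{1}{T+1}.$$

Summing over the $W^{*}$ outcomes gives $\sum_{i}p_{i}(1-p_{i})^{T}\leq W^{*}/(T+1)$, and $T\geq W\log n\geq W^{*}\log n$ then yields $n_{k}\log n\cdot W^{*}/(T+1)\leq n_{k}$. For the Chernoff step, $\chi_{i}=q_{i}T$ is a sum of $T$ independent indicator variables with mean $p_{i}T$, so the multiplicative lower-tail bound with $\delta=1/2$, together with $e^{-x}\leq 1/x$ for $x>0$, gives

$$\Pr\big(\chi_{i}\leq(1-\delta)p_{i}T\big)\leq e^{-\delta^{2}p_{i}T/2}=e^{-p_{i}T/8}\leq\frac{8}{p_{i}T}.$$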

Bibliography

[1] N. Ailon, B. Chazelle, K. Clarkson, D. Liu, W. Mulzer, and C. Seshadhri. Self-improving algorithms. SIAM Journal on Computing, 40(2):350–375, 2011. doi:10.1137/090766437.
[2] S. Cheng and L. Yan. Extensions of self-improving sorters. In 29th International Symposium on Algorithms and Computation, ISAAC 2018, December 16-19, 2018, Jiaoxi, Yilan, Taiwan, pages 63:1–63:12, 2018. doi:10.4230/LIPIcs.ISAAC.2018.63.
[3] M.L. Fredman. How good is the information theory bound in sorting? Theoretical Computer Science, 1(4):355–361, 1976. doi:10.1016/0304-3975(76)90078-5.
