A Foundation for Proving Splay is Dynamically Optimal

Caleb C. Levy; Robert E. Tarjan

arXiv:1907.06310 · cs.DS · May 10, 2022

TL;DR

This paper develops a theoretical framework to prove the long-standing conjecture that the splay tree algorithm is dynamically optimal for binary search tree operations, by linking it to approximate monotonicity.

Contribution

It introduces the concept of approximate monotonicity and establishes that Splay is dynamically optimal if and only if it is approximately monotone, laying the groundwork for proving the conjecture.

Findings

1. Lower bounds on optimal cost are approximately monotone.

2. Splay is dynamically optimal if and only if it is approximately monotone.

3. Framework extends to insertion, deletion, and related algorithms.

Equations

$\ell_{J^{+}}(z)=j+\begin{cases}G&c=0\\H&c=1\\B(1+A)+G&\text{otherwise}\end{cases}$

Taxonomy

Topics: Optimization and Search Problems · Algorithms and Data Compression · Machine Learning and Algorithms

Full text

A Foundation for Proving Splay is Dynamically Optimal

(This paper is an adaptation of the first author’s Ph.D. thesis [45]. We presented an earlier version at SODA [46].)

Caleb C. Levy (Sunshine; [email protected])

Robert E. Tarjan (Department of Computer Science, Princeton University; Intertrust Technologies; [email protected])

Abstract

Consider the task of performing a sequence of searches in a binary search tree. After each search, we allow an algorithm to arbitrarily restructure the tree. The cost of executing the task is the sum of the time spent searching and the time spent optimizing the searches with restructuring operations. Sleator and Tarjan introduced this notion in 1985, along with an algorithm and a conjecture. The algorithm, Splay, is an elegant procedure for performing adjustments that move searched items to the top of the tree. The conjecture, called dynamic optimality, is that the cost of splaying is always within a constant factor of the optimal algorithm for performing searches. We lay a foundation for proving the dynamic optimality conjecture. Central to our method is approximate monotonicity. Approximately monotone algorithms are those whose cost does not increase by more than a fixed multiple after removing searches from the sequence. As we shall see, Splay is dynamically optimal if and only if it is approximately monotone. This result extends to a weaker form of approximate monotonicity as well as insertion, deletion, and related algorithms. We prove that a lower bound on optimal execution cost is approximately monotone and outline how to adapt this proof from the lower bound to Splay, and how to overcome the remaining barriers to establishing dynamic optimality.

1 Context

The binary search tree is the canonical pointer-based data structure for maintaining a sorted collection in fast memory. Its most attractive feature is that the number of comparisons required to verify the presence of an item is logarithmic in the size of the tree, provided that the tree is properly arranged. Without exercising care when adding elements, however, a binary search tree can easily become unbalanced, making search cost proportional to the size of the tree in the worst case. Thus binary search trees require some form of maintenance and restructuring for good performance.

Adel’son-Vel’skii and Landis gave the first method that guarantees efficient searches in the presence of updates [1]. They supplement nodes with bits that provide rough information about how balanced each node’s subtrees are. After an insertion or deletion, a restructuring procedure restores invariants on the balance bits. These invariants ensure that all paths in the tree have length at most logarithmic in the tree’s size. There are many variations of this idea. Perhaps the most famous is the red-black tree, owing to its fewer restructuring operations and its catchy name [32]. Restructuring schemes based on balance bits remain an active topic of research [33]. Many more schemes now exist. Randomized search trees, such as treaps [60], zip trees [66] and others [52], trade worst-case performance guarantees for good expected behavior in order to gain simpler rebalancing procedures and fewer pointer changes. Scapegoat trees [4, 28] defer restructuring operations until they can be executed in bulk. B-trees [5] and their derivatives [20], close relatives of binary search trees that use system memory characteristics to determine node arity, are ubiquitous in database applications. There are numerous related data structures. For the most part, they are well understood. However, there is a class of binary search tree algorithms whose behavior remains one of the great open questions in theoretical computer science.

While the above-mentioned data structures guarantee logarithmic search time, they usually cannot perform much better than this. Real-world access patterns often have some latent structure. For example, records may be arranged in partially sorted sub-blocks, and databases often receive frequent requests for a small number of high-traffic elements. In such situations it can be possible to do better than logarithmic time per access by adjusting the tree after searches, instead of solely after adding or removing elements. This leads to colloquially named “self-adjusting” binary search tree algorithms. Allen and Munro were the first to examine such algorithms in depth [3]. They developed a simple procedure with good expected behavior in many cases. By far the most famous self-adjusting binary search tree algorithm is Sleator and Tarjan’s improvement to this procedure, called Splay [61], which has many compelling properties and applications. For our purposes, Sleator and Tarjan’s most important contribution to this topic is not what they proved, but instead what they left unresolved. The dynamic optimality conjecture asserts that Splay is essentially the ideal algorithm for every possible access pattern. This problem’s intrigue arises from several sources.

Dynamic optimality would imply we can use Splay as a stand-in for many more-specialized data structures. Splay simultaneously acts as a balanced search tree and as a spatial and a temporal cache. It also shares properties with entropy-minimizing static trees [61] and data structures for disjoint set union [50]. Dynamically optimal algorithms can emulate multi-finger binary search trees [12] and doubly-ended queues [65].

Advances in our understanding of binary search trees percolate into other areas. Splay and related self-adjusting data structures inspired the creation of pairing heaps [26], smooth heaps [43] and slim heaps [35]. Splay’s properties find utility in encoding schemes [40], routing problems [59] and optimizing for concurrent non-uniform access [2], and the conjecture has analogues for B-trees [9, 24], search-tree-on-tree data structures [7, 8] and external memory settings [6].

Furthermore, the conjecture has become a nexus for the development of new strategies for analyzing data structures. Splay was intimately involved in the adaptation of potential functions from physics to computer science [64]. Concepts common in the analysis of forbidden substructures are frequently applied to Splay and related self-adjusting algorithms, with examples including Davenport-Schinzel sequences [55], forbidden submatrices [54] and pattern-avoiding permutations [13, 30]. Techniques from computational geometry are now common in this area of research [22, 39].

Finally, the conjecture has a distinct intellectual allure. At first blush, its claim seems too good to be true, which makes the idea of proving it all the more attractive. Its statement is elegant and deceptively simple, yet anyone who has attempted to tackle the problem can attest to its subtlety and utter defiance of standard mathematical approaches. Solutions frequently seem tantalizingly close while remaining just out of reach.

This investigation adopts a somewhat different tone from its companions. It can often be easier to induct on stronger hypotheses because they provide more exploitable structure. Accordingly, we have no qualms about presuming that Splay is dynamically optimal and allowing this to guide our intuition. Our objective is to determine how we can prove the conjecture, not if. Section 2 defines our execution model and summarizes related work. Section 3 shows that Splay is dynamically optimal if and only if it is approximately monotone. Section 4 formalizes optimality with additive overhead and demonstrates that if Splay is optimal then it has no such overhead. Section 5 extends both Splay and our execution model to incorporate mutation operations and establishes that if Splay is optimal without these operations then it is optimal when they are permitted. Section 6 generalizes our results to similar algorithms. Section 7 establishes that a non-trivial lower bound on optimal execution cost is approximately monotone. Section 8 outlines a speculative proof that Splay is approximately monotone. The appendices formalize relevant folklore.

2 Preliminaries

A binary tree $T$ comprises a finite set of nodes, with one node designated to be the root. All nodes have a left and a right child pointer, each leading to a different node. Either or both children may be missing; a missing child is denoted by $\mathtt{null}$ . Every node in $T$ , save for the root, has a single parent node of which it is a child. (The root has no parent.) Each pairing of a node with its parent is an edge in $T$ . The size of $T$ is the number of nodes it contains, and is denoted $|T|$ . There is a unique path from $\operatorname{root}(T)$ to every other node $x$ in $T$ , called the access path for $x$ in $T$ . If $x$ is on the access path for $y$ then $x$ is an ancestor of $y$ , and $y$ is a descendant of $x$ . If these two nodes are distinct then $x$ is a strict ancestor of $y$ and $y$ is a strict descendant of $x$ . (Every node is an ancestor and a descendant of itself.) The subgraph comprising all descendants of $x$ is called the subtree rooted at $x$ . Nodes thus have left and right subtrees rooted respectively at their left and right children. (Subtrees are empty for $\mathtt{null}$ children.) The depth of $x$ , denoted $d_{T}(x)$ , is the length, in nodes, of its access path. A rooted hull in $T$ is a connected subgraph of $T$ that includes the root. A rooted hull is itself a binary tree.

In a binary search tree, every node has a unique key, and the tree satisfies the symmetric order condition: every node’s key is greater than those in its left subtree and smaller than those in its right subtree. The binary search tree derives its name from how its structure enables finding keys. To find a requested key, initialize the current node to be the root. While the current node is not $\mathtt{null}$ and does not contain the requested key, replace the current node by its left or right child depending on whether the requested key is smaller or larger than the key in the current node, respectively. The search returns the last current node, which contains the requested key if said key is in the tree and otherwise $\mathtt{null}$ . The left spine of $T$ is the access path to the smallest key in $T$ , and the right spine of $T$ is the access path to the largest key in $T$ . (The spines of the empty tree are empty.) The left and right spines consist entirely of left and right pointers, respectively. A tree is flat if every node is on the left or right spine. To keep our presentation simple, we assume that a key and the node containing it can be used interchangeably in binary comparisons.
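The search procedure described above can be sketched in a few lines of Python. This is only an illustration: the `Node` class and the `search` function are names assumed here, not notation from the paper, and the count of visited nodes is returned purely for convenience.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class Node:
    # Hypothetical node type: a key plus left and right child pointers.
    key: int
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def search(root: Optional[Node], key: int) -> Tuple[Optional[Node], int]:
    """Return the last current node (None if the key is absent) and the
    number of nodes visited along the way."""
    current, visited = root, 0
    while current is not None:
        visited += 1
        if key == current.key:
            break
        # Go left for smaller keys, right for larger keys.
        current = current.left if key < current.key else current.right
    return current, visited


# A small tree in symmetric order: 4 at the root, 2 and 6 below it.
example = Node(4, Node(2, Node(1), Node(3)), Node(6))
found, visited = search(example, 3)
```

For a key that is present, the number of visited nodes equals the depth $d_{T}(x)$ of the node in the sense defined earlier.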

We denote by $|X|$ the length of a finite sequence $X$ . The symbol “ $\oplus$ ” denotes sequence concatenation. (We sometimes write $X_{1}\oplus\cdots\oplus X_{m}$ as $\bigoplus_{i=1}^{m}X_{i}$ .) The postorder of the empty tree is the empty sequence, and the postorder of binary search tree $T$ whose root $r$ has left and right subtrees $L$ and $R$ is $\operatorname{postorder}(L)\oplus\operatorname{postorder}(R)\oplus(r)$ . The index of a given key in $T$ is the number of keys in $T$ that are less than or equal to the given key. The function mapping each key in $T$ to its index is the index map for $T$ , and the inverse of this function is the reverse index. The index map of a finite totally ordered set is defined analogously. Two binary search trees are isomorphic if relabelling the keys in each tree to their respective indices produces trees with the same postorder.
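As a rough illustration of these definitions, the following Python sketch computes postorders and index maps and uses them to test isomorphism. The names `Node`, `postorder`, `index_map` and `isomorphic` are assumptions made for this example.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Node:
    # Hypothetical node type used only for this illustration.
    key: int
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def postorder(t: Optional[Node]) -> List[int]:
    """postorder(L) followed by postorder(R) followed by the root key."""
    if t is None:
        return []
    return postorder(t.left) + postorder(t.right) + [t.key]


def index_map(t: Optional[Node]) -> Dict[int, int]:
    """Map each key to the number of keys in the tree that are <= it."""
    return {key: i + 1 for i, key in enumerate(sorted(postorder(t)))}


def isomorphic(a: Optional[Node], b: Optional[Node]) -> bool:
    """Relabel keys by their indices and compare the resulting postorders."""
    ia, ib = index_map(a), index_map(b)
    return [ia[k] for k in postorder(a)] == [ib[k] for k in postorder(b)]


balanced = Node(20, Node(10), Node(30))
same_shape = Node(5, Node(2), Node(9))
right_path = Node(10, None, Node(20, None, Node(30)))
```

Here `balanced` and `same_shape` relabel to the same postorder, while `right_path` does not.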

To perform a transformation on tree $T$ , first select an arbitrary rooted hull $Q$ in $T$ . Then reshape $Q$ into any other binary search tree $Q^{\prime}$ containing the same set of keys. We refer to $Q^{\prime}$ as a transition tree. To complete the operation, form the after-tree $T^{\prime}$ by substituting $Q^{\prime}$ for $Q$ in $T$ , re-attaching the subtrees of $Q$ to $Q^{\prime}$ in the manner uniquely prescribed by the symmetric order.

An instance of a binary search tree optimization problem comprises a sequence $X=(x_{1},\dots,x_{m})$ of requested keys and an initial tree $T$ containing these keys. An execution $E$ for this instance comprises a sequence of rooted hulls $Q_{1},\dots,Q_{m}$ , a sequence of transition trees $Q^{\prime}_{1},\dots,Q^{\prime}_{m}$ , and a sequence of after-trees $T_{1},\dots,T_{m}$ . For $1\leq i\leq m$ , $Q_{i}$ is a rooted hull in $T_{i-1}$ , $Q^{\prime}_{i}$ is a binary search tree with the same keys as $Q_{i}$ such that $x_{i}=\operatorname{root}(Q^{\prime}_{i})$ , and $T_{i}$ results from substituting $Q^{\prime}_{i}$ for $Q_{i}$ in $T_{i-1}$ , where $T_{0}=T$ . (We refer to $T_{m}$ as the execution’s final tree.) The cost of $E$ is $\sum_{i=1}^{m}|Q^{\prime}_{i}|$ . At least one execution for $X$ starting from $T$ has minimum, or optimum, cost, and we denote this cost by $\operatorname{OPT}(X,T)$ . Figure 1 shows an instance and a corresponding execution.

An execution’s rooted hull and after-tree for a given request are uniquely determined by the previous after-tree and the request’s transition tree. Thus we shall occasionally denote an execution by its sequence of transition trees. Defining the cost of an execution as the sum of the transition tree sizes captures the notion of paying for restructuring: fewer operations are required to substitute a smaller tree. Each rooted hull contains the access path, which accounts for the cost of searching. We describe instances that include insertions and deletions in Section 5.

Unless otherwise implied, we may assume without loss of generality that every node in the initial tree $T$ has a descendant in $T$ whose key is requested in $X$ [15, Theorem 43], in which case every key in $T$ appears in at least one transition tree and $\operatorname{OPT}(X,T)\geq|T|$ . Similarly, $\operatorname{OPT}(X,T)\geq|X|$ since every execution produces at least one transition tree per request. Furthermore, since an optimal algorithm can reshape the entire initial tree on the first request, $\operatorname{OPT}(X,T)\leq\operatorname{OPT}(X,T^{\prime})+|T|$ for any pair of valid binary search trees $T$ and $T^{\prime}$ for request sequence $X$ with the same keys. Therefore, it makes little difference to optimal executions whether the initial tree is left specified or unspecified, and many authors do not distinguish between $\operatorname{OPT}(X,T)$ and $\operatorname{OPT}_{\min}(X)=\min_{T\text{ for }X}\operatorname{OPT}(X,T)$ . However, the initial tree can, potentially, have a significant impact on algorithmic behavior. Thus, we require instances to specify an initial tree. We discuss this further in Section 4.

A binary search tree algorithm $\mathcal{A}$ maps each instance to an execution of the instance. We denote the cost of this execution by $\operatorname{cost}_{\mathcal{A}}(X,T)$ . We say $\mathcal{A}$ is dynamically optimal if there is some constant $c\geq 1$ so that $\operatorname{cost}_{\mathcal{A}}(X,T)\leq c\operatorname{OPT}(X,T)$ for all request sequences $X$ and all corresponding initial trees $T$ . (Other terms include “constant-competitive” and “instance optimal.”)

A rotation at left child $x$ with parent $y$ in $T$ replaces the subtree rooted at $y$ with the tree whose root $x$ has right child $y$ such that the right subtree of $x$ before the rotation becomes the left subtree of $y$ afterward and the left subtree of $x$ and right subtree of $y$ are unchanged. Figure 2 depicts this process. We can also identify this rotation with the edge connecting $x$ to $y$ in $T$ . Rotation at a right child is symmetric, and rotation at the root is undefined. Rotation preserves symmetric order while changing up to three child pointers in the tree. Sleator and Tarjan originally measured execution cost by counting rotations. (See Appendix A.)
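The rotation just described can be sketched as follows. The function name `rotate_at_left_child` and the `Node` type are assumptions for this example; the caller is expected to re-attach the returned node where $y$ used to hang.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Node:
    # Hypothetical node type used only for this illustration.
    key: int
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def rotate_at_left_child(y: Node) -> Node:
    """Rotate at x = y.left and return x, the new root of the subtree."""
    x = y.left
    y.left = x.right   # the right subtree of x becomes the left subtree of y
    x.right = y        # y becomes the right child of x
    return x           # x's left subtree and y's right subtree are unchanged


def inorder(t: Optional[Node]) -> List[int]:
    """Keys in symmetric order, used here to check that order is preserved."""
    return [] if t is None else inorder(t.left) + [t.key] + inorder(t.right)


y = Node(4, Node(2, Node(1), Node(3)), Node(5))
x = rotate_at_left_child(y)
```

After the call, `x` holds key 2 with `y` as its right child, and the symmetric order of the keys is the same as before.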

A splay operation begins with a binary search for a key in the tree. Let $x$ be the node returned by this search. If $x$ is not $\mathtt{null}$ then the algorithm repeatedly applies a splay step until $x$ becomes the root. A splay step has one of three forms. If the parent of $x$ is the root then rotate at $x$ . (This case is always terminal.) Otherwise, if $x$ is a left child and its parent is a right child, or vice-versa, rotate at $x$ twice. Otherwise, rotate at the parent of $x$ , and then rotate at $x$ . Sleator and Tarjan assigned the respective names zig, zig-zag and zig-zig to these three cases [61]. The series of splay steps that bring $x$ to the root are collectively called splaying at $x$ , or simply splaying $x$ . If $X=(x_{1},\dots,x_{m})$ is a sequence of requested keys in $T$ then the cost of splaying $X$ starting from $T$ is $\sum_{i=1}^{m}d_{T_{i-1}}(x_{i})$ , where $T_{0}=T$ and $T_{i}$ is the result of splaying $x_{i}$ in $T_{i-1}$ for $1\leq i\leq m$ . We will primarily be dealing with the Splay algorithm, so $\operatorname{cost}(X,T)$ , without subscript, will always refer to the cost of splaying the keys of $X$ starting from $T$ .
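A minimal sketch of the splay operation and its cost, following the three cases above, might look like this. The parent-pointer representation and the names `Node`, `rotate`, `splay`, `access` and `left_spine` are choices made for this example; the sketch also assumes every requested key is present in the tree.

```python
class Node:
    """Hypothetical tree node with a parent pointer."""
    def __init__(self, key):
        self.key = key
        self.left = self.right = self.parent = None


def rotate(x):
    """Rotate at x: x moves up one level and its parent becomes its child."""
    p, g = x.parent, x.parent.parent
    if p.left is x:
        p.left, x.right = x.right, p
        if p.left is not None:
            p.left.parent = p
    else:
        p.right, x.left = x.left, p
        if p.right is not None:
            p.right.parent = p
    p.parent, x.parent = x, g
    if g is not None:
        if g.left is p:
            g.left = x
        else:
            g.right = x


def splay(x):
    """Apply splay steps until x is the root."""
    while x.parent is not None:
        p, g = x.parent, x.parent.parent
        if g is None:
            rotate(x)                          # zig
        elif (g.left is p) == (p.left is x):
            rotate(p)                          # zig-zig: parent first,
            rotate(x)                          # then x
        else:
            rotate(x)                          # zig-zag: rotate at x twice
            rotate(x)
    return x


def access(root, key):
    """Search for key, splay it, and return (new root, depth before splaying)."""
    node, depth = root, 1
    while node.key != key:
        node = node.left if key < node.key else node.right
        depth += 1
    return splay(node), depth


def left_spine(n):
    """Left spine holding keys 1..n, with n at the root."""
    root = cur = Node(n)
    for key in range(n - 1, 0, -1):
        cur.left = Node(key)
        cur.left.parent = cur
        cur = cur.left
    return root
```

Summing the depths returned by `access` over a request sequence gives $\operatorname{cost}(X,T)$ as defined above.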

While an individual splay path can involve every node in the tree, the mean cost of a splay operation, averaged over sufficiently many requests, is logarithmic in the tree’s size [61, Theorem 1]. This performance is similar to that of balanced binary search trees. What makes Splay remarkable is that it also takes advantage of latent structure in the request sequence. The amortized cost per splay operation is logarithmic in the number of unique keys requested since the previous request for the splayed key [61, Theorem 4]. Thus, Splay exploits temporal locality in the access pattern. Splay simultaneously exploits spatial locality. The amortized cost of a splay operation is logarithmic in the difference between successively requested keys’ indices in the starting tree [18, 19]. The dynamic optimality conjecture states that Splay is dynamically optimal.

Splay has many generalizations. Subramanian defined a class of algorithms that reshape a tree in small steps. A set of rules, called a “template,” determines which step to take based on the arrangement of nodes in the immediate vicinity of the currently selected node. Different templates give rise to different algorithms, and a number of these algorithms have many of the same properties as Splay [62]. Georgakopoulos and McClurkin [29] and later Chalermsook et al. [14] proved further results about related algorithms. Section 6 examines a generalization of template algorithms.

Besides Splay, the main candidate algorithm for optimality is colloquially known as Greedy. Lucas and Munro independently conjectured that a version of the algorithm, which arranges keys on the access path according to their soonest future access times, is dynamically optimal [49, 53]. Demaine et al. subsequently developed a representation of binary search tree executions as Cartesian coordinate point-sets [22]. They showed that in this geometric representation, Greedy executes a new request by uniting its execution of the previous requests with the minimum set of points needed to make the new execution satisfy some required properties. Subsequently, many of the interesting behaviors that first drew attention to Splay have been proved for Greedy, including exploitation of temporal locality [25, 31] and spatial locality [39], as well as some additional properties [13, 30].

Two other algorithms serve primarily demonstrative purposes. “Tango” trees require cost proportional to at most $\lg\lg|T|$ times the optimum cost in order to execute an instance [23], and Iacono describes a multiplicative weights update method that is optimal so long as a certain class of binary search tree algorithm contains an optimal member [37]. Both are difficult to implement.

Wilber derived two lower bounds on the cost of executions for a given instance [67]. One of his lower bounds counts the number of occurrences of certain structural patterns in the request sequence with respect to a given reference tree. We examine Wilber’s other lower bound, which we call the crossing bound, in Section 7. A third lower bound, called the “independent rectangle bound,” is defined geometrically [22]. Research into the relationships among these bounds is ongoing [10, 44].

Currently, there is no sub-exponential time algorithm that is known to compute the cost of an optimum binary search tree execution for an instance to within a constant factor. Circumstantial evidence indicates that exact computation of optimal execution cost may be intractable, since a slight generalization of the problem, in which instances comprise requests for batches of keys, is NP-Complete [22]. The theoretical and practical difficulties we encountered when trying to reason about optimal binary search tree executions ultimately led us to the present approach, which consciously avoids directly comparing algorithms with optimal behavior.

3 Approximate Monotonicity

How can we prove that Splay is dynamically optimal without knowing what optimum executions “look like?” We approach this question by combining two concepts. The first starts with a simple observation: in many situations, one intuitively expects that removing requests from an instance should decrease the cost for the algorithm to execute it. This may not always be the case, but it is a reasonable idea to explore. The second idea is to force an algorithm to simulate executions by feeding it appropriately constructed instances. An algorithm $\mathcal{A}$ is approximately monotone if there is some constant $b\geq 1$ so that $\operatorname{cost}_{\mathcal{A}}(Y,T)\leq b\operatorname{cost}_{\mathcal{A}}(X,T)$ for every request sequence $X$ , subsequence $Y$ , and initial tree $T$ . A simulation embedding $\mathcal{S}$ for $\mathcal{A}$ is a map from executions to request sequences for which there exists $c\geq 1$ such that $\operatorname{cost}_{\mathcal{A}}(\mathcal{S}(E),T)$ is at most $c$ times the cost of $E$ and $X$ is a subsequence of $\mathcal{S}(E)$ for all instances $(X,T)$ and corresponding executions $E$ . If such a map exists then $\mathcal{A}$ is coercible.

We add a few clarifying comments on terminology. A subsequence need not be contiguous. For example, $(1,3,6)$ is a subsequence of $(1,2,3,5,6)$ . Also, every sequence is a subsequence of itself. A real-valued set function $F$ is monotone if $F(A)\leq F(B)$ for all $A\subseteq B$ . Approximate monotonicity relaxes this requirement. (The functions we deal with are sequence-valued, but the concept is identical.) Our SODA paper referred to approximate monotonicity as the “subsequence property” [46]. In this work, we only build simulation embeddings for binary search tree algorithms. However, the concept itself seems more general and likely has other applications.
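The notion of subsequence used here can be checked mechanically. The helper below is a small sketch (the name `is_subsequence` is an assumption) that scans $X$ once and matches the terms of $Y$ in order.

```python
def is_subsequence(y, x):
    """True if y can be obtained from x by deleting zero or more terms."""
    remaining = iter(x)
    # Each membership test consumes `remaining` up to and including the
    # match, so later terms of y must appear later in x.
    return all(term in remaining for term in y)


example = is_subsequence((1, 3, 6), (1, 2, 3, 5, 6))
```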

Theorem 3.1.

Optimal algorithms are monotone.

Proof.

Let $X$ be a sequence of $m$ requests with starting tree $T$ , let $E=(Q^{\prime}_{1},\dots,Q^{\prime}_{m})$ be an optimal execution of this instance with after-trees $T_{1},\dots,T_{m}$ , and form subsequence $Y$ from $X$ by choosing a subset $A$ of $\{1,\dots,m\}$ and keeping the requests in $X$ at times in $A$ . The requests at times $\{1,\dots,m\}\setminus A$ can be partitioned into contiguous blocks of integers. (For example, if $m=11$ and $A=\{3,7,9\}$ then the removed time blocks are $\{1,2\}$ , $\{4,5,6\}$ , $\{8\}$ and $\{10,11\}$ .) Let $\alpha$ be the index map for $A$ . Define the transition tree sequence $F=(P^{\prime}_{1},\dots,P^{\prime}_{|Y|})$ as follows. For $i\in A$ , if $i$ is one greater than the maximal element in a removed time block then set $P^{\prime}_{\alpha(i)}$ to be the rooted hull in $T_{i}$ comprising the union of the keys in $Q^{\prime}_{i}$ with the keys in the transition trees of $E$ for the requests in said block, and otherwise set $P^{\prime}_{\alpha(i)}=Q^{\prime}_{i}$ . The transition tree sequence $F$ is a valid execution for $Y$ starting from $T$ , and $\sum_{i\in A}|P^{\prime}_{\alpha(i)}|\leq\sum_{1\leq i\leq m}|Q^{\prime}_{i}|$ . Since $E$ is an optimal execution for $X$ starting from $T$ , we conclude $\operatorname{OPT}(Y,T)\leq\operatorname{OPT}(X,T)$ . ∎
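The bookkeeping in this proof, namely the removed time blocks and the index map $\alpha$ for the kept times $A$, can be sketched as follows. The function names are assumptions made for this illustration, and the sketch reproduces only the parenthetical example from the proof, not the construction of the transition trees.

```python
def removed_time_blocks(m, kept):
    """Partition {1, ..., m} minus `kept` into maximal contiguous blocks."""
    blocks, current = [], []
    for t in range(1, m + 1):
        if t in kept:
            if current:              # a kept time closes the current block
                blocks.append(current)
                current = []
        else:
            current.append(t)
    if current:                      # trailing block after the last kept time
        blocks.append(current)
    return blocks


def index_map(kept):
    """Map each kept time to its position among the kept times (from 1)."""
    return {t: i + 1 for i, t in enumerate(sorted(kept))}


blocks = removed_time_blocks(11, {3, 7, 9})
alpha = index_map({3, 7, 9})
```

With $m=11$ and $A=\{3,7,9\}$, `blocks` lists the four removed time blocks given in the proof. The trailing block $\{10,11\}$ is followed by no kept time, so it contributes to no transition tree of $F$.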

Theorem 3.2.

A coercible algorithm is dynamically optimal if and only if it is approximately monotone.

Proof.

A simulation embedding can be used to simulate an optimal execution of a given instance just as well as any other execution. The cost for the algorithm to execute the simulation is no more than a fixed multiple of the optimal cost for that instance. The simulation of this optimal execution contains the original request sequence as a subsequence. If the algorithm is also approximately monotone, then the cost of executing the original instance will not exceed a fixed multiple of the simulation’s cost and hence of the optimal cost. In the other direction, if $\mathcal{A}$ is dynamically optimal then there exists some constant $c$ for which $\operatorname{cost}_{\mathcal{A}}(Y,T)\leq c\operatorname{OPT}(Y,T)\leq c\operatorname{OPT}(X,T)$ for all instances $(X,T)$ and subsequences $Y$ of $X$ , where the last inequality follows from Theorem 3.1. ∎
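The forward direction of this argument can be written as a single chain of inequalities. The notation $E^{*}$ for an optimal execution of $(X,T)$ with transition trees $Q^{\prime}_{1},\dots,Q^{\prime}_{m}$ is introduced here only for illustration; $b$ is the approximate monotonicity constant and $c$ is the constant of the simulation embedding $\mathcal{S}$.

```latex
\operatorname{cost}_{\mathcal{A}}(X,T)
  \;\leq\; b\,\operatorname{cost}_{\mathcal{A}}(\mathcal{S}(E^{*}),T)
  \;\leq\; b\,c\sum_{i=1}^{m}|Q^{\prime}_{i}|
  \;=\; b\,c\,\operatorname{OPT}(X,T)
```

The first inequality uses approximate monotonicity, since $X$ is a subsequence of $\mathcal{S}(E^{*})$. The second is the defining property of a simulation embedding, and the final equality holds because $E^{*}$ is optimal.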

Approximate monotonicity is useful even if an algorithm is not dynamically optimal. For $n>0$ , define the subsequence overhead $f(n)$ and optimal overhead $h(n)$ of $\mathcal{A}$ to be the respective suprema of $\operatorname{cost}_{\mathcal{A}}(Y,T)/\operatorname{cost}_{\mathcal{A}}(X,T)$ and $\operatorname{cost}_{\mathcal{A}}(X,T)/\operatorname{OPT}(X,T)$ taken over all instances $(X,T)$ and all subsequences $Y$ of $X$ for which $|T|=n$ .

Theorem 3.3.

For all $n>0$ , a coercible algorithm’s subsequence overhead and optimal overhead are within a constant factor independent of $n$ .

Proof.

By Theorem 3.1, $\operatorname{cost}_{\mathcal{A}}(X,T)\geq\operatorname{OPT}(X,T)\geq\operatorname{OPT}(Y,T)$ for every instance $(X,T)$ and subsequence $Y$ of $X$ , meaning $\operatorname{cost}_{\mathcal{A}}(Y,T)/\operatorname{cost}_{\mathcal{A}}(X,T)\leq\operatorname{cost}_{\mathcal{A}}(Y,T)/\operatorname{OPT}(Y,T)\leq h(|T|)$ . If $\mathcal{A}$ is coercible then there exists a simulation embedding $\mathcal{S}$ and constant $c$ such that $\operatorname{OPT}(X,T)\geq\operatorname{cost}_{\mathcal{A}}(\mathcal{S}(E),T)/c$ for every optimal execution $E$ of request sequence $X$ with starting tree $T$ . Therefore $\operatorname{cost}_{\mathcal{A}}(X,T)/\operatorname{OPT}(X,T)\leq c\operatorname{cost}_{\mathcal{A}}(X,T)/\operatorname{cost}_{\mathcal{A}}(\mathcal{S}(E),T)\leq cf(|T|)$ . Since these inequalities hold for all instances with an initial tree of size $n$ , they hold true for the supremum. Thus $1\leq h(n)/f(n)\leq c$ for all $n$ . ∎

To build simulation embeddings, we employ an algorithm for transforming a binary search tree $T$ into another binary search tree $T^{\prime}$ with the same keys through the application of at most $4|T|$ restricted rotations, which must occur at children or grandchildren of the root. Begin by repeatedly rotating at the root’s left child until all nodes in the left subtree of the root are on the left spine. Then, repeat the following until the root’s left and right subtrees are respectively left and right spines: repeatedly rotate at the left child of the root’s right child so long as said left child is not $\mathtt{null}$ , and then rotate at the root’s right child. Once the tree is flat, continually rotate at either the left or the right child of the root until the tree is the same as that resultant from applying the above flattening procedure to $T^{\prime}$ . Finally, apply the reverse flattening procedure to recover $T^{\prime}$ . Cleary and Taback first derived this algorithm using group theory [17]. Our description is based on Lucas’ presentation [48].
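The flattening phase of this procedure can be sketched as below. This is one reading of the description above, not a reference implementation: the names `Node`, `flatten` and the helper functions are assumptions, and the sketch covers only the flattening of a single tree, not the full transformation of $T$ into $T^{\prime}$ or its rotation bound. Every rotation it performs is at a child or grandchild of the root.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    # Hypothetical node type used only for this illustration.
    key: int
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def rotate_right(y):
    """Rotate at y.left; return the new root of this subtree."""
    x = y.left
    y.left, x.right = x.right, y
    return x


def rotate_left(y):
    """Rotate at y.right; return the new root of this subtree."""
    x = y.right
    y.right, x.left = x.left, y
    return x


def is_left_spine(t):
    while t is not None:
        if t.right is not None:
            return False
        t = t.left
    return True


def is_right_spine(t):
    while t is not None:
        if t.left is not None:
            return False
        t = t.right
    return True


def flatten(root):
    """Return (new root, number of restricted rotations performed)."""
    rotations = 0
    # Rotate at the root's left child until the left subtree is a left spine.
    while not is_left_spine(root.left):
        root = rotate_right(root)
        rotations += 1
    # Unzip the right subtree one node at a time.
    while not is_right_spine(root.right):
        while root.right.left is not None:
            # Rotation at the left child of the root's right child.
            root.right = rotate_right(root.right)
            rotations += 1
        root = rotate_left(root)          # rotation at the root's right child
        rotations += 1
    return root, rotations


def insert(root, key):
    """Plain unbalanced insertion, used only to build an example tree."""
    if root is None:
        return Node(key)
    if key < root.key:
        root.left = insert(root.left, key)
    else:
        root.right = insert(root.right, key)
    return root


def inorder(t):
    return [] if t is None else inorder(t.left) + [t.key] + inorder(t.right)


tree = None
for key in [4, 2, 6, 1, 3, 5, 7]:
    tree = insert(tree, key)
flat, count = flatten(tree)
```

In this sketch each node is the pivot of at most one rotation of each of the three kinds, so flattening uses at most $3|T|$ rotations. That is a property of the sketch and is weaker than the $4|T|$ bound the text states for the full transformation.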

Theorem 3.4.

For every pair of binary search trees $T$ and $T^{\prime}$ of size at least four with the same keys, there exists a request sequence such that Splay’s execution of the requests starting from $T$ has cost linear in $|T|$ and has final tree $T^{\prime}$ .

Proof.

Let $u_{1},\dots,u_{k}$ be the sequence of keys at which Lucas’ restricted rotation algorithm performs rotations in order to transform $T$ into $T^{\prime}$ . Let $T_{0}=T$ and for $1\leq i\leq k$ let $q_{i}$ be the tree of lexicographically smallest postorder among four-node rooted hulls in $T_{i-1}$ that contain $u_{i}$ . (Using minimal postorder is just a convention.) Form $q^{\prime}_{i}$ by rotating at $u_{i}$ in $q_{i}$ and form $T_{i}$ by substituting $q^{\prime}_{i}$ for $q_{i}$ in $T_{i-1}$ . Form the key sequence $U_{i}$ by relabeling the keys in Figure 3 via the reverse index for $q_{i}$ and recording the sequence of keys in marked nodes on the path from $q_{i}$ to $q^{\prime}_{i}$ , excluding the key marked in $q^{\prime}_{i}$ . Splaying $U_{i}$ starting from $q_{i}$ results in final tree $q^{\prime}_{i}$ . The structure of the subtrees hanging from the path does not affect the transition tree of a splay operation. Thus, using splay operations to induce a restricted rotation in a four-node rooted hull in a larger tree $T$ also performs the restricted rotation in $T$ , and the request sequence $V=U_{1}\oplus\cdots\oplus U_{k}$ induces Splay to successively enact the restricted rotations that transform $T$ into $T^{\prime}$ . Each of these rotations corresponds to at most thirteen requests in $V$ . Every access path in Splay’s execution of $V$ starting from $T$ has length at most four. Since $k<4|T|$ , the total cost of this execution is at most $208|T|$ . ∎
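The constant at the end of the proof comes from multiplying the three bounds stated there. Spelled out, using only quantities named in the proof:

```latex
\operatorname{cost}(V,T)
  \;\leq\; \underbrace{k}_{\text{rotations}}
     \cdot \underbrace{13}_{\text{requests per rotation}}
     \cdot \underbrace{4}_{\text{nodes per access path}}
  \;<\; 4|T|\cdot 13\cdot 4
  \;=\; 208|T|
```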

Theorem 3.5.

Splay is dynamically optimal if and only if it is approximately monotone.

Proof.

We prove Splay is coercible. Let $E$ be an execution for $X=(x_{1},\dots,x_{m})$ starting from $T$ comprising rooted hulls $Q_{1},\dots,Q_{m}$ , transition trees $Q^{\prime}_{1},\dots,Q^{\prime}_{m}$ , and after-trees $T_{1},\dots,T_{m}$ . For initial trees of size three or less set $\mathcal{S}(E)=X$ . Otherwise, let $\mathcal{S}(E)=V_{1}\oplus\cdots\oplus V_{m}$ where, for $1\leq i\leq m$ , $V_{i}$ is the request sequence constructed in Theorem 3.4 for inducing Splay to transform $Q_{i}$ into $Q^{\prime}_{i}$ . (If $Q_{i}=Q^{\prime}_{i}$ then $V_{i}$ is the singleton request sequence whose sole term is $x_{i}$ .) Splaying $V_{i}$ starting from $T_{i-1}$ induces a substitution of $Q^{\prime}_{i}$ for $Q_{i}$ in $T_{i-1}$ to form $T_{i}$ , where $T_{0}=T$ . A splay operation always places the requested key as the root of the after-tree. Since $x_{i}$ is the root of $Q^{\prime}_{i}$ it must be the last key in $V_{i}$ . Therefore, $X$ is a subsequence of $\mathcal{S}(E)$ , and by Theorem 3.4 $\operatorname{cost}(\mathcal{S}(E),T)\leq 208(|Q^{\prime}_{1}|+\cdots+|Q^{\prime}_{m}|)$ . Hence, $\mathcal{S}$ is a simulation embedding for Splay and Splay is coercible. Apply Theorem 3.2. ∎

Splay is only approximately monotone. Let $T$ be a left spine with integer keys $1$ to $2^{k}-1$ for $k>3$ , let $Y=\bigoplus_{i=0}^{k-1}(2^{i})$ be the geometric sequence ascending in powers of two, let $Y^{\prime}$ be the reversal of $Y$ and let $X=Y^{\prime}\oplus Y$ . Splaying the first half of $X$ efficiently brings the requested keys close to the root and ensures $\operatorname{cost}(X,T)=2^{k}+4k-5$ . Meanwhile, splaying each request in $Y$ only halves the depth of the next requested key, so $\operatorname{cost}(Y,T)=2^{k+1}-3$ . Hence the limit of Splay’s subsequence overhead, as tree size increases, is at least two.

Our simulation embedding is designed for minimalism. A more careful analysis can reduce the constant factor. Two prior works construct simulation embeddings for binary search tree algorithms. Harmon builds a simulation embedding for the geometric version of Greedy [34, Chapter 2.3.4], while Russo’s simulation embedding for Splay uses rotation-based executions and potential-based analysis [58]. Neither work treats simulation embeddings as mathematical objects in their own right. Subsequent to the publication of our SODA paper [46], Chalermsook and Jiamjitrak constructed simulation embeddings for a class of template-like algorithms using potential-based methods [16]. Reddmann examines several algorithms’ competitive overheads numerically [56].

4 Startup Overhead

In principle, an algorithm may need to execute many requests in order to bring a poorly structured initial tree into a good state before it can behave optimally. Formally, an algorithm $\mathcal{A}$ is eventually optimal if there exists a positive constant $b$ and startup overhead $g$ mapping starting trees to integers such that $\operatorname{cost}_{\mathcal{A}}(X,T)\leq b\operatorname{OPT}(X,T)+g(T)$ for all request sequences $X$ and corresponding initial trees $T$ . Similarly, if $\operatorname{cost}_{\mathcal{A}}(Y,T)\leq b\operatorname{cost}_{\mathcal{A}}(X,T)+g(T)$ for all instances $(X,T)$ and subsequences $Y$ of $X$ then $\mathcal{A}$ is eventually monotone. (Eventual optimality implies eventual monotonicity.)

Some works do not distinguish between eventual and dynamic optimality, but Sleator and Tarjan were more optimistic. They made no allowance for startup overhead in their original statement of the dynamic optimality conjecture [61]. As we shall show, their optimism was well-placed: if Splay is eventually optimal then it is dynamically optimal. Our proof bounds startup overhead by averaging execution cost over many repetitions of a request sequence. A repeater $\mathcal{F}$ for algorithm $\mathcal{A}$ is a mapping from integer-instance pairs to request sequences for which there exist positive constants $a$ and $c$ such that $k\operatorname{cost}_{\mathcal{A}}(X,T)\leq a\operatorname{cost}_{\mathcal{A}}(\mathcal{F}(k,X,T),T)$ and $\operatorname{OPT}(\mathcal{F}(k,X,T),T)\leq ck\operatorname{OPT}(X,T)$ for all $k>0$ , request sequences $X$ , and starting trees $T$ . If a repeater exists, then $\mathcal{A}$ is repeatable.

Theorem 4.1.

Eventually optimal repeatable binary search tree algorithms are dynamically optimal.

Proof.

Repeatability and eventual optimality imply positive constants $a$ , $b$ and $c$ and startup overhead $g$ such that $k\operatorname{cost}_{\mathcal{A}}(X,T)\leq a\operatorname{cost}_{\mathcal{A}}(\mathcal{F}(k,X,T),T)\leq ab\operatorname{OPT}(\mathcal{F}(k,X,T),T)+ag(T)\leq(abc)k\operatorname{OPT}(X,T)+ag(T)$ for all request sequences $X$ , starting trees $T$ and $k>0$ . Choose $k\geq g(T)/\operatorname{OPT}(X,T)$ to absorb the overhead and obtain $\operatorname{cost}_{\mathcal{A}}(X,T)\leq a(bc+1)\operatorname{OPT}(X,T)$ . ∎
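To spell out the final step, divide the chain of inequalities by $k$ and then use the choice $k\geq g(T)/\operatorname{OPT}(X,T)$ , which presumes $\operatorname{OPT}(X,T)>0$ :

$$\operatorname{cost}_{\mathcal{A}}(X,T)\;\leq\;abc\operatorname{OPT}(X,T)+\frac{a\,g(T)}{k}\;\leq\;abc\operatorname{OPT}(X,T)+a\operatorname{OPT}(X,T)\;=\;a(bc+1)\operatorname{OPT}(X,T).$$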

Theorem 4.2.

Eventually monotone repeatable coercible algorithms are dynamically optimal.

Proof.

These properties imply there exists a simulation embedding $\mathcal{S}$ , constant $b$ and startup overhead $g$ for which $\operatorname{cost}_{\mathcal{A}}(X,T)\leq b\operatorname{cost}_{\mathcal{A}}(\mathcal{S}(E),T)+g(T)\leq b\operatorname{OPT}(X,T)+g(T)$ for all instances $(X,T)$ and corresponding optimal executions $E$ . Apply Theorem 4.1. ∎

Theorem 4.3.

If Splay is eventually monotone then it is dynamically optimal.

Proof.

We show Splay is repeatable. Let $X$ be a request sequence with initial tree $T$ . Let $V$ be the final tree in Splay’s execution of $X$ starting from $T$ , and define the extended sequence $U=X\oplus W$ , where $W$ is the sequence described in Theorem 3.4 that induces Splay to transform $V$ into $T$ . (If $|T|<4$ or $V=T$ then $W=\varnothing$ .) Denote by $k*U$ the sequence $U$ repeated $k$ times. Since $U$ merely consists of requests appended to $X$ , $\operatorname{cost}(X,T)\leq\operatorname{cost}(U,T)$ . The final tree in Splay’s execution of $U$ starting from $T$ is again $T$ , so each repetition has identical after-trees, and $\operatorname{cost}(k*U,T)=k\operatorname{cost}(U,T)$ . Thus, $k\operatorname{cost}(X,T)\leq\operatorname{cost}(k*U,T)$ .

It remains to bound the optimal cost. If $X\neq\varnothing$ let $A=(T^{\prime})\oplus B$ where $T^{\prime}$ is the after-tree for the first request in some optimal execution $E$ for $X$ starting from $T$ and $B$ is the sequence of transition trees in $E$ for the remaining requests in $X$ , otherwise let $A=\varnothing$ . Similarly, if $W\neq\varnothing$ let $C=(V^{\prime})\oplus D$ where $V^{\prime}$ is the first after-tree in Splay’s execution $F$ for $W$ starting from $V$ and $D$ is the sequence of transition trees in $F$ for the remaining requests in $W$ , otherwise let $C=\varnothing$ . The sequence of transition trees $G=A\oplus C$ is an execution of $U$ starting from $T$ . The transition trees in $A$ have total size at most $|T|+\operatorname{OPT}(X,T)$ , and by Theorem 3.4 the transition trees in $C$ have total size at most $209|T|$ . Since we can absorb initial tree size into optimal cost, $\operatorname{OPT}(U,T)\leq 211\operatorname{OPT}(X,T)$ . Finally, $k*G$ is an execution for $k*U$ starting from $T$ , meaning $\operatorname{OPT}(k*U,T)\leq 211k\operatorname{OPT}(X,T)$ , and $\mathcal{F}(k,X,T)=k*U$ is a repeater for Splay. Apply Theorems 4.2 and 3.5. ∎
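The constant $211$ can be traced as follows, reading "absorb initial tree size into optimal cost" as the assumption $|T|\leq\operatorname{OPT}(X,T)$ (our interpretation of that phrase):

$$\sum_{H\in G}|H|\;\leq\;\bigl(|T|+\operatorname{OPT}(X,T)\bigr)+209|T|\;=\;\operatorname{OPT}(X,T)+210|T|\;\leq\;211\operatorname{OPT}(X,T).$$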

Our SODA paper established the contrapositive of Theorem 4.3 by repeating hypothetical instances on which Splay is non-optimal in order to contradict any presumed nontrivial startup overhead [46]. Kurt Mehlhorn kindly supplied us with an outline of the above version of the proof after he reviewed our manuscript.

5 Mutation

Monotonicity has no clear analog for algorithms that can handle requests to add and remove keys from the tree. We work around this obstacle by representing these operations using executions of instances that lack such requests. A reduction $\mathcal{R}$ from algorithm $\mathcal{A}$ in execution model $\mathcal{M}$ to algorithm $\mathcal{A}^{\prime}$ in execution model $\mathcal{M}^{\prime}$ is a map from instances in $\mathcal{M}$ to instances in $\mathcal{M}^{\prime}$ for which there exist positive constants $a$ and $c$ such that $\operatorname{cost}_{\mathcal{A}}(\mathcal{I})\leq a\operatorname{cost}_{\mathcal{A}^{\prime}}(\mathcal{R}(\mathcal{I}))$ and $\operatorname{OPT}_{\mathcal{M}^{\prime}}(\mathcal{R}(\mathcal{I}))\leq c\operatorname{OPT}_{\mathcal{M}}(\mathcal{I})$ for all $\mathcal{I}\in\mathcal{M}$ . If such a reduction exists we say $\mathcal{A}$ reduces to $\mathcal{A}^{\prime}$ .

Theorem 5.1.

If $\mathcal{A}$ reduces to $\mathcal{A}^{\prime}$ and $\mathcal{A}^{\prime}$ is dynamically optimal then $\mathcal{A}$ is dynamically optimal.

Proof.

The reduction to an optimal algorithm $\mathcal{A}^{\prime}$ implies the existence of constants $a$ , $b$ and $c$ such that $\operatorname{cost}_{\mathcal{A}}(\mathcal{I})\leq a\operatorname{cost}_{\mathcal{A}^{\prime}}(\mathcal{R}(\mathcal{I}))\leq ab\operatorname{OPT}_{\mathcal{M}^{\prime}}(\mathcal{R}(\mathcal{I}))\leq abc\operatorname{OPT}_{\mathcal{M}}(\mathcal{I})$ for all $\mathcal{I}\in\mathcal{M}$ . ∎

Our reduction employs the following terminology. To augment a binary search tree $T$ with a new key $k$ , first do a search for $k$ in $T$ . When the search reaches a missing node, replace this node with a new node containing the key $k$ . Augmenting an empty tree makes $k$ the root key. (This process is sometimes called “leaf insertion.”) The successor of $k$ in $T$ is the smallest key in $T$ that is greater than $k$ . If no such key is present the successor is $\mathtt{null}$ . The predecessor is defined symmetrically. The predecessor and successor are the neighbors of $k$ in $T$ , and the neighborhood of $k$ is the set comprising $k$ and those of its neighbors that are not missing. If $k$ has no children then its removal from $T$ is the rooted hull comprising every key in $T$ except for $k$ , unless $k$ is the root, in which case its removal forms the empty tree.
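The terminology above can be made concrete with a minimal sketch. The `Node` class and the function names `augment`, `predecessor`, `successor` and `neighborhood` are illustrative assumptions rather than notation from the paper, and a missing neighbor is represented by `None` in place of $\mathtt{null}$ .

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    key: float
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def augment(root, k):
    """Leaf insertion: search for k and fill the missing node the search reaches."""
    if root is None:
        return Node(k)  # augmenting an empty tree makes k the root key
    node = root
    while True:
        if k < node.key:
            if node.left is None:
                node.left = Node(k)
                return root
            node = node.left
        elif k > node.key:
            if node.right is None:
                node.right = Node(k)
                return root
            node = node.right
        else:
            raise ValueError("key already present")


def successor(root, k):
    """Smallest key in the tree greater than k, or None if there is none."""
    best, node = None, root
    while node is not None:
        if node.key > k:
            best, node = node.key, node.left
        else:
            node = node.right
    return best


def predecessor(root, k):
    """Largest key in the tree smaller than k, or None if there is none."""
    best, node = None, root
    while node is not None:
        if node.key < k:
            best, node = node.key, node.right
        else:
            node = node.left
    return best


def neighborhood(root, k):
    """k together with those of its neighbors that are not missing."""
    neighbors = (predecessor(root, k), successor(root, k))
    return {k} | {n for n in neighbors if n is not None}
```

For example, augmenting an empty tree with the keys 4, 2, 6, 1, 3 in that order leaves 4 at the root, and the neighborhood of 6 contains only 4 and 6 because 6 has no successor.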

A mutating instance comprises a request sequence $\chi=((r_{1},x_{1}),\dots,(r_{m},x_{m}))$ and an initial tree $T$ where each requested operation $r_{i}\in\{\mathtt{search},\mathtt{insert},\mathtt{delete}\}$ . An execution $E$ of this instance comprises a sequence of rooted hulls $Q_{1},\dots,Q_{m}$ , transition trees $Q^{\prime}_{1},\dots,Q^{\prime}_{m}$ , after-trees $T_{1},\dots,T_{m}$ and $T_{0}=T$ . If $r_{i}=\mathtt{search}$ then $x_{i}$ must be in $T_{i-1}$ and $Q_{i}$ , $Q^{\prime}_{i}$ and $T_{i}$ obey the same restrictions as instances without mutation. For $1\leq i\leq m$ , if $r_{i}=\mathtt{insert}$ then $x_{i}$ must not be in $T_{i-1}$ and $Q_{i}$ , $Q^{\prime}_{i}$ and $T_{i}$ fulfill a request to search for $x_{i}$ in the augmentation of $T_{i-1}$ with $x_{i}$ . If $r_{i}=\mathtt{delete}$ then $x_{i}$ must be in $T_{i-1}$ , $Q_{i}$ contains the neighborhood of $x_{i}$ in $T_{i-1}$ , $Q^{\prime}_{i}$ contains the neighbors of $x_{i}$ in $T_{i-1}$ as a rooted hull of its left spine so long as at least one neighbor is not missing, and $T_{i}$ results from substituting $Q^{\prime}_{i}$ for $Q_{i}$ in $T_{i-1}$ and then removing $x_{i}$ . The cost of $E$ is $\sum_{i=1}^{m}|Q^{\prime}_{i}|$ . We denote by $\operatorname{OPT}_{\operatorname{mut}}(\chi,T)$ the minimum cost among executions for $\chi$ starting from $T$ .

Requiring that executions incorporate both of a deleted key’s neighbors, when they are present, is essential to our analysis. We are unable to determine if algorithms that are dynamically optimal among executions in our model of deletion remain so after removing this requirement. Having stated this caveat, our nonstandard version of deletion does not change any known upper bound on the optimum cost of executing a mutating instance, that we are aware of, by more than a constant factor. We believe our model is sufficiently realistic to proceed without further concern.

Our extension of Splay inserts by augmenting $T$ with $x$ followed by splaying $x$ and deletes $x$ from $T$ by successively splaying the keys in the neighborhood of $x$ in $T$ in increasing order, rotating at the predecessor of $x$ if it is present, and then removing $x$ . Splay’s transition tree for the deletion is the rooted hull comprising the union of keys on the access paths of these operations in the tree immediately prior to the removal of $x$ .

Theorem 5.2.

If Splay is eventually monotone for instances without mutation then it is dynamically optimal for instances with mutation.

Proof.

We reduce Splay with mutation to Splay without mutation. Let $\chi=((r_{1},x_{1}),\dots,(r_{m},x_{m}))$ and $T$ be the request sequence and starting tree of a mutating instance. Construct a new instance without mutation, as follows. Let $K_{0}$ be the set of keys in $T$ and form $S_{0}$ by relabelling the keys in $T$ to their respective indices. All nodes in $S_{0}$ are unmarked. (A node’s marking status merely aids in our construction and has no effect on algorithmic behavior.) For $1\leq i\leq m$ , let $u_{i}$ be the key whose index among the set of keys held by unmarked nodes in $S_{i-1}$ is the same as the index of the predecessor of $x_{i}$ in $K_{i-1}$ if a predecessor is present, otherwise set $u_{i}=-\infty$ . Define $w_{i}$ analogously for the successor of $x_{i}$ if a successor is present, otherwise set $w_{i}=\infty$ .

If $r_{i}=\mathtt{search}$ then let $z_{i}$ be the key whose index among those held by unmarked nodes in $S_{i-1}$ is the same as the index of $x_{i}$ in $K_{i-1}$ , let $S_{i}=S_{i-1}$ and set $Y_{i}=(z_{i})$ and $K_{i}=K_{i-1}$ .

If $r_{i}=\mathtt{insert}$ then do the following. If $S_{i-1}$ has marked nodes with keys strictly between $u_{i}$ and $w_{i}$ in symmetric order then set $z_{i}$ to the most recently marked among them and form $S_{i}$ by unmarking $z_{i}$ in $S_{i-1}$ . Otherwise, form $S_{i}$ by augmenting $S_{i-1}$ with a new unmarked node whose key $z_{i}$ lies strictly between $u_{i}$ and $w_{i}$ , chosen as follows. If neither $u_{i}$ nor $w_{i}$ is finite then $z_{i}=0$ ; if only $u_{i}$ is finite then $z_{i}=u_{i}+1$ ; if only $w_{i}$ is finite then $z_{i}=w_{i}-1$ ; otherwise, $z_{i}$ is the midpoint between $u_{i}$ and $w_{i}$ on the real line. Set $Y_{i}=(z_{i})$ and $K_{i}=\{x_{i}\}\cup K_{i-1}$ .

If $r_{i}=\mathtt{delete}$ then define $z_{i}$ as in the case for search. If at least one of $u_{i}$ and $w_{i}$ is non-finite then form $S_{i}$ by marking $z_{i}$ in $S_{i-1}$ . If neither $u_{i}$ nor $w_{i}$ is finite set $Y_{i}=(z_{i})$ ; if only $u_{i}$ is finite set $Y_{i}=(u_{i},z_{i},u_{i})$ ; otherwise, if only $w_{i}$ is finite set $Y_{i}=(z_{i},w_{i})$ . When both $u_{i}$ and $w_{i}$ are finite, proceed as follows. Define $v_{i}$ to be the key in the most recently marked among the marked nodes of $S_{i-1}$ with keys strictly between $z_{i}$ and $w_{i}$ if such a node is present, otherwise $v_{i}$ is the midpoint between $z_{i}$ and $w_{i}$ on the real line. If $v_{i}\in S_{i-1}$ then $S_{i}$ is as in the case when at least one of $u_{i}$ and $w_{i}$ is non-finite, otherwise form $S_{i}$ by augmenting $S_{i-1}$ with an unmarked node holding key $v_{i}$ and successively marking $v_{i}$ and then $z_{i}$ . Set $Y_{i}=(u_{i},z_{i},w_{i},v_{i},w_{i},u_{i},w_{i})$ . Splaying the first five of these requests ensures the keys $\{u_{i},z_{i},v_{i},w_{i}\}$ comprise a rooted hull of the left spine, and splaying the final two requests induces successive rotations at $z_{i}$ and $u_{i}$ . Finally, set $K_{i}=K_{i-1}\setminus\{x_{i}\}$ .

Define $\mathcal{R}(\chi,T)=(Y_{1}\oplus\cdots\oplus Y_{m},S_{m})$ . Figure 4 depicts an example of this process. Let $P^{\prime}_{1},\dots,P^{\prime}_{m}$ and $L_{1},\dots,L_{m}$ be the transition trees and after-trees in Splay’s execution of $\chi$ starting from $T$ . Set $B_{0}=S_{m}$ and for $1\leq i\leq m$ let $A_{i}$ be the union of keys in the transition trees of Splay’s execution of $Y_{i}$ starting from $B_{i-1}$ and let $B_{i}$ be the final tree of this execution. If $S_{i}$ has unmarked nodes then their keys comprise a rooted hull in $B_{i}$ which is isomorphic to $L_{i}$ , and $P^{\prime}_{i}$ is isomorphic to a subgraph of the rooted hull in $B_{i}$ comprising the keys in $A_{i}$ . Thus, $\operatorname{cost}(\chi,T)\leq\operatorname{cost}\mathcal{R}(\chi,T)$ .

It remains to bound the cost of an optimal execution for the new instance. Let $Q_{1},\dots,Q_{m}$ be the rooted hulls and $Q^{\prime}_{1},\dots,Q^{\prime}_{m}$ be the transition trees for an optimal execution of $\chi$ starting from $T$ , and let $G_{0}=S_{m}$ . For $1\leq i\leq m$ , if $r_{i}\neq\mathtt{delete}$ or $\infty\in\{-u_{i},w_{i}\}$ then let $C_{i}$ be the rooted hull of $G_{i-1}$ that is isomorphic to $Q_{i}$ and let $C^{\prime}_{i}$ be the transition tree with the same keys as $C_{i}$ that is isomorphic to $Q^{\prime}_{i}$ . If $r_{i}=\mathtt{delete}$ and $\infty\notin\{-u_{i},w_{i}\}$ then form $C_{i}$ and $C^{\prime}_{i}$ by respectively augmenting said isomorphic rooted hull and corresponding transition tree with $v_{i}$ . The first in the transition tree sequence $D^{\prime}_{i}$ for the requests in $Y_{i}$ is the tree that results from splaying the first requested key of $Y_{i}$ in $C^{\prime}_{i}$ . The remaining transition trees in $D^{\prime}_{i}$ are the transition trees of Splay’s execution of the remaining requests in $Y_{i}$ starting from the first tree in $D^{\prime}_{i}$ . Finally, let $G_{i}$ be the final after-tree in the execution of $Y_{i}$ starting from $G_{i-1}$ whose transition trees are $D^{\prime}_{i}$ . The first tree in $D^{\prime}_{i}$ has size at most $|Q^{\prime}_{i}|+1$ , and $Y_{i}$ has at most six remaining requests, each served by a transition tree from $D^{\prime}_{i}$ with size at most four, meaning $\sum_{H\in D^{\prime}_{i}}|H|\leq 26|Q^{\prime}_{i}|$ . The transition tree sequence $\bigoplus^{m}_{i=1}D^{\prime}_{i}$ executes $\mathcal{R}(\chi,T)$ . Thus, $\operatorname{OPT}\mathcal{R}(\chi,T)\leq 26\operatorname{OPT}_{\operatorname{mut}}(\chi,T)$ , and $\mathcal{R}$ is a reduction. Apply Theorems 5.1 and 4.3. ∎
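The factor $26$ comes from the following arithmetic, assuming every transition tree satisfies $|Q^{\prime}_{i}|\geq 1$ :

$$\sum_{H\in D^{\prime}_{i}}|H|\;\leq\;\bigl(|Q^{\prime}_{i}|+1\bigr)+6\cdot 4\;=\;|Q^{\prime}_{i}|+25\;\leq\;26|Q^{\prime}_{i}|.$$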

Theorem 5.2 has an interesting consequence. A deque instance is an initial tree together with a request sequence entirely comprising $\mathtt{push}$ , $\mathtt{pop}$ , $\mathtt{inject}$ and $\mathtt{eject}$ operations which respectively correspond to inserting a new maximum, deleting the maximum, inserting a new minimum and deleting the minimum. A binary search tree algorithm $\mathcal{A}$ supports deque operations if there is some $c>0$ so that $\operatorname{cost}_{\mathcal{A}}(D,T)\leq c(|D|+|T|)$ for all deque instances $(D,T)$ . The deque conjecture states that Splay supports deque operations. Tarjan proved that Splay supports a limited subset of deque operations [65]. Sundar placed an inverse-Ackermann upper bound on the cost of performing general deque operations [63]. Pettie later tightened this bound [55]. Similar bounds are known for Greedy [11]. We add a new result:

Theorem 5.3.

If Splay is eventually monotone without mutation then it supports deque operations.

Proof.

Let $D=(r_{1},\dots,r_{m})$ and $T$ be the request sequence and initial tree for a deque instance, and assume without loss of generality that the starting tree’s keys are integers. We construct after-trees $T_{1},\dots,T_{m}$ for an execution of this instance, as follows. If $r_{i}\in\{\mathtt{inject},\mathtt{eject}\}$ then the root of $T_{i}$ is its minimum, assuming $T_{i}\neq\varnothing$ . Otherwise, if $r_{i}\in\{\mathtt{push},\mathtt{pop}\}$ then the root of $T_{i}$ is its maximum, and if $r_{i}=\mathtt{pop}$ and $\min T_{i}\neq\max T_{i}$ then the root’s left child is the minimum key in the tree. The remaining keys $K_{i}$ in $T_{i}$ are in a subtree at the appropriate location in symmetric order with the following structure. The root $v_{i}$ of $K_{i}$ is the largest key less than or equal to the median key in $K_{i}$ , assuming $K_{i}\neq\varnothing$ . The left and right subtrees of $v_{i}$ are respectively right and left spines comprising the keys in $K_{i}\setminus\{v_{i}\}$ that are less than and greater than $v_{i}$ . The initial transition tree of this execution has at most $|T|+1$ nodes and the remainder of the execution can be realized using transition trees each of size at most eight. Thus, $\operatorname{OPT}_{\operatorname{mut}}(D,T)\leq 8(|D|+|T|)$ . Apply Theorem 5.2. ∎

There are other ways to implement insertion and deletion. Sleator and Tarjan analyze an extension of Splay which supports mutation operations by using splits and joins [61]. Tarjan implements $\mathtt{push}$ and $\mathtt{inject}$ by inserting at the top of the tree [65]. Zip trees insert and delete keys starting from the middle of the tree and rearrange descendants to restore the binary search tree invariants [66]. Common implementations of deletion in computers replace the deleted node with the node holding the predecessor or successor of the removed key, a technique originally devised by Hibbard [36]. Our version of deletion originates from Cole’s analysis of the “dynamic finger” theorem [18]. Its main advantage is enabling our proof of Theorem 5.2. We leave as open problems determining whether Tarjan’s implementation of deque operations or algorithms that use Hibbard’s variant of deletion are reducible to algorithms in our model of mutation.

6 Natural Algorithms

Our results readily generalize. We say an algorithm is natural if the rooted hulls of its executions are always the access paths for the requested keys and the transition trees for isomorphic rooted hulls are isomorphic. Splay is a natural algorithm. We incorporate mutation in the same way as for Splay. A natural algorithm inserts by searching in the augmented tree and deletes by successively searching in increasing order for all in the neighborhood of the deleted key, then rotating at the predecessor if present, followed by removing the key. To construct the transition digraph $\mathcal{G}_{n}(\mathcal{A})$ for natural algorithm $\mathcal{A}$ , assign a vertex to every binary search tree with keys $\{1,\dots,n\}$ , and for every $T\in\mathcal{G}_{n}(\mathcal{A})$ and $x\in T$ add an arc from $T$ to the result of executing a search for $x$ in $T$ with $\mathcal{A}$ .
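The transition digraph is small enough to build explicitly for small $n$ . The following is a minimal sketch, not part of the paper's development: trees are nested tuples `(left, key, right)`, all function names are hypothetical, and Move-to-Root (repeatedly rotating at the requested key until it is the root) is used purely as a stand-in for the search procedure of a natural algorithm.

```python
def all_bsts(lo, hi):
    """All binary search trees on keys lo..hi as tuples (left, key, right); None is empty."""
    if lo > hi:
        return [None]
    trees = []
    for k in range(lo, hi + 1):
        for left in all_bsts(lo, k - 1):
            for right in all_bsts(k + 1, hi):
                trees.append((left, k, right))
    return trees


def move_to_root(t, x):
    """Stand-in search procedure: rotate x (assumed present) up to the root."""
    left, k, right = t
    if x == k:
        return t
    if x < k:
        ll, lk, lr = move_to_root(left, x)
        return (ll, lk, (lr, k, right))   # rotate at x, which is the left child
    rl, rk, rr = move_to_root(right, x)
    return ((left, k, rl), rk, rr)        # rotate at x, which is the right child


def transition_digraph(n, search):
    """Map each tree on keys 1..n to the set of trees reachable by one search."""
    return {t: {search(t, x) for x in range(1, n + 1)} for t in all_bsts(1, n)}


def reachable(adj, start):
    seen, stack = {start}, [start]
    while stack:
        v = stack.pop()
        for w in adj[v]:
            if w not in seen:
                seen.add(w)
                stack.append(w)
    return seen


def strongly_connected(graph):
    """True if every vertex reaches, and is reached from, an arbitrary start vertex."""
    start = next(iter(graph))
    reverse = {v: set() for v in graph}
    for v, targets in graph.items():
        for w in targets:
            reverse[w].add(v)
    return len(reachable(graph, start)) == len(graph) == len(reachable(reverse, start))
```

Calling `strongly_connected(transition_digraph(3, move_to_root))` checks the connectivity condition for the stand-in algorithm on three keys; substituting another search function with the same signature tests a different natural algorithm.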

Theorem 6.1.

A natural algorithm whose transition digraph is strongly connected for binary search trees of some size at least three is dynamically optimal for instances with mutation if and only if it is eventually monotone for instances without mutation.

Proof.

Let $\mathcal{A}$ be a natural algorithm and let $N$ be the smallest integer greater than two for which $\mathcal{G}_{N}(\mathcal{A})$ is strongly connected. Choose a map $P$ from each pair $q,q^{\prime}\in\mathcal{G}_{N}(\mathcal{A})$ to some request sequence whose execution by $\mathcal{A}$ starting from $q$ has a sequence of after-trees which, when prefixed by $q$ , comprises a directed path of minimal length connecting $q$ to $q^{\prime}$ in $\mathcal{G}_{N}(\mathcal{A})$ . Let $T^{\prime}$ be the after-tree of transforming $T$ with rooted hull $Q$ and transition tree $Q^{\prime}$ . If $Q=Q^{\prime}$ or $|T|<N$ then define $C_{T}(Q,Q^{\prime})=\operatorname{root}(Q^{\prime})$ . Otherwise, choose rooted hulls $q_{1},\dots,q_{k}$ and transition trees $q^{\prime}_{1},\dots,q^{\prime}_{k}$ for enacting restricted rotations in an identical manner to Theorem 3.4, except that each transition tree has size $N$ , rather than size four, and set $C_{T}(Q,Q^{\prime})=\bigoplus_{i=1}^{k}P(q_{i},q^{\prime}_{i})$ . The output of $P$ contains at most one request per vertex in $\mathcal{G}_{N}(\mathcal{A})$ . There are $(2N)!/(N!(N+1)!)$ binary trees with keys $\{1,\dots,N\}$ [57]. (This is the Catalan number for $N$ .) Each access path contains at most every node in the tree. Thus, the cost for $\mathcal{A}$ to execute $C_{T}(Q,Q^{\prime})$ starting from $T$ is at most $4(2N)!/((N+1)!(N-1)!)|Q^{\prime}|$ , which is linear in the transition tree’s size. This execution’s final tree is $T^{\prime}$ whenever $|T|\geq N$ .

The map $\mathcal{S}(E)=\bigoplus_{i=1}^{m}C_{T}(Q_{i},Q^{\prime}_{i})$ , where $Q_{1},\dots,Q_{m}$ and $Q^{\prime}_{1},\dots,Q^{\prime}_{m}$ are the rooted hulls and transition trees of some execution $E$ for $X$ starting from $T$ , is a simulation embedding for $\mathcal{A}$ . Similarly, $\mathcal{F}(k,X,T)=k*(X\oplus C_{T}(T,V))$ , where $V$ is the final tree in the execution of $X$ starting from $T$ by $\mathcal{A}$ , is a repeater for $\mathcal{A}$ . Thus, by Theorem 4.2, if $\mathcal{A}$ is eventually monotone then it is dynamically optimal for instances without mutation. To reduce how $\mathcal{A}$ executes mutation requests to its behavior when executing non-mutating instances, modify how the reduction in Theorem 5.2 handles requests to delete keys with both predecessors and successors. Instead of requesting a single auxiliary key, as in the case for Splay, add requests for $N-3$ auxiliary keys, augmenting the initial tree with unmarked nodes as necessary. Then, replace the requests that induce Splay to perform the relevant rotations along the left spine with the output of $P$ . Necessity of eventual monotonicity follows from Theorem 3.1. ∎

A strongly connected transition digraph is not necessary for a natural algorithm to be coercible. An example of this is a variant of Splay that carries out restructuring operations in tandem with the binary search for the requested key, eliminating the need for parent pointers, call stacks or threaded nodes [61]. A top-down-splay operation for $x$ in $T$ begins by initializing a pointer $l$ to a childless node whose key is $-\infty$ , a pointer $r$ to a childless node with key $\infty$ , and a pointer $t$ to the root of $T$ . It uses left and right linking steps. Linking right replaces the left subtree of $r$ with $t$ , redirects $r$ to point to the target of $t$ , and redirects $t$ to point to its target’s left child. Linking left is symmetric. The operation repeats the following process until $t=x$ . Suppose without loss of generality that $x$ is in the subtree rooted at the left child $y$ of $t$ . (The other case is symmetric.) If $y=x$ then execute a right link. (This case is terminal.) Otherwise, if $x<y$ then rotate at $y$ , redirect $t$ to point to $y$ , and execute a right linking operation. Otherwise, execute a right link followed by a left link operation. Once $t=x$ , the operation completes by replacing the right subtree of $l$ with the left subtree of $t$ followed by replacing the left subtree of $t$ with the right subtree of $-\infty$ , and doing symmetrically with $r$ , the left subtree of $\infty$ and the right subtree of $t$ . Mäkinen compares Splay to its top-down variant in detail [51].
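The top-down procedure can be sketched in a few lines. This is an illustrative sketch under stated assumptions, not the authors' code: the requested key is assumed to be present, a single `header` node stands in for the two sentinel nodes with keys $-\infty$ and $\infty$ , and the helper names `hang_on_r` and `hang_on_l` are hypothetical labels for the two linking steps.

```python
class Node:
    def __init__(self, key, left=None, right=None):
        self.key, self.left, self.right = key, left, right


def inorder(t):
    return [] if t is None else inorder(t.left) + [t.key] + inorder(t.right)


def top_down_splay(root, x):
    """Bring x (assumed present) to the root while descending from the root."""
    # header.right collects the tree of smaller keys, header.left the tree of larger keys.
    header = Node(None)
    l = r = header
    t = root

    def hang_on_r():
        # t becomes the left subtree of r; r moves to t; t descends to its left child.
        nonlocal r, t
        r.left = t
        r = t
        t = t.left

    def hang_on_l():
        # Mirror image: t becomes the right subtree of l; t descends to its right child.
        nonlocal l, t
        l.right = t
        l = t
        t = t.right

    while t.key != x:
        if x < t.key:
            y = t.left
            if y.key == x:                      # terminal single step
                hang_on_r()
            elif x < y.key:                     # same direction twice: rotate at y first
                t.left, y.right = y.right, t
                t = y
                hang_on_r()
            else:                               # directions alternate: link both ways
                hang_on_r()
                hang_on_l()
        else:
            y = t.right
            if y.key == x:
                hang_on_l()
            elif x > y.key:
                t.right, y.left = y.left, t
                t = y
                hang_on_l()
            else:
                hang_on_l()
                hang_on_r()

    # Reassemble: the subtrees of t close off the two side trees, which become t's children.
    l.right = t.left
    r.left = t.right
    t.left = header.right
    t.right = header.left
    return t
```

Whatever the shape of the input, the requested key ends at the root and symmetric order is preserved, which `inorder` makes easy to confirm on small trees.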

Theorem 6.2.

Top-Down Splay’s transition digraph is not strongly connected for binary search trees of any size greater than two.

Proof.

Let $S$ be a binary search tree of size at least three whose root $a=\min S$ has right child $z=\max S$ , let $Q$ be the left subtree of $z$ in $S$ and let $T=\operatorname{top-down-splay}(S,x)$ for some $x\in Q$ . We show no request sequence induces Top-Down Splay to restore $T$ to $S$ . Suppose, for the sake of contradiction, that such a sequence exists. Because $\operatorname{root}(S)=a$ , the last key requested in any such sequence must be $a$ . Since $\operatorname{top-down-splay}(T,a)\neq S$ , there must be at least one preceding request for a different key. Let $y\neq a$ be the penultimate key in this sequence and let $R$ be the after-tree corresponding to this request, so that $\operatorname{root}(R)=y$ . We demonstrate $\operatorname{top-down-splay}(R,a)\neq S$ .

Suppose first that $y\in Q$ . Because $a<y<z$ , the left and right subtrees of $y$ in $R$ respectively contain $a$ and $z$ . Thus, $z$ is not on the access path for $a$ in $R$ . Because $y$ is the largest key on the access path to $a$ in $R$ and Top-Down Splay is a natural algorithm, the access path to $y$ in $\operatorname{top-down-splay}(R,a)$ is a right spine rooted at $a$ , and $y$ is an ancestor of $z$ in this after-tree. This is incompatible with $z$ being the right child of $a$ , as is the case in $S$ . Thus, $y=z$ . Since $a$ is the smallest key, it is on the left spine. Direct computation shows $\operatorname{top-down-splay}(R,a)\neq S$ when $a$ has depth three or four, and a simple induction establishes the same for a left spine of any greater length. Hence, there is no path from $T$ to $S$ in Top-Down Splay’s transition digraph, and this transition digraph is not strongly connected. ∎

Theorem 6.3.

If Top-Down Splay is approximately monotone then it is dynamically optimal.

Proof.

Let $T$ be a nonempty binary search tree, let $a=\min T$ , let $z=\max T$ , and let $b$ be the successor of $a$ in $T$ if the successor is present, otherwise $b=a$ . Define the map $H$ that transforms any binary search tree $S$ with the same keys as $T$ in the following way. Form $U$ by replacing the left subtree of the parent of $a$ in $S$ with the right subtree of $a$ in $S$ if the parent is present, otherwise set $U=S$ . Form $V$ from $U$ by doing similar with $b$ , and form $W$ from $V$ by replacing the right subtree of the parent of $z$ with the left subtree of $z$ in $V$ , otherwise $W=V$ . The tree $H(S)$ is the binary search tree whose left spine comprises $\{a,b,z\}$ such that $W$ is the right subtree of $b$ in $H(S)$ .

Let $E$ be an execution for $X=(x_{1},\dots,x_{m})$ starting from $T$ with rooted hulls $Q_{1},\dots,Q_{m}$ , transition trees $Q^{\prime}_{1},\dots,Q^{\prime}_{m}$ and after-trees $T_{1},\dots,T_{m}$ and let $T_{0}=T$ . Let $A=(z,b,a,z)$ , let $\hat{T}_{-1}$ be the final tree in Top-Down Splay’s execution of $A$ starting from $T$ , let $Q_{0}=\hat{T}_{-1}$ , let $Q^{\prime}_{0}=T$ and let $x_{0}=z$ . For $0\leq i\leq m$ let $\hat{T}_{i}=H(T_{i})$ , let $\hat{Q}_{i}$ and $\hat{Q}^{\prime}_{i}$ respectively be the smallest rooted hulls in $\hat{T}_{i-1}$ and $\hat{T}_{i}$ containing the keys in $Q_{i}\cup\{a\}$ , and let $Y_{i}$ be the sequence of keys determined by the restricted rotation algorithm for transforming the right subtree of $b$ in $\hat{Q}_{i}$ into its right subtree in $\hat{Q}^{\prime}_{i}$ . Form $Z_{i}$ by replacing each key in $Y_{i}$ with the corresponding keys determined by Figure 5, and then appending the requests $(x_{i},a,z)$ . Top-Down Splay’s execution of $A$ has cost proportional to at most $|T|$ , which can be absorbed into the cost of $E$ . The sequence $Z_{i}$ induces Top-Down Splay to transform $\hat{Q}_{i}$ into $\hat{Q}^{\prime}_{i}$ . Since $|\hat{Q}^{\prime}_{i}|\leq|Q^{\prime}_{i}|+3$ , the transformation’s cost is proportional to $|Q^{\prime}_{i}|$ . Thus, the map $\mathcal{S}(E)=A\oplus\bigoplus_{i=0}^{m}Z_{i}$ is a simulation embedding for Top-Down Splay. Apply Theorem 3.2. ∎

While the rooted hulls of Greedy’s executions are access paths, its transition trees are determined by which keys are in surrounding requests, meaning Greedy is not a natural algorithm. Lucas conjectured that restricting executions’ rooted hulls to the access path does not increase optimal cost by more than a fixed multiple [49]. This conjecture remains open. Kozma catalogues related open questions about the relative power of classes of binary search tree algorithms subject to various restrictions [42]. Splay is in the most restrictive of these classes, meaning dynamic optimality would imply they are all equivalent up to constant factors.

7 Crossing Cost

Binary search trees facilitate efficient search by arranging keys into many short access paths comprising children of alternating direction, and an algorithm’s efficiency depends critically on how it utilizes these arrangements. Consider the subtree $P$ of a binary search tree $T$ comprising the access path for a node $x$ in $T$ . The crossing nodes for $x$ in $T$ comprise $x$ , the root of $T$ , and the nodes in $P$ that are either left children with a right child on $P$ or right children with a left child on $P$ . We refer to the number of crossing nodes for $x$ as its crossing depth in $T$ , denoted $\ell_{T}(x)$ . The bookkeeping nodes are the non-crossing nodes on $P$ . The crossing cost of execution $E$ with after-trees $T_{1},\dots,T_{m}$ for $X=(x_{1},\dots,x_{m})$ starting from $T$ is $\sum_{i=1}^{m}\ell_{T_{i-1}}(x_{i})$ where $T_{0}=T$ , and the execution’s bookkeeping cost is $\sum_{i=1}^{m}(d_{T_{i-1}}(x_{i})-\ell_{T_{i-1}}(x_{i}))$ .
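Crossing depth depends only on the sequence of left and right steps along the access path, which the following minimal sketch makes explicit. The function name `crossing_depth` and the choice to represent the access path as a list of keys from the root down to $x$ are assumptions made for illustration.

```python
def crossing_depth(path):
    """Count crossing nodes, given the keys on the access path from the root to x."""
    if not path:
        raise ValueError("the access path contains at least the root")
    # goes_left[i] is True when the step from path[i] to path[i + 1] descends left.
    goes_left = [child < parent for parent, child in zip(path, path[1:])]
    # An interior node is a crossing node when the step into it and the step
    # out of it have different directions.
    turns = sum(1 for a, b in zip(goes_left, goes_left[1:]) if a != b)
    # The root and x are always crossing nodes; they coincide when x is the root.
    endpoints = 1 if len(path) == 1 else 2
    return endpoints + turns
```

On a spine only the root and $x$ are crossing nodes, so `crossing_depth([5, 4, 3, 2])` is 2, while on a fully alternating path every node is a crossing node. If $d_{T}(x)$ counts the nodes on the access path, the number of bookkeeping nodes is `len(path) - crossing_depth(path)`.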

To perform a move-to-root operation, repeatedly rotate at the requested key until it becomes the root. The Move-to-Root algorithm enacts this process at each request. The crossing bound for $X$ starting from $T$ , denoted $\Lambda(X,T)$ , is the crossing cost of Move-to-Root’s execution for this instance. The crossing bound is essentially equivalent to Wilber’s second lower bound on optimal execution cost [67]. Thus, the crossing bound never exceeds a fixed multiple of optimal execution cost. (See Appendix B.) We shall prove that the crossing bound is approximately monotone. Our techniques preview those required to show the same for Splay. We begin with three properties of Move-to-Root. The first describes the structure of its transition trees.

Theorem 7.1.

Executing $\operatorname{move-to-root}(T,x)$ transforms the access path to $x$ in $T$ into a tree whose root $x$ has left and right subtrees that are respectively right and left spines.

Proof.

By induction on the number of rotations involved in the operation. If the requested key lies at the root then the statement is trivial. Now suppose that the statement is true for nodes of depth $k$ , let $d_{T}(x)=k+1$ and $z=\operatorname{root}(T)$ , and assume without loss of generality that $T$ solely comprises keys on the access path for $x$ in $T$ and that $x<z$ . (The other case is symmetric.) The first $k-1$ of the $k$ successive rotations at $x$ performed while executing $\operatorname{move-to-root}(T,x)$ replace the left subtree $Q$ of $z$ in $T$ with $Q^{\prime}=\operatorname{move-to-root}(Q,x)$ . By the inductive hypothesis, the left and right subtrees of $x$ in $Q^{\prime}$ comprise inward facing spines of the keys in $T\setminus\{x,z\}$ that are respectively less than and greater than $x$ . The left subtree of $x$ remains unchanged after the final rotation, while the right subtree of $x$ immediately before the final rotation becomes the left subtree of $z$ immediately afterward, and the right subtree of $x$ after the final rotation is a left spine of keys greater than $x$ . Thus, the hypothesis holds for nodes at depth $k+1$ . ∎
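The move-to-root operation, the spine structure of Theorem 7.1, and the crossing bound can be sketched in Python as follows. The tuple encoding `(left, key, right)`, the function names, and the five-key example are assumptions made for illustration. The recursion performs the rotations at $x$ bottom-up, one per level of the access path.

```python
# Trees are None or tuples (left, key, right); the encoding and all
# function names are illustrative assumptions.

def move_to_root(tree, x):
    """Rotate at x until it is the root (x must be in the tree)."""
    left, k, right = tree
    if x == k:
        return tree
    if x < k:
        a, xk, b = move_to_root(left, x)     # x is now the left child of k
        return (a, xk, (b, k, right))        # rotate at x
    a, xk, b = move_to_root(right, x)        # x is now the right child of k
    return ((left, k, a), xk, b)             # rotate at x

def crossing_depth(tree, x):
    """Crossing depth of x, read off the keys of its access path."""
    path = []
    node = tree
    while True:
        left, k, right = node
        path.append(k)
        if k == x:
            break
        node = left if x < k else right
    turns = sum(
        (path[i] < path[i - 1]) != (path[i + 1] < path[i])
        for i in range(1, len(path) - 1)
    )
    return turns + (1 if len(path) == 1 else 2)

def crossing_bound(requests, tree):
    """Crossing cost of Move-to-Root's execution of requests from tree."""
    total = 0
    for x in requests:
        total += crossing_depth(tree, x)
        tree = move_to_root(tree, x)
    return total

# Access path 5, 1, 4, 2, 3 with alternating directions.
T = ((None, 1, ((None, 2, (None, 3, None)), 4, None)), 5, None)
print(move_to_root(T, 3))
print(crossing_bound((3, 1), T))
```

On this input the left subtree of the new root 3 is the right spine 1, 2 and its right subtree is the left spine 5, 4, as Theorem 7.1 states.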

The second property demonstrates that Move-to-Root’s executions reflect temporal patterns in the request sequence. A binary tree is max-heap ordered if each node is assigned a priority from a totally ordered set and every non-root node’s priority is at most that of its parent. The standard priorities for a request sequence $X=(x_{1},\dots,x_{m})$ starting from $T$ are the mappings $p_{0},p_{1},\dots,p_{m}$ from keys to priorities such that, for $y\in T$ , $p_{0}(y)=\tau(y)-|T|-1$ where $\tau(y)$ is the time at which $y$ is requested in $\operatorname{postorder}(T)$ , and $p_{i}(y)=p_{i-1}(y)$ if $y\neq x_{i}$ and $p_{i}(x_{i})=i$ for $1\leq i\leq m$ .

Theorem 7.2.

Move-to-Root’s after-trees are max-heap ordered by the instance’s standard priorities.

Proof.

The initial tree is max-heap ordered with respect to the initial priorities: the root of the initial tree has highest priority, and the same holds recursively for its subtrees. We can see as follows that Move-to-Root restores the max-heap order invariant after each request. Resetting the priority of the node holding requested key $x$ introduces a single heap order violation at the edge between $x$ and its parent, if the parent is present. After each rotation at $x$ that does not result in $x$ becoming the root, only a single edge in the tree violates the heap order, and that edge is always the one between $x$ and its parent. When $x$ becomes the root, it has the largest priority, and no other edges violate the heap order. ∎

The third property characterizes how Move-to-Root arranges keys in its after-trees. The left and right window boundaries $u$ and $v$ for a given key $y$ determined by a request sequence $X$ are respectively the largest key less than or equal to $y$ and the smallest key greater than or equal to $y$ in $X\oplus(-\infty,\infty)$ . The window subtree $J$ for $y$ determined by an execution of $X$ starting from $T$ with final tree $R$ is as follows. If neither $u$ nor $v$ is finite then $J=T$ ; if $u=v$ then $J=\varnothing$ ; if only $u$ is finite, or if both $u$ and $v$ are finite and the final request for $u$ precedes the final request for $v$ in $X$ , then $J$ is the right subtree of $u$ in $R$ ; otherwise, $J$ is the left subtree of $v$ in $R$ .

Theorem 7.3.

The window subtree for $y$ determined by Move-to-Root’s execution of $X$ starting from $T$ comprises the keys in $T$ strictly between the window boundaries for $y$ determined by $X$ .

Proof.

By induction on the number of requests. The initial tree and final tree are identical for executions of the empty sequence, so the statement is true when there are no requests. Now suppose the statement is true for sequences of up to $|W|$ requests and that $X=W\oplus(z)$ for some $z\in T$ , and assume without loss of generality that $y\notin W$ . Let $u$ and $v$ be the window boundaries for $y$ determined by $W$ , let $I$ be the set of keys in $T$ that are larger than $u$ and smaller than $v$ , let $R$ be the final tree in Move-to-Root’s execution of $W$ starting from $T$ , and let $J$ be the window subtree for $y$ determined by this execution. Define $u^{\prime}$ , $v^{\prime}$ , $I^{\prime}$ , $R^{\prime}$ and $J^{\prime}$ analogously for $X$ .

Consider first when $z\notin I$ so that $I^{\prime}=I$ and there are no rotations at any key in $J$ while moving $z$ to the root, and assume without loss of generality that $J$ is the right subtree of $u$ in $R$ . (The case when $J$ is the left subtree of $v$ is symmetric.) By the inductive hypothesis, $I$ is the set of keys in $J$ . If $z\neq u$ then the right subtree of $u$ in $R^{\prime}$ is the same as in $R$ since there are no rotations at $u$ while moving $z$ to the root of $R$ , and $J^{\prime}$ is the right subtree of $u$ in $R^{\prime}$ since $v$ is either infinite or more recently requested in $X$ than $u$ . If $z=u$ and $v$ is infinite then every key greater than $u$ is in its right subtree in $R$ and $u$ is on the right spine of $R$ , meaning the right subtree of $u$ in $R^{\prime}$ is the same as in $R$ and $J^{\prime}$ is again the right subtree of $u$ in $R^{\prime}$ . If $z=u$ and $v$ is finite then $J^{\prime}$ is the left subtree of $v$ in $R^{\prime}$ and $v$ is an ancestor and the successor of $u$ in $R$ , meaning the left subtree of $v$ in $R^{\prime}$ is $J$ since Move-to-Root is a natural algorithm. In all cases $J^{\prime}=J$ and the hypothesis holds for $X$ .

Consider now when $z\in I$ . If $z=y$ then $J^{\prime}=I^{\prime}=\varnothing$ by construction. Otherwise, assume without loss of generality that $z<y$ so that $u^{\prime}=z$ . (The case when $v^{\prime}=z$ is symmetric.) Form $\hat{R}$ by substituting $\operatorname{move-to-root}(J,u^{\prime})$ for $J$ in $R$ and let $\hat{J}$ be the right subtree of $u^{\prime}$ in $\hat{R}$ . Since $I^{\prime}$ and $\hat{J}$ have the same keys and $R^{\prime}=\operatorname{move-to-root}(\hat{R},u^{\prime})$ , we may apply the analysis of the case $z\notin I$ . ∎

Our proof of approximate monotonicity examines how individual request removals affect the crossing bound. In particular, removing the first request in a sequence subtracts the first key’s crossing depth from the crossing cost of Move-to-Root’s execution of the remaining requests. The remaining requests are now executed starting from the original tree, rather than the tree resulting from moving the first requested key to the root. As the remainder of the altered execution proceeds, its after-trees become progressively similar to those of Move-to-Root’s execution of the original request sequence. The key to our argument is bounding the cost incurred by this restoration process.

Theorem 7.4.

$\Lambda(X,T)-\Lambda(X,\operatorname{move-to-root}(T,y))\leq 4\ell_{T}(y)$ .

Proof.

Let $R^{\prime}$ and $S^{\prime}$ be the final trees in Move-to-Root’s executions of $X$ starting respectively from $T$ and $\operatorname{move-to-root}(T,y)$ , let $J^{\prime}$ and $K^{\prime}$ be the window subtrees for $y$ determined by these executions, and set $k^{\prime}$ to be the crossing depth of $y$ in $J^{\prime}$ if $J^{\prime}$ is nonempty and zero otherwise. We show by induction on the number of requests that $\Lambda(X,T)-\Lambda(X,\operatorname{move-to-root}(T,y))\leq 4(\ell_{T}(y)-k^{\prime})$ . The statement is true by construction for the empty sequence, so consider when $X=Y\oplus(z)$ for some $z\in T$ . Define $R$ , $S$ , $J$ , $K$ and $k$ analogously for $Y$ , let $I$ be the set of keys in $T$ contained in the symmetric order interval strictly between the window boundaries for $y$ determined by $Y$ , and suppose $\Lambda(Y,T)-\Lambda(Y,\operatorname{move-to-root}(T,y))\leq 4(\ell_{T}(y)-k)$ .

Since $\Lambda(X,T)-\Lambda(X,\operatorname{move-to-root}(T,y))=\Lambda(Y,T)-\Lambda(Y,\operatorname{move-to-root}(T,y))+\ell_{R}(z)-\ell_{S}(z)$ it suffices to show that $\ell_{R}(z)-\ell_{S}(z)\leq 4(k-k^{\prime})$ . Thus, we need to characterize the structure of the final trees. The tree $\operatorname{move-to-root}(T,y)$ is max-heap ordered with respect to a priority function that is identical to the standard priority function for starting tree $T$ except at $y$ , whose priority in $\operatorname{move-to-root}(T,y)$ we set by convention to zero. By Theorem 7.2, the after-trees of Move-to-Root’s executions of $X$ starting from $T$ and $\operatorname{move-to-root}(T,y)$ are max-heap ordered by priority functions defined recursively in the standard way starting respectively from the priority functions for $T$ and $\operatorname{move-to-root}(T,y)$ . Requests subsequent to $y$ have the same after-trees in both executions, so we may assume without loss of generality that $y\notin Y$ and $k>0$ .

By Theorem 7.3, when $Y\neq\varnothing$ the keys in $T\setminus I$ comprise a rooted hull in $R$ and $S$ . (If $Y=\varnothing$ then $T\setminus I$ is empty.) These keys have the same priorities in both $R$ and $S$ , meaning the two rooted hulls are identical. The window subtree $J$ has the same keys as $K$ , none of which are requested in $Y$ , meaning these keys have their initial priorities in $R$ and $S$ . The only key with differing priority in $J$ and $K$ is $y$ , which has maximal priority among keys in $K$ . Therefore, $K=\operatorname{move-to-root}(J,y)$ .

If $z\notin I$ then $k^{\prime}=k$ and $\ell_{R}(z)=\ell_{S}(z)$ , so we may assume $z\in I$ without loss of generality. If $J=R$ set $J^{+}=J$ , otherwise set $J^{+}$ to be the subgraph in $R$ comprising the union of keys in $J$ with the window boundary for $y$ determined by $Y$ of which the root of $J$ is a child in $R$ . Define $K^{+}$ analogously for $K$ in $S$ . The access path to $\operatorname{root}(J^{+})$ in $R$ is identical to the access path for $\operatorname{root}(K^{+})$ in $S$ whenever $Y$ is nonempty, and the access paths to this key in $R$ and $S$ are identical. Thus, $\ell_{R}(\operatorname{root}(J^{+}))=\ell_{S}(\operatorname{root}(K^{+}))$ while $\ell_{R}(z)=\ell_{R}(\operatorname{root}(J^{+}))+\ell_{J^{+}}(z)-2$ and $\ell_{S}(z)=\ell_{S}(\operatorname{root}(K^{+}))+\ell_{K^{+}}(z)-2$ , meaning $\ell_{R}(z)-\ell_{S}(z)=\ell_{J^{+}}(z)-\ell_{K^{+}}(z)$ .

Let $P$ be the smallest rooted hull in $J$ containing the neighborhood of $z$ in $J$ . Since Move-to-Root is a natural algorithm, the subtrees hanging from $P$ are unaffected by executing $\operatorname{move-to-root}(J,y)$ , and so these subtrees are identically arranged in $K$ . Therefore, we may assume with no loss of generality that either $z\in P$ or that the parent of $z$ in $R$ is defined and lies in $P\setminus\{y\}$ .

Let $w_{1},\dots,w_{k-1}$ be the first $k-1$ crossing nodes for $y$ in $J$ in increasing order of depth and let $w_{0}=y$ . Since $J=K$ and $R=S$ if $y$ is the root of $J$ , we may assume without loss of generality that $y$ has a parent in $J$ . Thus, let $w_{k}$ be the child of $y$ in the same direction as $y$ with respect to its parent in $J$ and let $w_{k+1}$ be the child of $y$ in the opposite direction. Define $j=\ell_{J}(w_{c})$ where $c$ is the largest integer for which $w_{c}$ is an ancestor of $z$ in $J$ if $z\neq y$ and otherwise $c=0$ . Let $D$ be the indicator for $Y\neq\varnothing$ , let $A$ be the indicator for $z\notin P$ , let $B$ be the indicator for $z\neq w_{c}$ , let $G$ be the indicator for $\ell_{J}(y)<\ell_{J^{+}}(y)$ and let $F$ be the indicator for $\ell_{K}(z)<\ell_{K^{+}}(z)$ . (An indicator’s value is one when its condition is true and zero otherwise.) By Theorem 7.1 and case analysis,

[TABLE]

where $H=A(1-B)(1+D(1-G))+B(1+A+G)+(1-A)(1-B)D$ . (See Figure 6.)

Note that if $c=1$ then $F=G$ , and if $c=2$ then $F=D(1-G)$ , and if $G=1$ or $F=1$ then $D=1$ . Combining these facts with some case analysis reveals that if $1\leq j\leq 2$ then $\ell_{J^{+}}(z)-\ell_{K^{+}}(z)=0$ , and otherwise $\ell_{J^{+}}(z)-\ell_{K^{+}}(z)\leq j$ . If $c=0$ then $k^{\prime}=0$ and $j=k$ . Otherwise, we apply Theorem 7.1 to deduce the structure of the access path $P^{\prime}$ for $y$ in $J^{\prime}$ . If $c=1$ then $P^{\prime}$ is a subgraph of $P$ and $k^{\prime}\leq k$ . If $z$ is in the subtree rooted at $w_{k}$ then $P^{\prime}$ is a right spine ending at $y$ and if $z$ is in the subtree rooted at $w_{k+1}$ then $P^{\prime}$ is a left spine ending at $y$ . Therefore when $z$ is a strict descendant of $y$ , if $k=2$ and $c=3$ then $k^{\prime}=1$ since $w_{k+1}$ and the parent of $y$ in $P$ are on the same side of $y$ in symmetric order, and otherwise $k^{\prime}\leq 2$ and $j\leq k+1$ . Otherwise, $2\leq c\leq k-1$ , in which case if $z\in P$ then set $q$ to be the child of $z$ in $P$ and otherwise set $q$ to be the parent of $z$ in $P$ . Let $U$ be a right spine comprising the strict ancestors of $q$ in $P$ that are less than $y$ and let $V$ be a left spine of the strict ancestors of $q$ in $P$ that are greater than $y$ , and let $M$ be the path from $q$ to $y$ in $P$ . If $z<y$ then $P^{\prime}$ results from attaching $M$ as the left subtree of the node with smallest key in $V$ , and if $z>y$ then $P^{\prime}$ results from attaching $M$ as the right subtree of the node with largest key in $U$ . Thus, $k^{\prime}\leq k-j+2$ . In all cases, $\ell_{J^{+}}(z)-\ell_{K^{+}}(z)$ is at most $4(k-k^{\prime})$ . ∎

Theorem 7.5.

The crossing bound is approximately monotone.

Proof.

Let $T_{1},\ldots,T_{m}$ be the after-trees of Move-to-Root’s execution of $X=(x_{1},\dots,x_{m})$ starting from $T$ , and let $m\geq e_{1}>e_{2}>\cdots>e_{p}\geq 1$ be a sequence of request times. Set $X_{0}=X$ and for $1\leq i\leq p$ form $X_{i}$ by removing request $e_{i}$ from $X_{i-1}$ . We induct on $p$ to show that $\Lambda(X_{p},T)\leq\Lambda(X,T)+3\sum_{i=1}^{p}\ell_{T_{e_{i}-1}}(x_{e_{i}})$ where $T_{0}=T$ , which suffices to establish that the crossing bound has subsequence overhead at most four. The statement is trivial when $p=0$ . Now suppose $\Lambda(X_{p-1},T)\leq\Lambda(X,T)+3\sum_{i=1}^{p-1}\ell_{T_{e_{i}-1}}(x_{e_{i}})$ . Let $y=x_{e_{p}}$ and let $W$ and $Z$ respectively be the first $e_{p}-1$ and final $m-e_{p}-(p-1)$ requests in $X_{p-1}$ , so that $X_{p-1}=W\oplus(y)\oplus Z$ and $X_{p}=W\oplus Z$ . Let $S=T_{e_{p}-1}$ , so that $\Lambda(X_{p-1},T)=\Lambda(W,T)+\Lambda((y)\oplus Z,S)$ and $\Lambda(X_{p},T)=\Lambda(W,T)+\Lambda(Z,S)$ . Note that $\Lambda((y)\oplus Z,S)=\ell_{S}(y)+\Lambda(Z,\operatorname{move-to-root}(S,y))$ . Thus, $\Lambda(X_{p},T)-\Lambda(X_{p-1},T)=\Lambda(Z,S)-\Lambda(Z,\operatorname{move-to-root}(S,y))-\ell_{S}(y)$ , which by Theorem 7.4 is at most $3\ell_{S}(y)$ . Therefore, $\Lambda(X_{p},T)\leq\Lambda(X_{p-1},T)+3\ell_{T_{e_{p}-1}}(x_{e_{p}})$ , and the hypothesis holds for $p$ request removals. ∎

Our presentation of the crossing bound is based on Iacono’s work [37, 38]. Move-to-Root, introduced by Allen and Munro [3], is the earliest example of a self-adjusting binary search tree algorithm. The crossing bound is not strictly monotone. For example, $\Lambda(Y,T)>\Lambda(X,T)$ when $X=(4,5,3)$ , $Y=(5,3)$ and $\operatorname{postorder}(T)=(3,2,5,6,4,7,1)$ . We had not realized this when writing our SODA paper, whose treatment of the crossing bound contains several mistakes [46].
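The non-monotone instance above can be checked without building any tree. The sketch below leans on Theorem 7.2: since every after-tree is the unique binary search tree that is max-heap ordered by the current standard priorities, $y$ is an ancestor of $x$ exactly when $y$ has the largest priority among the keys between $x$ and $y$ inclusive. The function name and the quadratic-time ancestor scan are illustrative choices, not the paper's method.

```python
def crossing_bound_by_priorities(requests, postorder):
    """Crossing bound of Move-to-Root computed from standard priorities."""
    n = len(postorder)
    # Initial priorities p_0(y) = tau(y) - |T| - 1, with tau the postorder time.
    priority = {y: t - n - 1 for t, y in enumerate(postorder, start=1)}
    keys = sorted(priority)
    total = 0
    for i, x in enumerate(requests, start=1):
        # Ancestors of x (including x) in the heap-ordered search tree.
        ancestors = []
        for y in keys:
            lo, hi = min(x, y), max(x, y)
            between = [k for k in keys if lo <= k <= hi]
            if all(priority[k] <= priority[y] for k in between):
                ancestors.append(y)
        # Highest priority first: the access path from the root down to x.
        path = sorted(ancestors, key=priority.get, reverse=True)
        turns = sum(
            (path[j] < path[j - 1]) != (path[j + 1] < path[j])
            for j in range(1, len(path) - 1)
        )
        total += turns + (1 if len(path) == 1 else 2)
        priority[x] = i                  # standard update p_i(x_i) = i
    return total

postorder = (3, 2, 5, 6, 4, 7, 1)
print(crossing_bound_by_priorities((4, 5, 3), postorder))   # 9
print(crossing_bound_by_priorities((5, 3), postorder))      # 10
```

Under this reading of the definitions, removing the first request raises the crossing bound from 9 to 10, in agreement with the claim that $\Lambda(Y,T)>\Lambda(X,T)$ .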

8 The Way Forward

Our numerical experiments indicate that Splay’s cost never exceeds four times the sum of an instance’s crossing bound and initial tree size, which would imply dynamic optimality. Moreover, the crossing costs of Splay and Move-to-Root are so tightly coupled that the difference between them may well be at most linear in initial tree size. Additionally, the keys in the crossing nodes of these algorithms’ executions are quite similar, albeit sometimes offset from each other in symmetric order by a small amount. We have tried to prove these statements, to no avail. We believe our failures are not incidental, and that there are structural obstacles in the way of establishing dynamic optimality in this manner. The difficulty arises from temporal spread. Typically, about half of the keys in Move-to-Root’s crossing nodes for a given request appear on the access path for the corresponding request in Splay’s execution. A smaller fraction of these keys appear on the splay path for the next request, and the remaining keys are scattered across subsequent splay paths. The precise extent of this spreading is varied and depends on the particular request sequence.

Lucas remarked that optimal cost does not seem amenable to inductive analysis [49]. The observed temporal mixing is a manifestation of this problem, since it means that showing Splay’s cost obeys the crossing bound requires accounting for many of its preceding transition trees at each request. Fortunately, dynamic optimality’s equivalence to approximate monotonicity provides a means of shattering this barrier. Our conviction is:

Conjecture 8.1.

Splay’s crossing cost is approximately monotone.

Our proof of the crossing bound’s approximate monotonicity is a natural starting point for tackling Conjecture 8.1. However, it requires a crucial modification. Our analysis of Move-to-Root establishes a worst-case bound on how its crossing cost increases after removing a request. By contrast, the increase in Splay’s crossing cost is not bounded by any fixed multiple of the removed key’s crossing depth. For example, if $T$ results from splaying the largest key in a right spine comprising the integers $\{1,\dots,2n\}$ , $Y=(2,4,\dots,2n-2)$ and $X=(1)\oplus Y$ , then the crossing costs of Splay’s executions of $X$ and $Y$ when starting from $T$ are respectively $3n-4$ and $5n-5$ . Consequently, we must examine how request removals affect Splay’s crossing cost in aggregate, which is equivalent to reversing the order in which the proof of Theorem 7.5 inducts on request removals. This style of induction entails comparing executions of a request sequence starting from progressively divergent trees. The increased complexity of the required new approach is another manifestation of the barriers to standard inductive analysis of optimal algorithms. The advantage of attacking this manifestation of the problem is that Theorem 7.5 assures its achievability.

Adapting such a proof from Move-to-Root to Splay will almost certainly require a potential function in order to smooth out the effects of occasional requests whose removal produces a high increase in Splay’s crossing cost. Potential functions are tools for analyzing algorithms that have individual operations with high cost, but for which the cost per operation, amortized over all operations in a sequence, is low [64]. Each possible configuration of the data structure (e.g. the tree) is assigned a numerical value, called its potential. The cost of an operation is redefined to depend on both the original cost (e.g. the length of the Splay path), and on how the potential changes due to the operation’s effect on the data structure. If carefully constructed, the sum of the redefined costs over a sequence of operations will be an upper bound on the sum of the actual costs, yet no individual operation’s redefined cost will ever be very large. We need a potential that captures how Splay’s executions diverge from Move-to-Root’s.
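In symbols, the potential method described above reads as follows. The notation is an assumption of this note rather than the paper's: $c_{i}$ is the actual cost of the $i$-th operation, $\Phi$ is the potential, $D_{i}$ is the configuration of the data structure after the $i$-th operation, and $\hat{c}_{i}$ is the redefined cost.

$$\hat{c}_{i}=c_{i}+\Phi(D_{i})-\Phi(D_{i-1}),\qquad\sum_{i=1}^{m}c_{i}=\sum_{i=1}^{m}\hat{c}_{i}+\Phi(D_{0})-\Phi(D_{m}).$$

The potential differences telescope, so the total actual cost exceeds the total redefined cost by at most the net drop in potential, $\Phi(D_{0})-\Phi(D_{m})$ .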

Move-to-Root is both a progenitor and a sub-step of splaying. Move-to-Root splits the access path into a pair of spines. One can view Splay as comprising two phases: the first executes move-to-root, the second performs extra rotations, corresponding to the zig-zigs. (See Figure 7.) The extra rotations ensure that a splay operation decreases the depth of every node by about half of the number of its ancestors that were on the access path for the requested key [62].

Theorem 8.2 ([14, Proposition 17]).

Executing $\operatorname{splay}(T,x)$ is equivalent to starting from the tree $\operatorname{move-to-root}(T,x)$ and rotating at every key held by a child $y$ on the access path for $x$ in $T$ whose parent in $T$ is on the same side of $x$ in symmetric order and for which $d_{T}(x)-d_{T}(y)$ is odd.

Proof.

By induction on the number of splay steps involved. The theorem is trivially satisfied when splaying at the root. Now suppose the statement is true for splay operations comprising $k-1$ splay steps and let $x$ be a node in $T$ whose splaying involves $k$ steps. Let $Q$ be the subtree rooted at the ancestor of $x$ in $T$ whose depth is $d_{T}(x)-2(k-1)$ . Since the first $k-1$ splay steps each decrease the depth of $x$ by two, $\operatorname{splay}(T,x)=\operatorname{splay}(S,x)$ where $S=\operatorname{splay}(Q,x)$ . Denote by $\operatorname{splay}^{\prime}$ the procedure described in the theorem. Let $z=\operatorname{root}(S)$ , let $y$ be the left child of $z$ in $S$ and assume without loss of generality that $x<z$ . (The other case is symmetric.) If $x=y$ then $\operatorname{splay}^{\prime}(S,x)$ only enacts a single rotation at $x$ , which is equivalent to the zig enacted by $\operatorname{splay}$ . If $x>y$ then $y$ and $z$ are on opposite sides of $x$ in symmetric order and $\operatorname{splay}^{\prime}(S,x)$ rotates twice at $x$ , making it identical to the zig-zag step performed by $\operatorname{splay}$ . Finally, if $x<y$ then $\operatorname{splay}^{\prime}(S,x)$ rotates twice at $x$ and then once at $y$ , which is equivalent to the rotation at $y$ followed by $x$ in the zig-zig performed by $\operatorname{splay}$ . In all three cases, $\operatorname{splay}^{\prime}(S,x)=\operatorname{splay}(S,x)$ . Form $S^{\prime}$ by replacing $Q$ with $\operatorname{splay}^{\prime}(Q,x)$ in $T$ . The edges rotated by $\operatorname{splay}^{\prime}$ subsequent to the $\operatorname{move-to-root}$ operation are disjoint, and $d_{S^{\prime}}(x)-d_{S^{\prime}}(w)$ and $d_{T}(x)-d_{T}(w)$ have the same parity for every ancestor $w$ of $x$ in $S^{\prime}$ . Therefore, $\operatorname{splay}^{\prime}(T,x)=\operatorname{splay}^{\prime}(S^{\prime},x)$ . By the inductive hypothesis $S^{\prime}=S$ , thus $\operatorname{splay}^{\prime}(T,x)=\operatorname{splay}(T,x)$ . ∎
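The two-phase view in Theorem 8.2 can be compared against ordinary bottom-up splaying on small random trees. The sketch below is an illustration under stated assumptions: trees are tuples `(left, key, right)`, all function names are hypothetical, and the extra rotations are applied in top-down path order, which is one admissible order since the rotated edges are disjoint.

```python
import random

def insert(tree, key):
    """Insert key into a binary search tree without rebalancing."""
    if tree is None:
        return (None, key, None)
    left, k, right = tree
    if key < k:
        return (insert(left, key), k, right)
    return (left, k, insert(right, key)) if key > k else tree

def rotate_at(tree, y):
    """Rotate at key y, which must be held by a non-root node of tree."""
    left, k, right = tree
    if y < k:
        if left[1] == y:
            a, _, b = left
            return (a, y, (b, k, right))
        return (rotate_at(left, y), k, right)
    if right[1] == y:
        a, _, b = right
        return ((left, k, a), y, b)
    return (left, k, rotate_at(right, y))

def path_to(tree, x):
    """Keys on the access path for x, from the root down to x."""
    path = []
    while tree[1] != x:
        path.append(tree[1])
        tree = tree[0] if x < tree[1] else tree[2]
    return path + [x]

def move_to_root(tree, x):
    for _ in range(len(path_to(tree, x)) - 1):
        tree = rotate_at(tree, x)
    return tree

def splay(tree, x):
    """Bottom-up splay built from zig, zig-zig and zig-zag steps."""
    while tree[1] != x:
        path = path_to(tree, x)
        if len(path) > 2:
            g, p = path[-3], path[-2]
            # zig-zig rotates at the parent first, zig-zag rotates at x
            tree = rotate_at(tree, p if (p < g) == (x < p) else x)
        tree = rotate_at(tree, x)     # second rotation of the step, or a zig
    return tree

def splay_via_move_to_root(tree, x):
    """Move-to-root, then the extra rotations named in Theorem 8.2."""
    path = path_to(tree, x)
    extra = [
        path[d] for d in range(1, len(path) - 1)
        if (path[d] < x) == (path[d - 1] < x)     # parent on the same side of x
        and (len(path) - 1 - d) % 2 == 1          # odd depth difference to x
    ]
    tree = move_to_root(tree, x)
    for y in extra:
        tree = rotate_at(tree, y)
    return tree

random.seed(0)
keys = random.sample(range(100), 15)
T = None
for k in keys:
    T = insert(T, k)
print(all(splay(T, x) == splay_via_move_to_root(T, x) for x in keys))
```

Agreement on random inputs is only a sanity check of the statement, not a substitute for the proof.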

Each zig-zig can create a violation of the max-heap ordering with respect to the standard priorities of an instance. As the executions of both Splay and Move-to-Root proceed, these zig-zigs will sometimes create further heap order violations. At other times, splay steps will remove some of the heap order violations. The correct potential for analyzing Splay’s crossing cost should in some way bound the rate at which splay operations generate heap order violations with respect to the standard priorities. We speculate on two possible forms. The first simply counts the number of edges in the tree being splayed that violate the heap-order condition with respect to most recent access time. This potential may be too “coarse,” in that it fails to capture heap order violations between nodes not immediately connected by an edge. If so, the likely way to address this shortcoming is weighting each node by some function of the difference between its crossing depth in the splayed tree and in the max-heap order maintained by Move-to-Root. While honing the details of the potential’s construction falls outside this work’s scope, we can infer something important up front.

A potential function’s design is closely tied to the extent to which its value can increase or decrease. By Theorem 4.3, if Splay is dynamically optimal then its startup overhead is no more than linear in initial tree size. Hence, any potential for proving Conjecture 8.1 should also have maximum value at most linear in the size of the starting tree. This considerably narrows the design space that we might otherwise need to explore.

To prove optimality we must also address Splay’s bookkeeping cost. Here again, we can glean insight from Splay’s progenitor. Move-to-Root is not dynamically optimal. For example, if $T$ is a left spine with keys $\{1,\dots,n\}$ , $X=(n,n-1,\dots,2,1,2,\dots,n-1,n)$ and $Y=(1,2,\dots,n)$ then the cost of executing Move-to-Root on the subsequence $Y$ is proportional to $n$ times the cost of its execution on the super-sequence $X$ . Because its crossing cost lower bounds a fixed multiple of optimal cost, Move-to-Root’s non-optimality arises from its bookkeeping cost. Splay tweaks Move-to-Root by breaking apart bookkeeping edges via zig-zig steps. Thus, Splay seems to be precisely the modification needed to make Move-to-Root optimal. We believe:

Conjecture 8.3.

Splay’s bookkeeping cost is at most a fixed multiple of the sum of its crossing cost and initial tree size.
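The left-spine instance used above to show that Move-to-Root is not dynamically optimal can be checked numerically. This sketch assumes that the cost of a request is the number of nodes on its access path; the tuple encoding `(left, key, right)` and the function names are illustrative.

```python
def access(tree, x):
    """Return (move-to-root(tree, x), number of nodes on x's access path)."""
    left, k, right = tree
    if x == k:
        return tree, 1
    if x < k:
        (a, xk, b), d = access(left, x)
        return (a, xk, (b, k, right)), d + 1     # rotate at x
    (a, xk, b), d = access(right, x)
    return ((left, k, a), xk, b), d + 1          # rotate at x

def execution_cost(requests, tree):
    """Total access-path length over Move-to-Root's execution."""
    total = 0
    for x in requests:
        tree, d = access(tree, x)
        total += d
    return total

def left_spine(n):
    """Left spine on keys 1..n: the root holds n and the leaf holds 1."""
    tree = None
    for k in range(1, n + 1):
        tree = (tree, k, None)
    return tree

n = 50
X = list(range(n, 0, -1)) + list(range(2, n + 1))   # n, ..., 1, 2, ..., n
Y = list(range(1, n + 1))                           # subsequence 1, ..., n
print(execution_cost(X, left_spine(n)), execution_cost(Y, left_spine(n)))
```

Under this cost convention the super-sequence $X$ costs $4n-3$ while the subsequence $Y$ costs $n+n(n+1)/2-1$ , so the ratio grows linearly in $n$ .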

Heuristically, splaying in a tree whose access paths comprise mostly bookkeeping nodes increases the average crossing depth of nodes in the tree, and the opposite phenomenon occurs in trees with many nodes of high crossing depth. Precisely tracking this exchange as Splay’s execution progresses quickly becomes unmanageable, indicating the need for an additional potential function that acts as a proxy for the number of bookkeeping nodes in the tree being splayed. It seems likely that a tree entirely comprising a spine should maximize this potential, and that a perfectly balanced binary search tree should minimize it. Conjectures 8.1 and 8.3 together imply dynamic optimality.

Appendix A Rotational Execution

Sleator and Tarjan [61], in their formulation of the dynamic optimality conjecture, use a rotation-based definition of a binary search tree execution. Given an initial tree and a request sequence, a rotational execution fulfills one request at a time, by performing a binary search for the requested key in the current tree, at a cost equal to the number of nodes on the access path. In addition, the execution can include any number of rotations before each request, at a cost of one per rotation. Formally, a rotational execution $R$ of $X=(x_{1},\ldots,x_{m})$ starting from $T$ comprises a sequence of trees $T_{0},T_{1},\dots,T_{r}$ and search times $0\leq\tau_{1}\leq\cdots\leq\tau_{m}=r$ where $T_{0}=T$ and $T_{t}$ results from rotating at some key in $T_{t-1}$ for $1\leq t\leq r$ . The cost to execute $R$ is $r+\sum_{i=1}^{m}d_{T_{\tau_{i}}}(x_{i})$ . We denote the cost of an optimal rotational execution for this instance by $\operatorname{OPT}_{\operatorname{rot}}(X,T)$ . We shall prove that any transition tree execution can be simulated by a rotational execution of at most twice the cost, and vice-versa. Thus the two cost models are the same to within a factor of two.

Theorem A.1.

$1/2\leq\operatorname{OPT}(X,T)/\operatorname{OPT}_{\operatorname{rot}}(X,T)\leq 2$ .

Proof.

First we observe that any transition tree execution can be simulated by a rotational execution at a cost of a factor of at most two. A $k$ -node binary search tree with $k$ keys can be transformed into any other binary search tree of the same set of keys by doing at most $2k-2$ rotations [21, Theorem 2.1]. Hence each successive after-tree in the transition tree model can be produced from the previous one by doing at most $2k-2$ rotations, where $k$ is the number of nodes in the rooted hull (and in the corresponding transition tree). Searching for the desired key after doing these rotations costs one. Hence if a transition tree execution fulfills a request with a transition tree of size $k$ , then a rotational execution can fulfill this request with cost at most $2k-1<2k$ .

Simulating a rotational execution by a transition tree execution is more complicated, because the former allows rotations to be done anywhere in the tree, not just in a rooted hull. The first step toward handling this is to view edges as retaining their identity throughout a rotational execution. Consider a rotation of a left child $x$ whose parent is $y$ ; let $u$ , $v$ , and $w$ be the left child of $x$ , the right child of $x$ , and the right child of $y$ , respectively. Let $e$ , $a$ and $b$ be the edges connecting $x$ , $v$ and $y$ with their parents, respectively. (If $v=\mathtt{null}$ then $a=\mathtt{null}$ and if $y$ is the root then $b=\mathtt{null}$ .) Rotation at $e$ swaps the ends of $e$ and converts it from a left edge to a right edge, converts $a$ from a right edge to a left edge and changes its top end from $x$ to $y$ , and changes the bottom end of $b$ to $x$ . (Rotation at $e$ does not change $\mathtt{null}$ edges.) The rotation affects no other edges, and it preserves the set of keys in the subtree rooted at any node other than $x$ and $y$ , and in particular those rooted at $u$ , $v$ , and $w$ . Right rotations behave symmetrically.

The second step is to modify the rotational execution so that whenever a key is searched for it is at the root of the tree. Before a search occurs, we first rotate on each edge of the access path, bottom-up, which moves the key to be searched for to the root; then we perform the search; then we do the inverse rotations in the opposite order, restoring the original access path. Fulfilling the request in this way costs $2k-1$ if the original access path has $k$ nodes. Thus we increase the overall cost by at most a factor of two.

Finally, assume that a rotational execution moves each requested key to the root before searching for it. We simulate this rotational execution with a transition tree execution while at the same time postponing some rotations. Proceeding in the same order as keys are requested, we modify the subsequence of rotations before the first search, and each subsequence of rotations between successive searches, as follows. Let $S$ be such a subsequence, let $T$ be the tree in which these rotations begin, and let $U$ be the subgraph of $T$ comprising the edges in $S$ . We partition $S$ into a pair of subsequences $A$ and $B$ . The subsequence $A$ comprises rotations in $S$ at edges in the same connected component of $U$ as the root of $T$ . (If no such rotations are present then $A$ is empty.) The subsequence $B$ is the complementary subsequence to $A$ in $S$ . We replace $S$ with $A\oplus B$ unless $S$ is the subsequence of rotations for the final request, in which case we replace $S$ with $A$ and drop the remaining rotations. Then, we move the search time for the request to occur immediately after the final rotation in $A$ .

If $A$ is nonempty then its edges comprise a rooted hull in $T$ , and the rotations in $A$ transform this rooted hull into a tree on the same set of keys whose root contains the requested key. The transformed tree is the transition tree corresponding to the request in the transition tree execution. (If $A$ is empty then the transition tree comprises solely the root of $T$ .) If the rooted hull (and the transition tree) contain $k$ nodes, the number of rotations is at least $k-1$ , making the cost of these rotations plus the cost of the search at least $k$ in the rotational execution. The size of the corresponding transition tree is $k$ . We conclude that it is possible to simulate a rotational execution whose searches occur at the root with a transition tree execution of the same cost, and at most twice the cost for a general rotational execution. Creating simulations for optimal executions of each type establishes the result. ∎

Wilber was the first to restrict rotational executions to search only at the root [67]. The procedure for partitioning rotations is implicit in Lucas’ work [49]. Our description is based on Koumoutsos’ remarks [41]. Harmon was the first to describe binary search tree executions using transition trees [34].

Appendix B Wilber’s Lower Bound

We show that the crossing bound is at most a fixed multiple of optimum transition tree execution cost. Our proof proceeds in two main steps. First, we express a scoring procedure defined by Wilber in terms of the crossing bound. Then we use Wilber’s proof that this procedure lower bounds optimum rotational cost as a black box in our analysis to obtain the desired result. (Wilber’s proof is quite intricate, and we do not attempt to summarize it.) Unlike the crossing bound, Wilber’s scoring procedure depends only on the request sequence. Accounting for initial trees requires care.

Formally, Wilber’s bound for request sequence $X=(x_{1},\dots,x_{m})$ , denoted $\Lambda_{2}(X)$ , is $m+\sum_{i=1}^{m}\kappa(X,i)$ , where the score $\kappa(X,i)$ for each request $1\leq i\leq m$ is as follows. If $i=1$ then the score is zero. Otherwise, let $c_{1}=i-1$ and let $w_{1}=x_{i-1}$ . If $w_{1}<x_{i}$ set $v_{0}=\infty$ , otherwise set $v_{0}=-\infty$ . Initialize $l=1$ and repeat the following process for as long as $w_{l}\neq x_{i}$ and there are keys requested prior to time $c_{l}$ lying between $x_{i}$ (inclusive) and $v_{l-1}$ (exclusive) in symmetric order. Set $c_{l+1}$ to the latest request time preceding $c_{l}$ for a key lying between $x_{i}$ (inclusive) and $v_{l-1}$ (exclusive) in symmetric order. Set $w_{l+1}$ to the key requested at $c_{l+1}$ . Set $v_{l}$ to the key closest in symmetric order to $x_{i}$ (exclusive) on the same side of $x_{i}$ in symmetric order as $w_{l}$ that is requested after $c_{l+1}$ and no later than $c_{l}$ . Finally, increment $l$ by one. The score is one less than the terminal value of $l$ . We respectively refer to $w_{1},\ldots,w_{l}$ and $v_{0},\ldots,v_{l-1}$ as the crossing keys and inside keys for the request. Wilber’s bound is nearly the same as the crossing bound for $X$ starting from the default tree $\operatorname{BST}(X)$ comprising the keys in $X$ min-heap ordered by their first request times.
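The scoring procedure above can be sketched directly in code. The following is a minimal, unoptimized transcription, assuming keys are numbers (so that they compare against $\pm\infty$) and that the request sequence is a Python list. The names `wilber_score` and `wilber_bound` are illustrative and do not come from the paper.

```python
import math

def wilber_score(X, i):
    """Score kappa(X, i) of the 1-indexed request i, following the text's procedure."""
    if i == 1:
        return 0
    x = X[i - 1]
    c = i - 1                              # c_l: a 1-indexed request time
    w = X[c - 1]                           # w_l: the current crossing key
    v = math.inf if w < x else -math.inf   # v_{l-1}: the previous inside key
    l = 1
    while w != x:
        # Request times before c_l whose keys lie between x (inclusive)
        # and v_{l-1} (exclusive) in symmetric order.
        if v > x:
            earlier = [t for t in range(1, c) if x <= X[t - 1] < v]
        else:
            earlier = [t for t in range(1, c) if v < X[t - 1] <= x]
        if not earlier:
            break
        c_next = max(earlier)              # c_{l+1}: latest such request time
        # Keys requested after c_{l+1} and no later than c_l.
        window = [X[t - 1] for t in range(c_next + 1, c + 1)]
        # v_l: key closest to x (exclusive) on the same side of x as w_l.
        if w < x:
            v = max(k for k in window if k < x)
        else:
            v = min(k for k in window if k > x)
        c, w, l = c_next, X[c_next - 1], l + 1
    return l - 1

def wilber_bound(X):
    """Lambda_2(X) = m + sum of the scores of all m requests."""
    return len(X) + sum(wilber_score(X, i) for i in range(1, len(X) + 1))
```

For the request sequence $(2,1,3,2)$, the final request alternates between the crossing keys $3$, $1$ and $2$, giving it a score of two, while the other three requests score zero. This sketch therefore returns a bound of $4+2=6$ for that sequence.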

Theorem B.1.

$\Lambda_{2}(X)=\Lambda(X,\operatorname{BST}(X))-|\operatorname{BST}(X)|+1$ whenever $X\neq\varnothing$.
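As an illustration of the default tree $\operatorname{BST}(X)$ in this statement: the unique binary search tree on the keys of $X$ that is min-heap ordered by first request time is the tree obtained by inserting the keys, in order of first request, into an initially empty tree without rebalancing. The sketch below assumes trees are represented as nested tuples `(left, key, right)` with `None` for the empty tree. This representation and the name `default_tree` are hypothetical, chosen only for this example.

```python
def default_tree(X):
    """Build BST(X) by plain insertion of keys in order of first request."""
    def insert(t, k):
        if t is None:
            return (None, k, None)
        left, key, right = t
        if k == key:
            return t                       # repeated request: tree is unchanged
        if k < key:
            return (insert(left, k), key, right)
        return (left, key, insert(right, k))

    tree = None
    for k in X:
        tree = insert(tree, k)
    return tree
```

For example, `default_tree([2, 1, 3, 2])` places key 2 at the root with children 1 and 3, since 2 is requested first.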

Proof.

By induction on the number and crossing depths of requests. Since the first request’s score is zero, Wilber’s bound is one for the singleton request sequence. Meanwhile, the first requested key lies at the root of the default tree for the request sequence and the root has crossing depth one. Thus, the formula holds for sequences containing a single request. Now suppose the theorem is true for all request sequences of length up to $m-1$ , let $Y$ be a nonempty sequence of $m-1$ requests, let $X=Y\oplus(x)$ , let $T$ be the final after-tree in Move-to-Root’s execution of $Y$ starting from $\operatorname{BST}(X)$ , and set $\delta$ to be one if $x\notin Y$ and zero otherwise. We show that the first $\ell_{T}(x)-\delta$ crossing nodes for $x$ in $T$ , ordered increasing by depth, contain the crossing keys for request $m$ , and that the respective parents of these nodes contain the inside keys for the request. (If the zeroth inside key is $\infty$ we treat $T$ as the left subtree of this key, and otherwise as the right subtree of $-\infty$ .)

The last key requested in $Y$ is the first crossing key for request $m$ in $X$ . Meanwhile, by Theorem 7.2, the keys in $Y$ comprise a rooted hull in $T$ max-heap ordered by their last request times in $Y$ . In particular, the root of $T$ , which is the first crossing node for $x$ in $T$ , contains the first crossing key. Now suppose that the first $i$ crossing nodes for $x$ in $T$ contain the first $i$ crossing keys for request $m$ in $X$ for some $1\leq i<\ell_{T}(x)$ , and that the parents of these nodes contain the first $i$ inside keys. Let $w$ and $w^{\prime}$ respectively be the deepest among the first $i$ and $i+1$ crossing nodes for $x$ in $T$ , let $v$ and $v^{\prime}$ be the respective parents of these nodes, and assume without loss of generality that $x<w$ . (The other case is symmetric.)

First consider when $w^{\prime}\in Y$ . Since $w^{\prime}$ is a descendant of $w$ in $T$ , the former’s final request time in $Y$ precedes the latter’s. Because $w^{\prime}$ is in the right subtree of $v$ and either $w^{\prime}=x$ or $w^{\prime}$ contains $x$ in its right subtree, $w^{\prime}$ is greater than $v$ and at most $x$ . Every key in this interval is a descendant of $w^{\prime}$ , making $w^{\prime}$ the last among them requested in $Y$ . Applying the inductive hypothesis that $w$ and $v$ respectively contain crossing key $i$ and inside key $i-1$ for request $m$ in $X$ establishes that $w^{\prime}$ contains crossing key $i+1$ . Since $v^{\prime}$ is both the parent of $w^{\prime}$ and the deepest node on the left spine of the subtree of $T$ rooted at $w$ which contains $x$ in its left subtree, it has the smallest key greater than $x$ whose final request comes after the last request for $w^{\prime}$ and no later than the last request for $w$ in $Y$ . Thus, $v^{\prime}$ is inside key $i$ for request $m$ in $X$ . Furthermore, if $i+1=\ell_{T}(x)$ then $w^{\prime}=x$ and there are no further crossing keys for request $m$ in $X$ .

Otherwise, if $w^{\prime}\notin Y$ then $w^{\prime}=x$, $i=\ell_{T}(x)-1$, and the subtree rooted at $x$ in $T$ contains every key that is greater than $v$ and at most $x$. Since Move-to-Root is a natural algorithm and $x$ has no children in $\operatorname{BST}(X)$, the absence of $x$ in $Y$ ensures that $x$ has no children in $T$, making $x$ the only key in this interval. Thus, there are only $\ell_{T}(x)-1$ crossing keys for request $m$ in $X$ when $\delta=1$.

By the inductive hypothesis on request sequences of length $m-1$ , $\Lambda_{2}(Y)=\Lambda(Y,\operatorname{BST}(Y))-|\operatorname{BST}(Y)|+1$ , and by the above arguments $\kappa(X,m)=\ell_{T}(x)-1-\delta$ . Since $\Lambda_{2}(X)=\Lambda_{2}(Y)+\kappa(X,m)+1$ and $\Lambda(X,\operatorname{BST}(X))-|\operatorname{BST}(X)|=\Lambda(Y,\operatorname{BST}(Y))-|\operatorname{BST}(Y)|+\ell_{T}(x)-\delta$ , the formula holds for request sequences of length $m$ . ∎

Theorem B.2.

$\Lambda(X,T)\leq 44\operatorname{OPT}(X,T)$ .

Proof.

Let $P=\operatorname{postorder}(T)$ and $T^{\prime}=\operatorname{BST}(P\oplus X)$ and note that $T^{\prime}=\operatorname{BST}(P)$ since $P$ contains every key in $T$ . Let $A$ and $B$ respectively be optimal executions for $P$ and $P\oplus X$ starting from $T$ , let $U$ be the sequence comprising the first $|T|-1$ transition trees of $A$ , let $V$ be the sequence comprising the final $|X|$ transition trees in $B$ , and let $T^{\prime\prime}$ be the after-tree for request $|T|$ in $B$ . The transition tree sequence $U\oplus(T^{\prime\prime})\oplus V$ is an execution for $P\oplus X$ starting from $T$ with cost at most $\operatorname{OPT}(P,T)+|T|+\operatorname{OPT}(X,T^{\prime\prime})$ . By [47, Theorem 4], $\operatorname{cost}(P,T^{\prime})\leq 7|T|$ . Splay’s cost and initial tree size respectively upper bound and lower bound optimum cost, meaning $\operatorname{OPT}(P,T)\leq\operatorname{OPT}(P,T^{\prime})+|T|\leq 8|T|$ and $\operatorname{OPT}(X,T^{\prime\prime})\leq\operatorname{OPT}(X,T)+|T|$ . Combining these inequalities establishes $\operatorname{OPT}(P\oplus X,T)\leq\operatorname{OPT}(X,T)+10|T|\leq 11\operatorname{OPT}(X,T)$ . By Theorem 7.2, $T$ is the final tree in Move-to-Root’s execution of $P$ starting from $T^{\prime}$ . Hence, $\Lambda(P\oplus X,T^{\prime})=\Lambda(P,T^{\prime})+\Lambda(X,T)$ . By Theorem B.1, $\Lambda_{2}(P\oplus X)=\Lambda(P\oplus X,T^{\prime})-|T^{\prime}|+1$ and $\Lambda_{2}(P)=\Lambda(P,T^{\prime})-|T^{\prime}|+1$ . Therefore, $\Lambda(X,T)=\Lambda(P\oplus X,T^{\prime})-\Lambda(P,T^{\prime})=\Lambda_{2}(P\oplus X)-\Lambda_{2}(P)\leq\Lambda_{2}(P\oplus X)$ . Finally, by [67, Theorem 7] and Theorem A.1, $\Lambda_{2}(P\oplus X)\leq 2\operatorname{OPT}_{\operatorname{rot}}(P\oplus X,T)\leq 4\operatorname{OPT}(P\oplus X,T)\leq 44\operatorname{OPT}(X,T)$ . ∎
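The step of the proof asserting that $T$ is the final tree in Move-to-Root's execution of $P=\operatorname{postorder}(T)$ starting from $T^{\prime}=\operatorname{BST}(P)$ can be checked on small examples. The sketch below assumes trees are nested tuples `(left, key, right)` with `None` for the empty tree, and it implements Move-to-Root by splitting the tree along the access path, which yields the same tree as rotating the requested key to the root one edge at a time. All function names are hypothetical.

```python
def insert(t, k):
    """Plain binary search tree insertion without rebalancing."""
    if t is None:
        return (None, k, None)
    left, key, right = t
    if k == key:
        return t
    if k < key:
        return (insert(left, k), key, right)
    return (left, key, insert(right, k))

def postorder(t):
    """Keys of t listed in postorder: left subtree, right subtree, root."""
    if t is None:
        return []
    left, key, right = t
    return postorder(left) + postorder(right) + [key]

def split(t, x):
    """Split t at key x (assumed present) into trees of smaller and larger keys."""
    if t is None:
        return None, None
    left, key, right = t
    if x == key:
        return left, right
    if x < key:
        smaller, larger = split(left, x)
        return smaller, (larger, key, right)
    smaller, larger = split(right, x)
    return (left, key, smaller), larger

def move_to_root(t, x):
    """Tree obtained by rotating x up to the root of t."""
    smaller, larger = split(t, x)
    return (smaller, x, larger)

def replay_postorder(T):
    """Run Move-to-Root on postorder(T), starting from BST(postorder(T))."""
    P = postorder(T)
    tree = None
    for k in P:                            # build the default tree BST(P)
        tree = insert(tree, k)
    for k in P:
        tree = move_to_root(tree, k)
    return tree
```

Running `replay_postorder` on a small tree returns that same tree, in line with the use of Theorem 7.2 in the proof.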

Acknowledgments

We thank Luís Russo for suggesting improvements to Figure 3, Kurt Mehlhorn for simplifying our proof of Theorem 4.3, Amit Halevi for comments that clarified the presentation of our execution model, and Siddhartha Sen and Bernard Chazelle for editorial feedback. The high-level presentation of Sections 3 and 4 benefited from informal discussions with Daniel Cooney. We are indebted to John Iacono for his guidance in understanding the equivalence between Wilber’s bound and Move-to-Root’s crossing nodes, along with corroborating our empirical comparisons between the behaviors of Splay and Wilber’s bound. Finally, we found David Galles’ “Data Structure Visualizations” website instrumental for prototyping our proofs [27]. Research at Princeton University partially supported by an innovation research grant from Princeton and a gift from Microsoft.

Bibliography

[1] Georgy Adel’son-Vel’skii and Evgenii Landis “An algorithm for the organization of information” In Soviet Mathematics Doklady 3, 1962, pp. 1259–1263
[2] Yehuda Afek et al. “The CB tree: a practical concurrent self-adjusting search tree” In Distributed Computing 27.6, 2014, pp. 393–417 DOI: 10.1007/s00446-014-0229-0
[3] Brian Allen and Ian Munro “Self-organizing binary search trees” In Journal of the ACM 25.4, 1978, pp. 526–535 DOI: 10.1145/322092.322094
[4] Arne Andersson “General balanced trees” In Journal of Algorithms 30.1, 1999, pp. 1–18 DOI: 10.1006/jagm.1998.0967
[5] Rudolf Bayer and Edward McCreight “Organization and maintenance of large ordered indexes” In Acta Informatica 1.3, 1972, pp. 173–189 DOI: 10.1007/bf00288683
[6] Michael Bender, Martín Farach-Colton and William Kuszmaul “What does dynamic optimality mean in external memory?” In Innovations in Theoretical Computer Science Dagstuhl, Germany: Schloss Dagstuhl, 2022, pp. 1–23 DOI: 10.4230/LIPICS.ITCS.2022.18
[7] Benjamin Berendsohn and László Kozma “Splay trees on trees” In Symposium on Discrete Algorithms Alexandria, Virginia, USA: Society for Industrial and Applied Mathematics, 2022, pp. 1875–1900 DOI: 10.1137/1.9781611977073.75
[8] Prosenjit Bose et al. “Competitive online search trees on trees” In Symposium on Discrete Algorithms Salt Lake City, Utah, USA: Society for Industrial and Applied Mathematics, 2020, pp. 1878–1891 DOI: 10.1137/1.9781611975994.115

Footnote 1 (to the title): This paper is an adaptation of the first author’s Ph.D. thesis [45]. We presented an earlier version at SODA [46].
