A Foundation for Proving Splay is Dynamically Optimal
Caleb C. Levy, Robert E. Tarjan

TL;DR
This paper develops a theoretical framework to prove the long-standing conjecture that the splay tree algorithm is dynamically optimal for binary search tree operations, by linking it to approximate monotonicity.
Contribution
It introduces the concept of approximate monotonicity and establishes that Splay is dynamically optimal if and only if it is approximately monotone, laying the groundwork for proving the conjecture.
Findings
Lower bounds on optimal cost are approximately monotone.
Splay is dynamically optimal if and only if it is approximately monotone.
Framework extends to insertion, deletion, and related algorithms.
This paper is an adaptation of the first author’s Ph.D. thesis [45]. We presented an earlier version at SODA [46].
Caleb C. Levy: Sunshine; [email protected].
Robert E. Tarjan: Department of Computer Science, Princeton University; Intertrust Technologies; [email protected].
Abstract
Consider the task of performing a sequence of searches in a binary search tree. After each search, we allow an algorithm to arbitrarily restructure the tree. The cost of executing the task is the sum of the time spent searching and the time spent optimizing the searches with restructuring operations. Sleator and Tarjan introduced this notion in 1985, along with an algorithm and a conjecture. The algorithm, Splay, is an elegant procedure for performing adjustments that move searched items to the top of the tree. The conjecture, called dynamic optimality, is that the cost of splaying is always within a constant factor of the optimal algorithm for performing searches. We lay a foundation for proving the dynamic optimality conjecture. Central to our method is approximate monotonicity. Approximately monotone algorithms are those whose cost does not increase by more than a fixed multiple after removing searches from the sequence. As we shall see, Splay is dynamically optimal if and only if it is approximately monotone. This result extends to a weaker form of approximate monotonicity as well as insertion, deletion, and related algorithms. We prove that a lower bound on optimal execution cost is approximately monotone and outline how to adapt this proof from the lower bound to Splay, and how to overcome the remaining barriers to establishing dynamic optimality.
1 Context
The binary search tree is the canonical pointer-based data structure for maintaining a sorted collection in fast memory. Its most attractive feature is that the number of comparisons required to verify the presence of an item is logarithmic in the size of the tree, provided that the tree is properly arranged. Without exercising care when adding elements, however, a binary search tree can easily become unbalanced, making search cost proportional to the size of the tree in the worst case. Thus binary search trees require some form of maintenance and restructuring for good performance.
Adel’son-Vel’skii and Landis gave the first method that guarantees efficient searches in the presence of updates [1]. They supplement nodes with bits that provide rough information about how balanced each node’s subtrees are. After an insertion or deletion, a restructuring procedure restores invariants on the balance bits. These invariants ensure that all paths in the tree have length at most logarithmic in the tree’s size. There are many variations of this idea. Perhaps most famous is the red-black tree due to its requiring fewer restructuring operations and a catchy name [32]. Restructuring schemes based on balance bits remain an active topic of research [33]. Many more schemes now exist. Randomized search trees, such as treaps [60], zip trees [66] and others [52] trade worst-case performance guarantees for good expected behavior in order to gain simpler rebalancing procedures and fewer pointer changes. Scapegoat trees [4, 28] defer restructuring operations until they can be executed in bulk. B-trees [5] and their derivatives [20], close relatives of binary search trees that use system memory characteristics to determine node arity, are ubiquitous in database applications. There are numerous related data structures. For the most part, they are well-understood. However, there is a class of binary search tree algorithm whose behavior remains one of the great open questions in theoretical computer science.
While the above-mentioned data structures guarantee logarithmic search time, they usually cannot perform much better than this. Real-world access patterns often have some latent structure. For example, records may be arranged in partially sorted sub-blocks, and databases often receive frequent requests for a small number of high-traffic elements. In such situations it can be possible to do better than logarithmic time per access by adjusting the tree after searches, instead of solely after adding or removing elements. This leads to colloquially named “self-adjusting” binary search tree algorithms. Allen and Munro were the first to examine such algorithms in depth [3]. They developed a simple procedure with good expected behavior in many cases. By far the most famous self-adjusting binary search tree algorithm is Sleator and Tarjan’s improvement to this procedure, called Splay [61], which has many compelling properties and applications. For our purposes, Sleator and Tarjan’s most important contribution to this topic is not what they proved, but instead what they left unresolved. The dynamic optimality conjecture asserts that Splay is essentially the ideal algorithm for every possible access pattern. This problem’s intrigue arises from several sources.
Dynamic optimality would imply we can use Splay as a stand-in for many more-specialized data structures. Splay simultaneously acts as a balanced search tree and as a spatial and a temporal cache. It also shares properties with entropy-minimizing static trees [61] and data structures for disjoint set union [50]. Dynamically optimal algorithms can emulate multi-finger binary search trees [12] and doubly-ended queues [65].
Advances in our understanding of binary search trees percolate into other areas. Splay and related self-adjusting data structures inspired the creation of pairing heaps [26], smooth heaps [43] and slim heaps [35]. Splay’s properties find utility in encoding schemes [40], routing problems [59] and optimizing for concurrent non-uniform access [2], and the conjecture has analogues for B-trees [9, 24], search-tree-on-tree data structures [7, 8] and external memory settings [6].
Furthermore, the conjecture has become a nexus for the development of new strategies for analyzing data structures. Splay was intimately involved in the adaptation of potential functions from physics to computer science [64]. Concepts common in the analysis of forbidden substructures are frequently applied to Splay and related self-adjusting algorithms, with examples including Davenport-Schinzel sequences [55], forbidden submatrices [54] and pattern-avoiding permutations [13, 30]. Techniques from computational geometry are now common in this area of research [22, 39].
Finally, the conjecture has a distinct intellectual allure. At first blush, its claim seems too good to be true, which makes the idea of proving it all the more attractive. Its statement is elegant and deceptively simple, yet anyone who has attempted to tackle the problem can attest to its subtlety and utter defiance of standard mathematical approaches. Solutions frequently seem tantalizingly close while remaining just out of reach.
This investigation adopts a somewhat different tone from its companions. It can often be easier to induct on stronger hypotheses because they provide more exploitable structure. Accordingly, we have no qualms about presuming that Splay is dynamically optimal and allowing this to guide our intuition. Our objective is to determine how we can prove the conjecture, not if. Section 2 defines our execution model and summarizes related work. Section 3 shows that Splay is dynamically optimal if and only if it is approximately monotone. Section 4 formalizes optimality with additive overhead and demonstrates that if Splay is optimal then it has no such overhead. Section 5 extends both Splay and our execution model to incorporate mutation operations and establishes that if Splay is optimal without these operations then it is optimal when they are permitted. Section 6 generalizes our results to similar algorithms. Section 7 establishes that a non-trivial lower bound on optimal execution cost is approximately monotone. Section 8 outlines a speculative proof that Splay is approximately monotone. The appendices formalize relevant folklore.
2 Preliminaries
A binary tree $T$ comprises a finite set of nodes, with one node designated to be the root. All nodes have a left and a right child pointer, each leading to a different node. Either or both children may be missing; a missing child is denoted by null. Every node in $T$, save for the root, has a single parent node of which it is a child. (The root has no parent.) Each pairing of a node with its parent is an edge in $T$. The size of $T$ is the number of nodes it contains, and is denoted $|T|$. There is a unique path from the root to every other node $x$ in $T$, called the access path for $x$ in $T$. If $x$ is on the access path for $y$ then $x$ is an ancestor of $y$, and $y$ is a descendant of $x$. If these two nodes are distinct then $x$ is a strict ancestor of $y$ and $y$ is a strict descendant of $x$. (Every node is an ancestor and a descendant of itself.) The subgraph comprising all descendants of $x$ is called the subtree rooted at $x$. Nodes thus have left and right subtrees rooted respectively at their left and right children. (Subtrees are empty for null children.) The depth of $x$, denoted $d_T(x)$, is the length, in nodes, of its access path. A rooted hull in $T$ is a connected subgraph of $T$ that includes the root. A rooted hull is itself a binary tree.
In a binary search tree, every node has a unique key, and the tree satisfies the symmetric order condition: every node’s key is greater than those in its left subtree and smaller than those in its right subtree. The binary search tree derives its name from how its structure enables finding keys. To find a requested key, initialize the current node to be the root. While the current node is not null and does not contain the requested key, replace the current node by its left or right child depending on whether the requested key is smaller or larger than the key in the current node, respectively. The search returns the last current node, which contains the requested key if said key is in the tree and is otherwise null. The left spine of $T$ is the access path to the smallest key in $T$, and the right spine of $T$ is the access path to the largest key in $T$. (The spines of the empty tree are empty.) The left and right spines consist entirely of left and right pointers, respectively. A tree is flat if every node is on the left or right spine. To keep our presentation simple, we assume that a key and the node containing it can be used interchangeably in binary comparisons.
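The search procedure described above can be sketched in a few lines of Python. This is only an illustration: the `Node` class, the `search` function and the example keys are hypothetical names chosen here, and a missing child is represented by `None`.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class Node:
    """Hypothetical node type: a key plus left and right child pointers."""
    key: int
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def search(root: Optional[Node], key: int) -> Tuple[Optional[Node], int]:
    """Return the last current node and the number of nodes visited.

    The returned node holds `key` if the key is present and is None otherwise.
    """
    current = root
    visited = 0
    while current is not None:
        visited += 1
        if key == current.key:
            break
        # Go left for smaller keys and right for larger keys.
        current = current.left if key < current.key else current.right
    return current, visited


# Example tree on keys 1..5 with root 4 (an assumed example, not from the paper).
tree = Node(4, Node(2, Node(1), Node(3)), Node(5))
found, depth = search(tree, 3)   # visits 4, then 2, then 3
missing, _ = search(tree, 6)     # falls off the right spine
```

For a key that is present, the number of visited nodes equals the key's depth measured in nodes, which is the quantity the cost model charges for a search.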
We denote by $|X|$ the length of a finite sequence $X$. The symbol “$\oplus$” denotes sequence concatenation. (We sometimes write $X \oplus (x)$ as $X \oplus x$.) The postorder of the empty tree is the empty sequence, and the postorder of binary search tree $T$ whose root $x$ has left and right subtrees $L$ and $R$ is $\mathrm{post}(L) \oplus \mathrm{post}(R) \oplus x$. The index of a given key in $T$ is the number of keys in $T$ that are less than or equal to the given key. The function mapping each key in $T$ to its index is the index map for $T$, and the inverse of this function is the reverse index. The index map of a finite totally ordered set is defined analogously. Two binary search trees are isomorphic if relabelling the keys in each tree to their respective indices produces trees with the same postorder.
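As a rough illustration of these definitions, the following Python sketch computes postorders and index maps and uses them to test isomorphism. The names `Node`, `postorder`, `index_map` and `isomorphic`, and the three example trees, are assumptions made for this sketch.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Node:
    """Hypothetical node type; None stands for a missing child."""
    key: int
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def postorder(node: Optional[Node]) -> List[int]:
    """Postorder of the left subtree, then the right subtree, then the root key."""
    if node is None:
        return []
    return postorder(node.left) + postorder(node.right) + [node.key]


def index_map(node: Optional[Node]) -> Dict[int, int]:
    """Map each key to the number of keys in the tree that are <= it."""
    keys = sorted(postorder(node))
    return {key: i + 1 for i, key in enumerate(keys)}


def isomorphic(a: Optional[Node], b: Optional[Node]) -> bool:
    """Relabel keys by index and compare the resulting postorders."""
    ia, ib = index_map(a), index_map(b)
    return [ia[k] for k in postorder(a)] == [ib[k] for k in postorder(b)]


a = Node(20, Node(10), Node(30))             # balanced, keys 10 < 20 < 30
b = Node(5, Node(2), Node(9))                # same shape, different keys
c = Node(10, None, Node(20, None, Node(30))) # right spine on the keys of a
```

Here `a` and `b` relabel to the same postorder and are isomorphic, while `a` and `c` hold the same keys in different shapes and are not.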
To perform a transformation on tree $T$, first select an arbitrary rooted hull $H$ in $T$. Then reshape $H$ into any other binary search tree $Q$ containing the same set of keys. We refer to $Q$ as a transition tree. To complete the operation, form the after-tree $T'$ by substituting $Q$ for $H$ in $T$, re-attaching the subtrees of $H$ to $Q$ in the manner uniquely prescribed by the symmetric order.
An instance of a binary search tree optimization problem comprises a sequence of requested keys $X = (x_1, \dots, x_m)$ and an initial tree $T$ containing these keys. An execution $E$ for this instance comprises a sequence of rooted hulls $H_1, \dots, H_m$, a sequence of transition trees $Q_1, \dots, Q_m$, and a sequence of after-trees $T_1, \dots, T_m$. For $1 \le i \le m$, $H_i$ is a rooted hull in $T_{i-1}$, $Q_i$ is a binary search tree with the same keys as $H_i$ such that $x_i$ is the root of $Q_i$, and $T_i$ results from substituting $Q_i$ for $H_i$ in $T_{i-1}$, where $T_0 = T$. (We refer to $T_m$ as the execution’s final tree.) The cost of $E$ is $\sum_{i=1}^{m} |Q_i|$. At least one execution for $X$ starting from $T$ has minimum, or optimum cost, and we denote this cost by $\mathrm{OPT}(X, T)$. Figure 1 shows an instance and a corresponding execution.
An execution’s rooted hull and after-tree for a given request are uniquely determined by the previous after-tree and the request’s transition tree. Thus we shall occasionally denote an execution by its sequence of transition trees. Defining the cost of an execution as the sum of the transition tree sizes captures the notion of paying for restructuring: fewer operations are required to substitute a smaller tree. Each rooted hull contains the access path, which accounts for the cost of searching. We describe instances that include insertions and deletions in Section 5.
Unless otherwise implied, we may assume without loss of generality that every node in the initial tree $T$ has a descendant in $T$ whose key is requested in $X$ [15, Theorem 43], in which case every key in $T$ appears in at least one transition tree and $\mathrm{OPT}(X, T) \ge |T|$. Similarly, $\mathrm{OPT}(X, T) \ge |X|$ since every execution produces at least one transition tree per request. Furthermore, since an optimal algorithm can reshape the entire initial tree on the first request, $\mathrm{OPT}(X, T) \le \mathrm{OPT}(X, T') + |T|$ for any pair of valid binary search trees $T$ and $T'$ for request sequence $X$ with the same keys. Therefore, it makes little difference to optimal executions whether the initial tree is left specified or unspecified, and many authors do not distinguish between $\mathrm{OPT}(X, T)$ and $\mathrm{OPT}(X)$. However, the initial tree can, potentially, have a significant impact on algorithmic behavior. Thus, we require instances to specify an initial tree. We discuss this further in Section 4.
A binary search tree algorithm $\mathcal{A}$ maps each instance to an execution of the instance. We denote the cost of this execution by $\mathrm{cost}_{\mathcal{A}}(X, T)$. We say $\mathcal{A}$ is dynamically optimal if there is some constant $c$ so that $\mathrm{cost}_{\mathcal{A}}(X, T) \le c \cdot \mathrm{OPT}(X, T)$ for all request sequences $X$ and all corresponding initial trees $T$. (Other terms include “constant-competitive” and “instance optimal.”)
A rotation at left child $x$ with parent $y$ in $T$ replaces the subtree rooted at $y$ with the tree whose root $x$ has right child $y$ such that the right subtree of $x$ before the rotation becomes the left subtree of $y$ afterward and the left subtree of $x$ and right subtree of $y$ are unchanged. Figure 2 depicts this process. We can also identify this rotation with the edge connecting $x$ to $y$ in $T$. Rotation at a right child is symmetric, and rotation at the root is undefined. Rotation preserves symmetric order while changing up to three child pointers in the tree. Sleator and Tarjan originally measured execution cost by counting rotations. (See Appendix A.)
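A minimal Python sketch of rotation at a left child may help make the pointer changes concrete. The `Node` class and the function names are hypothetical, and the sketch returns the new subtree root instead of updating a grandparent pointer, which the caller is assumed to do.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Node:
    """Hypothetical node type; None stands for a missing child."""
    key: int
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def rotate_at_left_child(y: Node) -> Node:
    """Rotate at the left child x of y and return x, the new subtree root."""
    x = y.left
    if x is None:
        raise ValueError("y has no left child to rotate at")
    y.left = x.right   # the right subtree of x becomes the left subtree of y
    x.right = y        # y becomes the right child of x
    return x


def inorder(node: Optional[Node]) -> List[int]:
    """Keys in symmetric order, used to check that rotation preserves it."""
    if node is None:
        return []
    return inorder(node.left) + [node.key] + inorder(node.right)


y = Node(4, Node(2, Node(1), Node(3)), Node(5))
before = inorder(y)
x = rotate_at_left_child(y)
after = inorder(x)
```

In this example the subtree holding key 3 moves from the right of key 2 to the left of key 4, while the symmetric order of the keys is unchanged.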
A splay operation begins with a binary search for a key in the tree. Let $x$ be the node returned by this search. If $x$ is not null then the algorithm repeatedly applies a splay step until $x$ becomes the root. A splay step has one of three forms. If the parent of $x$ is the root then rotate at $x$. (This case is always terminal.) Otherwise, if $x$ is a left child and its parent is a right child, or vice-versa, rotate at $x$ twice. Otherwise, rotate at the parent of $x$, and then rotate at $x$. Sleator and Tarjan assigned the respective names zig, zig-zag and zig-zig to these three cases [61]. The series of splay steps that bring $x$ to the root are collectively called splaying at $x$, or simply splaying $x$. If $X = (x_1, \dots, x_m)$ is a sequence of requested keys in $T$ then the cost of splaying $X$ starting from $T$ is $\mathrm{cost}(X, T) = \sum_{i=1}^{m} d_{T_{i-1}}(x_i)$, where $d_{T_{i-1}}(x_i)$ is the depth of $x_i$ in $T_{i-1}$, $T_0 = T$, and $T_i$ is the result of splaying $x_i$ in $T_{i-1}$ for $1 \le i \le m$. We will primarily be dealing with the Splay algorithm, so $\mathrm{cost}$, without subscript, will always refer to the cost of splaying the keys of $X$ starting from $T$.
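The three splay steps can be sketched as follows. This is a bottom-up implementation with parent pointers, written for illustration under the assumption that the cost of a request is the number of nodes on its access path; all names (`Node`, `rotate`, `splay`, `splay_cost`, `left_spine`) are hypothetical.

```python
class Node:
    """Hypothetical node type with parent pointers; None is a missing child."""
    def __init__(self, key):
        self.key = key
        self.left = self.right = self.parent = None


def rotate(x):
    """Rotate at the non-root node x, making x the parent of its old parent."""
    y, g = x.parent, x.parent.parent
    if y.left is x:
        y.left = x.right
        if x.right is not None:
            x.right.parent = y
        x.right = y
    else:
        y.right = x.left
        if x.left is not None:
            x.left.parent = y
        x.left = y
    y.parent, x.parent = x, g
    if g is not None:
        if g.left is y:
            g.left = x
        else:
            g.right = x


def splay(root, key):
    """Search for key, splay the found node, return (new root, nodes visited)."""
    x, cost = root, 0
    while x is not None:
        cost += 1
        if key == x.key:
            break
        x = x.left if key < x.key else x.right
    if x is None:
        return root, cost            # key absent: nothing is splayed
    while x.parent is not None:
        y, g = x.parent, x.parent.parent
        if g is None:
            rotate(x)                # zig
        elif (g.left is y) != (y.left is x):
            rotate(x); rotate(x)     # zig-zag
        else:
            rotate(y); rotate(x)     # zig-zig
    return x, cost


def splay_cost(root, requests):
    """Total cost of splaying each requested key in order."""
    total = 0
    for key in requests:
        root, cost = splay(root, key)
        total += cost
    return root, total


def left_spine(n):
    """Build a left spine with keys 1..n; key n is the root."""
    root = cur = Node(n)
    for k in range(n - 1, 0, -1):
        cur.left = Node(k)
        cur.left.parent = cur
        cur = cur.left
    return root


final_root, total = splay_cost(left_spine(4), [1, 2])
```

On the four-node left spine, key 1 sits at depth 4 and key 2 sits at depth 3 after the first splay, so the two requests cost 7 in total.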
While an individual splay path can involve every node in the tree, the mean cost of a splay operation, averaged over sufficiently many requests, is logarithmic in the tree’s size [61, Theorem 1]. This performance is similar to that of balanced binary search trees. What makes Splay remarkable is that it also takes advantage of latent structure in the request sequence. The amortized cost per splay operation is logarithmic in the number of unique keys requested since the previous request for the splayed key [61, Theorem 4]. Thus, Splay exploits temporal locality in the access pattern. Splay simultaneously exploits spatial locality. The amortized cost of a splay operation is logarithmic in the difference between successively requested keys’ indices in the starting tree [18, 19]. The dynamic optimality conjecture states that Splay is dynamically optimal.
Splay has many generalizations. Subramanian defined a class of algorithms that reshape a tree in small steps. A set of rules, called a “template,” determines which step to take based on the arrangement of nodes in the immediate vicinity of the currently selected node. Different templates give rise to different algorithms, and a number of these algorithms have many of the same properties as Splay [62]. Georgakopoulos and McClurkin [29] and later Chalermsook et al. [14] proved further results about related algorithms. Section 6 examines a generalization of template algorithms.
Besides Splay, the main candidate algorithm for optimality is colloquially known as Greedy. Lucas and Munro independently conjectured that a version of the algorithm that arranges keys on the access path according to their soonest future access times is dynamically optimal [49, 53]. Demaine et al. subsequently developed a representation of binary search tree executions as Cartesian coordinate point-sets [22]. They showed that in this geometric representation, Greedy executes a new request by uniting its execution of the previous requests with the minimum set of points needed to make the new execution satisfy some required properties. Subsequently, many of the interesting behaviors that first drew attention to Splay have been proved for Greedy, including exploitation of temporal locality [25, 31] and spatial locality [39], as well as some additional properties [13, 30].
Two other algorithms serve primarily demonstrative purposes. “Tango” trees require cost proportional to at most $\log \log n$ times the optimum cost in order to execute an instance on a tree of size $n$ [23], and Iacono describes a multiplicative weights update method that is optimal so long as a certain class of binary search tree algorithm contains an optimal member [37]. Both are difficult to implement.
Wilber derived two lower bounds on the cost of executions for a given instance [67]. One of his lower bounds counts the number of occurrences of certain structural patterns in the request sequence with respect to a given reference tree. We examine Wilber’s other lower bound, which we call the crossing bound, in Section 7. A third lower bound, called the “independent rectangle bound,” is defined geometrically [22]. Research into the relationships among these bounds is ongoing [10, 44].
Currently, there is no sub-exponential time algorithm that is known to compute the cost of an optimum binary search tree execution for an instance to within a constant factor. Circumstantial evidence indicates that exact computation of optimal execution cost may be intractable, since a slight generalization of the problem, in which instances comprise requests for batches of keys, is NP-Complete [22]. The theoretical and practical difficulties we encountered when trying to reason about optimal binary search tree executions ultimately led us to the present approach, which consciously avoids directly comparing algorithms with optimal behavior.
3 Approximate Monotonicity
How can we prove that Splay is dynamically optimal without knowing what optimum executions “look like?” We approach this question by combining two concepts. The first starts with a simple observation: in many situations, one intuitively expects that removing requests from an instance should decrease the cost for the algorithm to execute it. This may not always be the case, but it is a reasonable idea to explore. The second idea is to force an algorithm to simulate executions by feeding it appropriately constructed instances. An algorithm $\mathcal{A}$ is approximately monotone if there is some constant $b$ so that $\mathrm{cost}_{\mathcal{A}}(Y, T) \le b \cdot \mathrm{cost}_{\mathcal{A}}(X, T)$ for every request sequence $X$, subsequence $Y$ of $X$, and initial tree $T$. A simulation embedding $S$ for $\mathcal{A}$ is a map from executions to request sequences for which there exists $c$ such that $\mathrm{cost}_{\mathcal{A}}(S(E), T)$ is at most $c$ times the cost of $E$ and $X$ is a subsequence of $S(E)$ for all instances $(X, T)$ and corresponding executions $E$. If such a map exists then $\mathcal{A}$ is coercible.
We add a few clarifying comments on terminology. A subsequence need not be contiguous. For example, $(1, 3)$ is a subsequence of $(1, 2, 3)$. Also, every sequence is a subsequence of itself. A real-valued set function $f$ is monotone if $f(A) \le f(B)$ for all $A \subseteq B$. Approximate monotonicity relaxes this requirement. (The functions we deal with are sequence-valued, but the concept is identical.) Our SODA paper referred to approximate monotonicity as the “subsequence property” [46]. In this work, we only build simulation embeddings for binary search tree algorithms. However, the concept itself seems more general and likely has other applications.
Theorem 3.1.
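To make the notion of a non-contiguous subsequence concrete, here is a small Python check. The function name `is_subsequence` and the example sequences are chosen for this sketch only.

```python
from typing import Iterable, Sequence


def is_subsequence(y: Sequence[int], x: Iterable[int]) -> bool:
    """Return True if y can be obtained from x by deleting zero or more terms."""
    remaining = iter(x)
    # Each term of y must be found in what is left of x, scanning left to right.
    return all(any(term == candidate for candidate in remaining) for term in y)


non_contiguous = is_subsequence([1, 3], [1, 2, 3])
wrong_order = is_subsequence([3, 1], [1, 2, 3])
itself = is_subsequence([1, 2, 3], [1, 2, 3])
```

The order of terms matters: `[3, 1]` is not a subsequence of `[1, 2, 3]` even though both of its terms appear there.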
Optimal algorithms are monotone.
Proof.
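Let $X = (x_1, \dots, x_m)$ be a sequence of requests with starting tree $T$, let $E = (Q_1, \dots, Q_m)$ be an optimal execution of this instance with after-trees $T_1, \dots, T_m$, and form subsequence $Y$ from $X$ by choosing a subset $I$ of $\{1, \dots, m\}$ and keeping the requests in $X$ at times in $I$. The requests at times $\{1, \dots, m\} \setminus I$ can be partitioned into contiguous blocks of integers. (For example, if $m = 9$ and $I = \{3, 4, 7\}$ then the removed time blocks are $\{1, 2\}$, $\{5, 6\}$, and $\{8, 9\}$.) Let $\iota$ be the index map for $I$. Define the transition tree sequence $E' = (Q'_1, \dots, Q'_{|I|})$ as follows. For $i \in I$, if $i$ is one greater than the maximal element in a removed time block then set $Q'_{\iota(i)}$ to be the rooted hull in $T_i$ comprising the union of the keys in $Q_i$ with the keys in the transition trees of $E$ for the requests in said block, and otherwise set $Q'_{\iota(i)} = Q_i$. The transition tree sequence $E'$ is a valid execution for $Y$ starting from $T$, and the cost of $E'$ is at most the cost of $E$. Since $E$ is an optimal execution for $X$ starting from $T$, we conclude $\mathrm{OPT}(Y, T) \le \mathrm{OPT}(X, T)$. ∎
Theorem 3.2.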
A coercible algorithm is dynamically optimal if and only if it is approximately monotone.
Proof.
A simulation embedding can be used to simulate an optimal execution of a given instance just as well as any other execution. The cost for the algorithm to execute the simulation is no more than a fixed multiple of the optimal cost for that instance. The simulation of this optimal execution contains the original request sequence as a subsequence. If the algorithm is also approximately monotone, then the cost of executing the original instance will not exceed a fixed multiple of the simulation’s cost and hence of the optimal cost. In the other direction, if $\mathcal{A}$ is dynamically optimal then there exists some constant $c$ for which $\mathrm{cost}_{\mathcal{A}}(Y, T) \le c \cdot \mathrm{OPT}(Y, T) \le c \cdot \mathrm{OPT}(X, T)$ for all instances $(X, T)$ and subsequences $Y$ of $X$, where the last inequality follows from Theorem 3.1. Since $\mathrm{OPT}(X, T) \le \mathrm{cost}_{\mathcal{A}}(X, T)$, the algorithm is approximately monotone. ∎
Approximate monotonicity is useful even if an algorithm is not dynamically optimal. For $n \ge 1$, define the subsequence overhead and optimal overhead of $\mathcal{A}$ to be the respective suprema of $\mathrm{cost}_{\mathcal{A}}(Y, T) / \mathrm{cost}_{\mathcal{A}}(X, T)$ and $\mathrm{cost}_{\mathcal{A}}(X, T) / \mathrm{OPT}(X, T)$ taken over all instances $(X, T)$ and all subsequences $Y$ of $X$ for which $|T| = n$.
Theorem 3.3.
For all $n$, a coercible algorithm’s subsequence overhead and optimal overhead are within a constant factor independent of $n$.
Proof.
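By Theorem 3.1, $\mathrm{cost}_{\mathcal{A}}(X, T) \ge \mathrm{OPT}(X, T) \ge \mathrm{OPT}(Y, T)$ for every instance $(X, T)$ and subsequence $Y$ of $X$, so $\mathrm{cost}_{\mathcal{A}}(Y, T) / \mathrm{cost}_{\mathcal{A}}(X, T) \le \mathrm{cost}_{\mathcal{A}}(Y, T) / \mathrm{OPT}(Y, T)$, meaning the subsequence overhead is at most the optimal overhead. If $\mathcal{A}$ is coercible then there exists a simulation embedding $S$ and constant $c$ such that $\mathrm{cost}_{\mathcal{A}}(S(E), T) \le c \cdot \mathrm{OPT}(X, T)$ for every optimal execution $E$ of request sequence $X$ with starting tree $T$. Therefore $\mathrm{cost}_{\mathcal{A}}(X, T) / \mathrm{OPT}(X, T) \le c \cdot \mathrm{cost}_{\mathcal{A}}(X, T) / \mathrm{cost}_{\mathcal{A}}(S(E), T)$, and the ratio on the right is at most the subsequence overhead because $X$ is a subsequence of $S(E)$. Since these inequalities hold for all instances with an initial tree of size $n$, they hold true for the supremum. Thus the subsequence overhead is at most the optimal overhead, which in turn is at most $c$ times the subsequence overhead, for all $n$. ∎
To build simulation embeddings, we employ an algorithm for transforming a binary search tree $T$ into another binary search tree $T'$ with the same keys through the application of a number of restricted rotations that is at most linear in $|T|$; restricted rotations must occur at children or grandchildren of the root. Begin by repeatedly rotating at the root’s left child until all nodes in the left subtree of the root are on the left spine. Then, repeat the following until the root’s left and right subtrees are respectively left and right spines: repeatedly rotate at the left child of the root’s right child so long as said left child is not null, and then rotate at the root’s right child. Once the tree is flat, continually rotate at either the left or the right child of the root until the tree is the same as that resultant from applying the above flattening procedure to $T'$. Finally, apply the reverse flattening procedure to recover $T'$. Cleary and Taback first derived this algorithm using group theory [17]. Our description is based on Lucas’ presentation [48].
Theorem 3.4.
For every pair of binary search trees $T$ and $T'$ of size at least four with the same keys, there exists a request sequence $Z$ such that Splay’s execution of the requests starting from $T$ has cost linear in $|T|$ and has final tree $T'$.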
Proof.
Let $(r_1, \dots, r_k)$ be the sequence of keys at which Lucas’ restricted rotation algorithm performs rotations in order to transform $T$ into $T'$. Let $T_0 = T$ and for $1 \le i \le k$ let $H_i$ be the tree of lexicographically smallest postorder among four-node rooted hulls in $T_{i-1}$ that contain $r_i$. (Using minimal postorder is just a convention.) Form $Q_i$ by rotating at $r_i$ in $H_i$ and form $T_i$ by substituting $Q_i$ for $H_i$ in $T_{i-1}$. Form the key sequence $Z_i$ by relabeling the keys in Figure 3 via the reverse index for $H_i$ and recording the sequence of keys in marked nodes on the path from $H_i$ to $Q_i$, excluding the key marked in $H_i$. Splaying $Z_i$ starting from $H_i$ results in final tree $Q_i$. The structure of the subtrees hanging from the path does not affect the transition tree of a splay operation. Thus, using splay operations to induce a restricted rotation in a four-node rooted hull in a larger tree also performs the restricted rotation in $T_{i-1}$, and the request sequence $Z = Z_1 \oplus \cdots \oplus Z_k$ induces Splay to successively enact the restricted rotations that transform $T$ into $T'$. Each of these rotations corresponds to at most thirteen requests in $Z$. Every access path in Splay’s execution of $Z$ starting from $T$ has length at most four. Since $k$ is at most linear in $|T|$, the total cost of this execution is at most $4 \cdot 13 \cdot k = 52k$, which is linear in $|T|$. ∎
Theorem 3.5.
Splay is dynamically optimal if and only if it is approximately monotone.
Proof.
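We prove Splay is coercible. Let $E$ be an execution for $X = (x_1, \dots, x_m)$ starting from $T$ comprising rooted hulls $H_1, \dots, H_m$, transition trees $Q_1, \dots, Q_m$, and after-trees $T_1, \dots, T_m$. For initial trees of size three or less set $S(E) = X$. Otherwise, let $S(E) = Z_1 \oplus \cdots \oplus Z_m$ where, for $1 \le i \le m$, $Z_i$ is the request sequence constructed in Theorem 3.4 for inducing Splay to transform $H_i$ into $Q_i$. (If $H_i = Q_i$ then $Z_i$ is the singleton request sequence whose sole term is $x_i$.) Splaying $Z_i$ starting from $T_{i-1}$ induces a substitution of $Q_i$ for $H_i$ in $T_{i-1}$ to form $T_i$, where $T_0 = T$. A splay operation always places the requested key as the root of the after-tree. Since $x_i$ is the root of $Q_i$ it must be the last key in $Z_i$. Therefore, $X$ is a subsequence of $S(E)$, and by Theorem 3.4 there is a constant $c$ for which $\mathrm{cost}(S(E), T) \le c \sum_{i=1}^{m} |Q_i|$, which is $c$ times the cost of $E$. Hence, $S$ is a simulation embedding for Splay and Splay is coercible. Apply Theorem 3.2. ∎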
Splay is only approximately monotone. Let be a left spine with integer keys to for , let be the geometric sequence ascending in powers of two, let be the reversal of and let . Splaying the first half of efficiently brings the requested keys close to the root and ensures . Meanwhile, splaying each request in only halves the depth of the next requested key, so . Hence the limit of Splay’s subsequence overhead, as tree size increases, is at least two.
Our simulation embedding is designed for minimalism. A more careful analysis can reduce the constant factor. Two prior works construct simulation embeddings for binary search tree algorithms. Harmon builds a simulation embedding for the geometric version of Greedy [34, Chapter 2.3.4], while Russo’s simulation embedding for Splay uses rotation-based executions and potential-based analysis [58]. Neither work treats simulation embeddings as mathematical objects in their own right. Subsequent to the publication of our SODA paper [46], Chalermsook and Jiamjitrak constructed simulation embeddings for a class of template-like algorithms using potential-based methods [16]. Reddmann examines several algorithms’ competitive overheads numerically [56].
4 Startup Overhead
In principle, an algorithm may need to execute many requests in order to bring a poorly structured initial tree into a good state before it can behave optimally. Formally, an algorithm $\mathcal{A}$ is eventually optimal if there exists a positive constant $c$ and startup overhead $s$ mapping starting trees to integers such that $\mathrm{cost}_{\mathcal{A}}(X, T) \le c \cdot \mathrm{OPT}(X, T) + s(T)$ for all request sequences $X$ and corresponding initial trees $T$. Similarly, if $\mathrm{cost}_{\mathcal{A}}(Y, T) \le c \cdot \mathrm{cost}_{\mathcal{A}}(X, T) + s(T)$ for all instances $(X, T)$ and subsequences $Y$ of $X$ then $\mathcal{A}$ is eventually monotone. (Eventual optimality implies eventual monotonicity.)
Some works do not distinguish between eventual and dynamic optimality, but Sleator and Tarjan were more optimistic. They made no allowance for startup overhead in their original statement of the dynamic optimality conjecture [61]. As we shall show, their optimism was well-placed: if Splay is eventually optimal then it is dynamically optimal. Our proof bounds startup overhead by averaging execution cost over many repetitions of a request sequence. A repeater for algorithm $\mathcal{A}$ is a mapping $R$ from integer-instance pairs to request sequences for which there exist positive constants $a$ and $b$ such that $\mathrm{cost}_{\mathcal{A}}(R(k, X, T), T) \ge a k \cdot \mathrm{cost}_{\mathcal{A}}(X, T)$ and $\mathrm{OPT}(R(k, X, T), T) \le b k \cdot \mathrm{OPT}(X, T)$ for all positive integers $k$, request sequences $X$, and starting trees $T$. If a repeater exists, then $\mathcal{A}$ is repeatable.
Theorem 4.1.
Eventually optimal repeatable binary search tree algorithms are dynamically optimal.
Proof.
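Repeatability and eventual optimality imply positive constants $a$, $b$ and $c$, a repeater $R$, and startup overhead $s$ such that $a k \cdot \mathrm{cost}_{\mathcal{A}}(X, T) \le \mathrm{cost}_{\mathcal{A}}(R(k, X, T), T) \le c \cdot \mathrm{OPT}(R(k, X, T), T) + s(T) \le b c k \cdot \mathrm{OPT}(X, T) + s(T)$ for all request sequences $X$, starting trees $T$ and positive integers $k$. Choose $k \ge s(T)$ to absorb the overhead and obtain $\mathrm{cost}_{\mathcal{A}}(X, T) \le \frac{bc}{a} \mathrm{OPT}(X, T) + \frac{1}{a}$, which is at most $\frac{bc + 1}{a} \mathrm{OPT}(X, T)$ for nonempty $X$. ∎
Theorem 4.2.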
Eventually monotone repeatable coercible algorithms are dynamically optimal.
Proof.
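These properties imply there exists a simulation embedding $S$, constant $c$ and startup overhead $s$ for which $\mathrm{cost}_{\mathcal{A}}(X, T) \le c \cdot \mathrm{cost}_{\mathcal{A}}(S(E), T) + s(T) \le c^2 \cdot \mathrm{OPT}(X, T) + s(T)$ for all instances $(X, T)$ and corresponding optimal executions $E$. Apply Theorem 4.1. ∎
Theorem 4.3.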
If Splay is eventually monotone then it is dynamically optimal.
Proof.
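We show Splay is repeatable. Let $X$ be a request sequence with initial tree $T$. Let $T'$ be the final tree in Splay’s execution of $X$ starting from $T$, and define the extended sequence $X' = X \oplus Z$, where $Z$ is the sequence described in Theorem 3.4 that induces Splay to transform $T'$ into $T$. (If $|T| < 4$ or $T' = T$ then $Z$ is the empty sequence.) Denote by $(X')^k$ the sequence $X'$ repeated $k$ times. Since $X'$ merely consists of requests appended to $X$, $\mathrm{cost}(X', T) \ge \mathrm{cost}(X, T)$. The final tree in Splay’s execution of $X'$ starting from $T$ is again $T$, so each repetition has identical after-trees, and $\mathrm{cost}((X')^k, T) = k \cdot \mathrm{cost}(X', T)$. Thus, $\mathrm{cost}((X')^k, T) \ge k \cdot \mathrm{cost}(X, T)$.
It remains to bound the optimal cost. If $X$ is nonempty let $E = (T_1) \oplus E'$ where $T_1$ is the after-tree for the first request in some optimal execution $E^*$ for $X$ starting from $T$ and $E'$ is the sequence of transition trees in $E^*$ for the remaining requests in $X$, otherwise let $E$ be empty. Similarly, if $Z$ is nonempty let $F = (U_1) \oplus F'$ where $U_1$ is the first after-tree in Splay’s execution $F^*$ for $Z$ starting from $T'$ and $F'$ is the sequence of transition trees in $F^*$ for the remaining requests in $Z$, otherwise let $F$ be empty. The sequence of transition trees $E \oplus F$ is an execution of $X'$ starting from $T$. The transition trees in $E$ have total size at most $\mathrm{OPT}(X, T) + |T|$, and by Theorem 3.4 the transition trees in $F$ have total size at most linear in $|T|$. Since we can absorb initial tree size into optimal cost, the transition trees in $E \oplus F$ have total size at most $b \cdot \mathrm{OPT}(X, T)$ for some constant $b$. Finally, $(E \oplus F)^k$ is an execution for $(X')^k$ starting from $T$, meaning $\mathrm{OPT}((X')^k, T) \le b k \cdot \mathrm{OPT}(X, T)$, and the map taking $(k, X, T)$ to $(X')^k$ is a repeater for Splay. Apply Theorems 4.2 and 3.5. ∎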
Our SODA paper established the contrapositive of Theorem 4.3 by repeating hypothetical instances on which Splay is non-optimal in order to contradict any presumed nontrivial startup overhead [46]. Kurt Mehlhorn kindly supplied us with an outline of the above version of the proof after he reviewed our manuscript.
5 Mutation
Monotonicity has no clear analog for algorithms that can handle requests to add and remove keys from the tree. We work around this obstacle by representing these operations using executions of instances that lack such requests. A reduction from algorithm in execution model to algorithm in execution model is a map from instances in to instances in for which there exist positive constants and such that and for all . If such a reduction exists we say reduces to .
Theorem 5.1.
If reduces to and is dynamically optimal then is dynamically optimal.
Proof.
The reduction to an optimal algorithm implies the existence of constants , and such that for all . ∎
Our reduction employs the following terminology. To augment a binary search tree with a new key , first do a search for in . When the search reaches a missing node, replace this node with a new node containing the key . Augmenting an empty tree makes the root key. (This process is sometimes called “leaf insertion.”) The successor of in is the smallest key in that is greater than . If no such key is present the successor is . The predecessor is defined symmetrically. The predecessor and successor are the neighbors of in , and the neighborhood of is the set comprising and those of its neighbors that are not missing. If has no children then its removal from is the rooted hull comprising every key in except for , unless is the root, in which case its removal forms the empty tree.
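The terminology above can be illustrated with a minimal Python sketch. The names `Node`, `augment`, `predecessor`, `successor`, `neighborhood` and `in_order` are hypothetical helpers introduced here for illustration, and a missing neighbor is represented by `None`, which is an assumption of this sketch rather than the paper's notation.

```python
from dataclasses import dataclass
from typing import List, Optional, Set


@dataclass
class Node:
    key: int
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def augment(root: Optional[Node], key: int) -> Node:
    """Leaf insertion: search for key, then fill the missing node the search reaches."""
    if root is None:
        return Node(key)  # augmenting an empty tree makes key the root
    node = root
    while True:
        if key == node.key:
            raise ValueError("key is already in the tree")
        if key < node.key:
            if node.left is None:
                node.left = Node(key)
                return root
            node = node.left
        else:
            if node.right is None:
                node.right = Node(key)
                return root
            node = node.right


def predecessor(root: Optional[Node], key: int) -> Optional[int]:
    """Largest key smaller than key, or None when it is missing (sketch convention)."""
    best, node = None, root
    while node is not None:
        if node.key < key:
            best, node = node.key, node.right
        else:
            node = node.left
    return best


def successor(root: Optional[Node], key: int) -> Optional[int]:
    """Smallest key greater than key, or None when it is missing (sketch convention)."""
    best, node = None, root
    while node is not None:
        if node.key > key:
            best, node = node.key, node.left
        else:
            node = node.right
    return best


def neighborhood(root: Optional[Node], key: int) -> Set[int]:
    """The key together with those of its neighbors that are not missing."""
    candidates = (predecessor(root, key), key, successor(root, key))
    return {k for k in candidates if k is not None}


def in_order(root: Optional[Node]) -> List[int]:
    """Keys in symmetric order."""
    if root is None:
        return []
    return in_order(root.left) + [root.key] + in_order(root.right)
```

For instance, augmenting an empty tree with 5, 2, 8, 1, 9 in that order gives a tree in which the neighborhood of 9 contains only 8 and 9, because the successor of 9 is missing.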
A mutating instance comprises a request sequence and an initial tree where each requested operation . An execution of this instance comprises a sequence of rooted hulls , transition trees , after-trees and . If then must be in and , and obey the same restrictions as instances without mutation. For , if then must not be in and , and fulfill a request to search for in the augmentation of with . If then must be in , contains the neighborhood of in , contains the neighbors of in as a rooted hull of its left spine so long as at least one neighbor is not missing, and results from substituting for in and then removing . The cost of is . We denote by the minimum cost among executions for starting from .
Requiring that executions incorporate both of a deleted key’s neighbors, when they are present, is essential to our analysis. We are unable to determine if algorithms that are dynamically optimal among executions in our model of deletion remain so after removing this requirement. That caveat aside, our nonstandard version of deletion does not, as far as we are aware, change any known upper bound on the optimum cost of executing a mutating instance by more than a constant factor. We believe our model is sufficiently realistic to proceed without further concern.
Our extension of Splay inserts by augmenting with followed by splaying and deletes from by successively splaying the keys in the neighborhood of in in increasing order, rotating at the predecessor of if it is present, and then removing . Splay’s transition tree for the deletion is the rooted hull comprising the union of keys on the access paths of these operations in the tree immediately prior to the removal of .
Theorem 5.2.
If Splay is eventually monotone for instances without mutation then it is dynamically optimal for instances with mutation.
Proof.
We reduce Splay with mutation to Splay without mutation. Let and be the request sequence and starting tree of a mutating instance. Construct a new instance without mutation, as follows. Let be the set of keys in and form by relabelling the keys in to their respective indices. All nodes in are unmarked. (A node’s marking status merely aids in our construction and has no effect on algorithmic behavior.) For , let be the key whose index among the set of keys held by unmarked nodes in is the same as the index of the predecessor of in if a predecessor is present, otherwise set . Define analogously for the successor of if a successor is present, otherwise set .
If then let be the key whose index among those held by unmarked nodes in is the same as the index of in , let and set and .
If then do the following. If has marked nodes with keys strictly between and in symmetric order then set to the most recently marked among them and form by unmarking in . Otherwise, form by augmenting with a new unmarked node whose key is as follows. If neither nor are finite then ; if only is finite then ; if only is finite then ; otherwise, is the midpoint between and on the real line. Set and .
If then define as in the case for search. If at least one of and is non-finite then form by marking in . If neither nor is finite set ; if only is finite set ; otherwise, if only is finite set . When both and are finite, proceed as follows. Define to be the key in the most recently marked among the marked nodes of with keys strictly between and if such a node is present, otherwise is the midpoint between and on the real line. If then is as in the case when at least one of and is finite, otherwise form by augmenting with an unmarked node holding key and successively marking and then . Set . Splaying the first five of these requests ensures the keys comprise a rooted hull of the left spine, and splaying the final two requests induces successive rotations at and . Finally, set .
Define . Figure 4 depicts an example of this process. Let and be the transition trees and after-trees in Splay’s execution of starting from . Set and for let be the union of keys in the transition trees of Splay’s execution of starting from and let be the final tree of this execution. If has unmarked nodes then their keys comprise a rooted hull in which is isomorphic to , and is isomorphic to a subgraph of the rooted hull in comprising the keys in . Thus, .
It remains to bound the cost of an optimal execution for the new instance. Let be the rooted hulls and be the transition trees for an optimal execution of starting from , and let . For , if or then let be the rooted hull of that is isomorphic to and let be the transition tree with the same keys as that is isomorphic to . If and then form and by respectively augmenting said isomorphic rooted hull and corresponding transition tree with . The first in the transition tree sequence for the requests in is the tree that results from splaying the first requested key of in . The remaining transition trees in are the transition trees of Splay’s execution of the remaining requests in starting from the first tree in . Finally, let be the final after-tree in the execution of starting from whose transition trees are . The first tree in has size at most , and has at most six remaining requests, each served by a transition tree from with size at most four, meaning . The transition tree sequence executes . Thus, , and is a reduction. Apply Theorems 5.1 and 4.3. ∎
Theorem 5.2 has an interesting consequence. A deque instance is an initial tree together with a request sequence entirely comprising , , and operations which respectively correspond to inserting a new maximum, deleting the maximum, inserting a new minimum and deleting the minimum. A binary search tree algorithm supports deque operations if there is some so that for all deque instances . The deque conjecture states that Splay supports deque operations. Tarjan proved that Splay supports a limited subset of deque operations [65]. Sundar placed an inverse-Ackermann upper bound on the cost of performing general deque operations [63]. Pettie later tightened this bound [55]. Similar bounds are known for Greedy [11]. We add a new result:
Theorem 5.3.
If Splay is eventually monotone without mutation then it supports deque operations.
Proof.
Let and be the request sequence and initial tree for a deque instance, and assume without loss of generality that the starting tree’s keys are integers. We construct after-trees for an execution of this instance, as follows. If then the root of is its minimum, assuming . Otherwise, if then the root of is its maximum, and if and then the root’s left child is the minimum key in the tree. The remaining keys in are in a subtree at the appropriate location in symmetric order with the following structure. The root of is the largest key less than or equal to the median key in , assuming . The left and right subtrees of are respectively right and left spines comprising the keys in that are less than and greater than . The initial transition tree of this execution has at most nodes and the remainder of the execution can be realized using transition trees each of size at most eight. Thus, . Apply Theorem 5.2. ∎
There are other ways to implement insertion and deletion. Sleator and Tarjan analyze an extension of Splay which supports mutation operations by using splits and joins [61]. Tarjan implements and by inserting at the top of the tree [65]. Zip trees insert and delete keys starting from the middle of the tree and rearrange descendants to restore the binary search tree invariants [66]. Common implementations of deletion in computers replace the deleted node with the node holding the predecessor or successor of the removed key, a technique originally devised by Hibbard [36]. Our version of deletion originates from Cole’s analysis of the “dynamic finger” theorem [18]. Its main advantage is enabling our proof of Theorem 5.2. We leave as open problems determining whether Tarjan’s implementation of deque operations or algorithms that use Hibbard’s variant of deletion are reducible to algorithms in our model of mutation.
6 Natural Algorithms
Our results readily generalize. We say an algorithm is natural if the rooted hulls of its executions are always the access paths for the requested keys and the transition trees for isomorphic rooted hulls are isomorphic. Splay is a natural algorithm. We incorporate mutation in the same way as for Splay. A natural algorithm inserts by searching in the augmented tree and deletes by successively searching in increasing order for all in the neighborhood of the deleted key, then rotating at the predecessor if present, followed by removing the key. To construct the transition digraph for natural algorithm , assign a vertex to every binary search tree with keys , and for every and add an arc from to the result of executing a search for in with .
Theorem 6.1.
A natural algorithm whose transition digraph is strongly connected for binary search trees of some size at least three is dynamically optimal for instances with mutation if and only if it is eventually monotone for instances without mutation.
Proof.
Let be a natural algorithm and let be the smallest integer greater than two for which is strongly connected. Choose a map from each pair to some request sequence whose execution by starting from has a sequence of after-trees which, when prefixed by , comprises a directed path of minimal length connecting to in . Let be the after-tree of transforming with rooted hull and transition tree . If or then define . Otherwise, choose rooted hulls and transition trees for enacting restricted rotations in an identical manner to Theorem 3.4, except that each transition tree has size , rather than size four, and set . The output of contains at most one request per vertex in . There are binary trees with keys [57]. (This is the Catalan number for .) Each access path contains at most every node in the tree. Thus, the cost for to execute starting from is at most , which is linear in the transition tree’s size. This execution’s final tree is whenever .
The map , where and are the rooted hulls and transition trees of some execution for starting from , is a simulation embedding for . Similarly, , where is the final tree in the execution of starting from by , is a repeater for . Thus, by Theorem 4.2, if is eventually monotone then it is dynamically optimal for instances without mutation. To reduce how executes mutation requests to its behavior when executing non-mutating instances, modify how the reduction in Theorem 5.2 handles requests to delete keys with both predecessors and successors. Instead of requesting a single auxiliary key, as in the case for Splay, add requests for auxiliary keys, augmenting the initial tree with unmarked nodes as necessary. Then, replace the requests that induce Splay to perform the relevant rotations along the left spine with the output of . Necessity of eventual monotonicity follows from Theorem 3.1. ∎
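The vertex count of the transition digraph used in the proof above is a Catalan number. As a small sanity check, the following sketch counts binary search tree shapes by the usual root-splitting recurrence and compares the result with the closed form. The function names `count_bsts` and `catalan` are hypothetical and chosen only for this illustration.

```python
from functools import lru_cache
from math import comb


@lru_cache(maxsize=None)
def count_bsts(n: int) -> int:
    """Number of binary search trees on n distinct keys.

    Choosing the root leaves i keys for the left subtree and n - 1 - i for the right.
    """
    if n == 0:
        return 1
    return sum(count_bsts(i) * count_bsts(n - 1 - i) for i in range(n))


def catalan(n: int) -> int:
    """Closed form for the n-th Catalan number: C(2n, n) / (n + 1)."""
    return comb(2 * n, n) // (n + 1)
```

Since this count depends only on the fixed tree size chosen in the proof, it is a constant for the purposes of the cost bound.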
A strongly connected transition digraph is not necessary for a natural algorithm to be coercible. An example of this is a variant of Splay that carries out restructuring operations in tandem with the binary search for the requested key, eliminating the need for parent pointers, call stacks or threaded nodes [61]. A top-down-splay operation for in begins by initializing a pointer to a childless node whose key is , a pointer to a childless node with key , and a pointer to the root of . It uses left and right linking steps. Linking left replaces the left subtree of with , redirects to point to the target of , and redirects to point to its target’s left child. Linking right is symmetric. The operation repeats the following process until . Suppose without loss of generality that is in the subtree rooted at the left child of . (The other case is symmetric.) If then execute a right link. (This case is terminal.) Otherwise, if then rotate at , redirect to point to , and execute a right linking operation. Otherwise, execute a right link followed by a left link operation. Once , the operation completes by replacing the right subtree of with the left subtree of followed by replacing the left subtree of with the right subtree of , and doing symmetrically with , the left subtree of and the right subtree of . Mäkinen compares Splay to its top-down variant in detail [51].
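The top-down procedure described above can be sketched in Python as follows. This is an illustrative reading of the description, not the authors' code: the single `header` node stands in for the two childless placeholder nodes, the names `Node`, `insert`, `in_order` and `top_down_splay` are hypothetical, and the handling of absent keys (stopping at the last node on the search path) is an assumption of this sketch.

```python
class Node:
    def __init__(self, key, left=None, right=None):
        self.key, self.left, self.right = key, left, right


def insert(root, key):
    """Plain leaf insertion, used only to build example trees."""
    if root is None:
        return Node(key)
    if key < root.key:
        root.left = insert(root.left, key)
    else:
        root.right = insert(root.right, key)
    return root


def in_order(root):
    return [] if root is None else in_order(root.left) + [root.key] + in_order(root.right)


def top_down_splay(root, key):
    if root is None:
        return None
    header = Node(None)  # header.right grows the left tree, header.left the right tree
    left_tail = right_tail = header
    t = root
    while t.key != key:
        if key < t.key:
            child = t.left
            if child is None:
                break  # key absent: t becomes the root (sketch assumption)
            if key < child.key and child.left is not None:
                t.left, child.right = child.right, t  # rotate at the left child
                t = child
                right_tail.left = t  # link right
                right_tail = t
                t = t.left
            elif key > child.key and child.right is not None:
                right_tail.left = t  # link right
                right_tail = t
                t = child
                left_tail.right = t  # link left
                left_tail = t
                t = t.right
            else:
                right_tail.left = t  # terminal case: a single right link
                right_tail = t
                t = child
        else:
            child = t.right
            if child is None:
                break
            if key > child.key and child.right is not None:
                t.right, child.left = child.left, t  # rotate at the right child
                t = child
                left_tail.right = t  # link left
                left_tail = t
                t = t.right
            elif key < child.key and child.left is not None:
                left_tail.right = t  # link left
                left_tail = t
                t = child
                right_tail.left = t  # link right
                right_tail = t
                t = t.left
            else:
                left_tail.right = t  # terminal case: a single left link
                left_tail = t
                t = child
    # Reassemble: hang the subtrees of t on the two side trees, then attach them to t.
    left_tail.right = t.left
    right_tail.left = t.right
    t.left, t.right = header.right, header.left
    return t
```

The loop never follows parent pointers or uses a stack, which is the property the text attributes to this variant.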
Theorem 6.2.
Top-Down Splay’s transition digraph is not strongly connected for binary search trees of any size greater than two.
Proof.
Let be a binary search tree of size at least three whose root has right child , let be the left subtree of in and let for some . We show no request sequence induces Top-Down Splay to restore to . Suppose, for the sake of contradiction, that such a sequence exists. Because , the last key requested in any such sequence must be . Since , there must be at least one preceding request for a different key. Let be the penultimate key in this sequence and let be the after-tree corresponding to this request, so that . We demonstrate .
Suppose first that . Because , the left and right subtrees of in respectively contain and . Thus, is not on the access path for in . Because is the largest key on the access path to in and Top-Down Splay is a natural algorithm, the access path to in is a right spine rooted at , and is an ancestor of in this after-tree. This is incompatible with being the right child , as is the case in . Thus, . Since is the smallest key, it is on the left spine. Direct computation shows when has depth three or four, and a simple induction establishes the same for a left spine of any greater length. Hence, there is no path from to in Top-Down Splay’s transition digraph, and this transition digraph is not strongly connected. ∎
Theorem 6.3.
If Top-Down Splay is approximately monotone then it is dynamically optimal.
Proof.
Let be a nonempty binary search tree, let , let , and let be the successor of in if the successor is present, otherwise . Define the map that transforms any binary search tree with the same keys as in the following way. Form by replacing the left subtree of the parent of in with the right subtree of in if the parent is present, otherwise set . Form from by doing similar with , and form from by replacing the right subtree of the parent of with the left subtree of in , otherwise . The tree is the binary search tree whose left spine comprises such that is the right subtree of in .
Let be an execution for starting from with rooted hulls , transition trees and after-trees and let . Let , let be the final tree in Top-Down Splay’s execution of starting from , let , let , let and let . For let , let and respectively be the smallest rooted hulls in and containing the keys in , and let be the sequence of keys determined by the restricted rotation algorithm for transforming the right subtree of in into its right subtree in . Form by replacing each key in with the corresponding keys determined by Figure 5, and then appending the requests . Top-Down Splay’s execution of has cost proportional to at most , which can be absorbed into the cost of . The sequence induces Top-Down Splay to transform into . Since , the transformation’s cost is proportional to . Thus, the map is a simulation embedding for Top-Down Splay. Apply Theorem 3.2. ∎
While the rooted hulls of Greedy’s executions are access paths, its transition trees are determined by which keys are in surrounding requests, meaning Greedy is not a natural algorithm. Lucas conjectured that restricting executions’ rooted hulls to the access path does not increase optimal cost by more than a fixed multiple [49]. This conjecture remains open. Kozma catalogues related open questions about the relative power of classes of binary search tree algorithms subject to various restrictions [42]. Splay is in the most restrictive of these classes, meaning dynamic optimality would imply they are all equivalent up to constant factors.
7 Crossing Cost
Binary search trees facilitate efficient search by arranging keys into many short access paths comprising children of alternating direction, and an algorithm’s efficiency depends critically on how it utilizes these arrangements. Consider the subtree of a binary search tree comprising the access path for a node in . The crossing nodes for in comprise , the root of , and the nodes in that are either left children with a right child on or right children with a left child on . We refer to the number of crossing nodes for as its crossing depth in , denoted . The bookkeeping nodes are the non-crossing nodes on . The crossing cost of execution with after-trees for starting from is where , and the execution’s bookkeeping cost is .
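To make the definition of crossing depth concrete, here is a minimal sketch. It assumes the following reading of the definition: the crossing nodes are the root, the requested node, and every interior node of the access path at which the path changes direction. The names `Node`, `insert`, `access_directions` and `crossing_depth` are hypothetical.

```python
class Node:
    def __init__(self, key, left=None, right=None):
        self.key, self.left, self.right = key, left, right


def insert(root, key):
    """Plain leaf insertion, used only to build example trees."""
    if root is None:
        return Node(key)
    if key < root.key:
        root.left = insert(root.left, key)
    else:
        root.right = insert(root.right, key)
    return root


def access_directions(root, key):
    """Directions ('L' or 'R') of the edges on the access path from the root to key."""
    directions = []
    node = root
    while node is not None and node.key != key:
        if key < node.key:
            directions.append("L")
            node = node.left
        else:
            directions.append("R")
            node = node.right
    if node is None:
        raise KeyError(key)
    return directions


def crossing_depth(root, key):
    """Number of crossing nodes for key, under the reading stated in the lead-in."""
    directions = access_directions(root, key)
    if not directions:
        return 1  # the requested node is the root, so they count once
    # An interior path node is crossing when the edge entering it and the
    # edge leaving it point in opposite directions.
    turns = sum(1 for a, b in zip(directions, directions[1:]) if a != b)
    return 2 + turns  # the root and the requested node, plus the turning nodes
```

Under this reading, a key at the end of a long spine has crossing depth two regardless of the spine's length, while a key reached by a fully zig-zagging path has crossing depth equal to the number of nodes on its access path.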
To perform a move-to-root operation, repeatedly rotate at the requested key until it becomes the root. The Move-to-Root algorithm enacts this process at each request. The crossing bound for starting from , denoted , is the crossing cost of Move-to-Root’s execution for this instance. The crossing bound is essentially equivalent to Wilber’s second lower bound on optimal execution cost [67]. Thus, the crossing bound never exceeds a fixed multiple of optimal execution cost. (See Appendix B.) We shall prove that the crossing bound is approximately monotone. Our techniques preview those required to show the same for Splay. We begin with three properties of Move-to-Root. The first describes the structure of its transition trees.
Theorem 7.1.
Executing transforms the access path to in into a tree whose root has left and right subtrees that are respectively right and left spines.
Proof.
By induction on the number of rotations involved in the operation. If the requested key lies at the root then the statement is trivial. Now suppose that the statement is true for nodes of depth , let and , and assume without loss of generality that solely comprises keys on the access path for in and that . (The other case is symmetric.) The first of the successive rotations at performed while executing replace the left subtree of in with . By the inductive hypothesis, the left and right subtrees of in comprise inward facing spines of the keys in that are respectively less than and greater than . The left subtree of remains unchanged after the final rotation, while the right subtree of immediately before the final rotation becomes the left subtree of immediately afterward, and the right subtree of after the final rotation is a left spine of keys greater than . Thus, the hypothesis holds for nodes at depth . ∎
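The move-to-root operation and the structure asserted by Theorem 7.1 can be checked on small examples with the sketch below. The recursive formulation is an implementation choice made here (each return from the recursion performs one rotation at the requested key), and the names `Node`, `insert`, `move_to_root`, `is_right_spine` and `is_left_spine` are hypothetical.

```python
class Node:
    def __init__(self, key, left=None, right=None):
        self.key, self.left, self.right = key, left, right


def insert(root, key):
    """Plain leaf insertion, used only to build example trees."""
    if root is None:
        return Node(key)
    if key < root.key:
        root.left = insert(root.left, key)
    else:
        root.right = insert(root.right, key)
    return root


def in_order(root):
    return [] if root is None else in_order(root.left) + [root.key] + in_order(root.right)


def move_to_root(root, key):
    """Rotate at the node holding key until it is the root; leave the tree alone if key is absent."""
    if root is None or root.key == key:
        return root
    if key < root.key:
        root.left = move_to_root(root.left, key)
        x = root.left
        if x is None or x.key != key:
            return root
        root.left, x.right = x.right, root  # rotate at x, a left child
        return x
    root.right = move_to_root(root.right, key)
    x = root.right
    if x is None or x.key != key:
        return root
    root.right, x.left = x.left, root  # rotate at x, a right child
    return x


def is_right_spine(node):
    """True when no node of the (possibly empty) tree has a left child."""
    while node is not None:
        if node.left is not None:
            return False
        node = node.right
    return True


def is_left_spine(node):
    """True when no node of the (possibly empty) tree has a right child."""
    while node is not None:
        if node.right is not None:
            return False
        node = node.left
    return True
```

For example, inserting 10, 2, 8, 4, 6, 5 in that order produces a tree that is a single access path ending at 5. Moving 5 to the root leaves 2 and 4 in a right spine on the left and 10, 8, 6 in a left spine on the right.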
The second property demonstrates that Move-to-Root’s executions reflect temporal patterns in the request sequence. A binary tree is max-heap ordered if each node is assigned a priority from a totally ordered set and every non-root node’s priority is at most that of its parent. The standard priorities for a request sequence starting from are the mappings from keys to priorities such that, for , where is the time at which is requested in , and if and for .
Theorem 7.2.
Move-to-Root’s after-trees are max-heap ordered by the instance’s standard priorities.
Proof.
The initial tree is max-heap ordered with respect to the initial priorities: the root of the initial tree has highest priority, and the same holds recursively for its subtrees. We can see as follows that Move-to-Root restores the max-heap order invariant after each request. Resetting the priority of the node holding requested key introduces a single heap order violation at the edge between and its parent, if the parent is present. After each rotation at that does not result in becoming the root, only a single edge in the tree violates the heap order, and that edge is always the one between and its parent. When becomes the root, it has the largest priority, and no other edges violate the heap order. ∎
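The invariant in this proof can be tested empirically with a short sketch. The concrete priorities are an assumption made for illustration: a requested key gets the time of its most recent request (counting from one), and a key that has not yet been requested gets the negative of its depth in the initial tree, which makes the initial tree max-heap ordered. All function names are hypothetical.

```python
class Node:
    def __init__(self, key, left=None, right=None):
        self.key, self.left, self.right = key, left, right


def insert(root, key):
    """Plain leaf insertion, used only to build the initial tree."""
    if root is None:
        return Node(key)
    if key < root.key:
        root.left = insert(root.left, key)
    else:
        root.right = insert(root.right, key)
    return root


def move_to_root(root, key):
    """Rotate at the node holding key until it is the root."""
    if root is None or root.key == key:
        return root
    if key < root.key:
        root.left = move_to_root(root.left, key)
        x = root.left
        if x is None or x.key != key:
            return root
        root.left, x.right = x.right, root
        return x
    root.right = move_to_root(root.right, key)
    x = root.right
    if x is None or x.key != key:
        return root
    root.right, x.left = x.left, root
    return x


def initial_depths(root):
    """Map each key to its depth in the tree, with the root at depth zero."""
    depths, stack = {}, [(root, 0)] if root is not None else []
    while stack:
        node, depth = stack.pop()
        depths[node.key] = depth
        for child in (node.left, node.right):
            if child is not None:
                stack.append((child, depth + 1))
    return depths


def is_max_heap(root, priority):
    """True when no child has a strictly larger priority than its parent."""
    stack = [root] if root is not None else []
    while stack:
        node = stack.pop()
        for child in (node.left, node.right):
            if child is not None:
                if priority[child.key] > priority[node.key]:
                    return False
                stack.append(child)
    return True


def after_trees_are_heap_ordered(initial_keys, requests):
    """Run Move-to-Root and check heap order after every request."""
    root = None
    for key in initial_keys:
        root = insert(root, key)
    # Assumed initial priorities: negative initial depth for unrequested keys.
    priority = {key: -depth for key, depth in initial_depths(root).items()}
    for time, key in enumerate(requests, start=1):
        priority[key] = time  # most recent request time
        root = move_to_root(root, key)
        if not is_max_heap(root, priority):
            return False
    return True
```

A check of this kind is no substitute for the proof, but it is a cheap way to catch mistakes in an implementation of Move-to-Root.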
The third property characterizes how Move-to-Root arranges keys in its after-trees. The left and right window boundaries and for a given key determined by a request sequence are respectively the largest key less than or equal to and the smallest key greater than or equal to in . The window subtree for determined by an execution of starting from with final tree is as follows. If neither nor are finite then ; if then ; if only is finite, or if both and are finite and the final request for precedes the final request for in , then is the right subtree of in ; otherwise, is the left subtree of in .
Theorem 7.3.
The window subtree for determined by Move-to-Root’s execution of starting from comprises the keys in strictly between the window boundaries for determined by .
Proof.
By induction on the number of requests. The initial tree and final tree are identical for executions of the empty sequence, so the statement is true when there are no requests. Now suppose the statement is true for sequences of up to requests and that for some , and assume without loss of generality that . Let and be the window boundaries for determined by , let be the set of keys in that are larger than and smaller than , let be the final tree in Move-to-Root’s execution of starting from , and let be the window subtree for determined by this execution. Define , , , and analogously for .
Consider first when so that and there are no rotations at any key in while moving to the root, and assume without loss of generality that is the right subtree of in . (The case when is the left subtree of is symmetric.) By the inductive hypothesis, is the set of keys in . If then the right subtree of in is the same as in since there are no rotations at while moving to the root of , and is the right subtree of in since is either infinite or more recently requested in than . If and is infinite then every key greater than is in its right subtree in and is on the right spine of , meaning the right subtree of in is the same as in and is again the right subtree of in . If and is finite then is the left subtree of in and is an ancestor and the successor of in , meaning the left subtree of in is since Move-to-Root is a natural algorithm. In all cases and the hypothesis holds for .
Consider now when . If then by construction. Otherwise, assume without loss of generality that so that . (The case when is symmetric.) Form by substituting for in and let be the right subtree of in . Since and have the same keys and , we may apply analysis of when . ∎
Our proof of approximate monotonicity examines how individual request removals affect the crossing bound. In particular, removing the first request in a sequence subtracts the first key’s crossing depth from the crossing cost of Move-to-Root’s execution of the remaining requests. The remaining requests are now executed starting from the original tree, rather than the tree resulting from moving the first requested key to the root. As the remainder of the altered execution proceeds, its after-trees become progressively similar to those of Move-to-Root’s execution of the original request sequence. The key to our argument is bounding the cost incurred by this restoration process.
Theorem 7.4.
.
Proof.
Let and be the final trees in Move-to-Root’s executions of starting respectively from and , let and be the window subtrees for determined by these executions, and set to be the crossing depth of in if is nonempty and zero otherwise. We show by induction on the number of requests that . The statement is true by construction for the empty sequence, so consider when for some . Define , , , and analogously for , let be the set of keys in contained in the symmetric order interval strictly between the window boundaries for determined by , and suppose .
Since it suffices to show that . Thus, we need to characterize the structure of the final trees. The tree is max-heap ordered with respect to a priority function that is identical to the standard priority function for starting tree except at , whose priority in we set by convention to zero. By Theorem 7.2, the after-trees of Move-to-Root’s executions of starting from and are max-heap ordered by priority functions defined recursively in the standard way starting respectively from the priority functions for and . Requests subsequent to have the same after-trees in both executions, so we may assume without loss of generality that and .
By Theorem 7.3, when the keys in comprise a rooted hull in and . (If then is empty.) These keys have the same priorities in both and , meaning the two rooted hulls are identical. The window subtree has the same keys as , none of which are requested in , meaning these keys have their initial priorities in and . The only key with differing priority in and is , which has maximal priority among keys in . Therefore, .
If then and , so we may assume without loss of generality. If set , otherwise set to be the subgraph in comprising the union of keys in with the window boundary for determined by of which the root of is a child in . Define analogously for in . The access path to in is identical to the access path for in whenever is nonempty, and the access paths to this key in and are identical. Thus, while and , meaning .
Let be the smallest rooted hull in containing the neighborhood of in . Since Move-to-Root is a natural algorithm, the subtrees hanging from are unaffected by executing , and so these subtrees are identically arranged in . Therefore, we may assume with no loss of generality that either or that the parent of in is defined and lies in .
Let be the first crossing nodes for in in increasing order of depth and let . Since and if is the root of , we may assume without loss of generality that has a parent in . Thus, let be the child of in the same direction as with respect to its parent in and let be the child of in the opposite direction. Define where is the largest integer for which is an ancestor of in if and otherwise . Let be the indicator for , let be the indicator for , let be the indicator for , let be the indicator for and let be the indicator for . (An indicator’s value is one when its condition is true and zero otherwise.) By Theorem 7.1 and case analysis,
[TABLE]
where . (See Figure 6.)
Note that if then , and if then , and if or then . Combining these facts with some case analysis reveals that if then , and otherwise . If then and . Otherwise, we apply Theorem 7.1 to deduce the structure of the access path for in . If then is a subgraph of and . If is in the subtree rooted at then is a right spine ending at and if is in the subtree rooted at then is a left spine ending at . Therefore when is a strict descendant of , if and then since and the parent of in are on the same side of in symmetric order, and otherwise and . Otherwise, , in which case if then set to be the child of child in and otherwise set to be the parent of in . Let be a right spine comprising the strict ancestors of in that are less than and let be a left spine of the strict ancestors of in that are greater than , and let be the path from to in . If then results from attaching as the left subtree of the node with smallest key in , and if then results from attaching as the right subtree of the node with largest key in . Thus, . In all cases, is at most . ∎
Theorem 7.5.
The crossing bound is approximately monotone.
Proof.
Let be the after-trees of Move-to-Root’s execution of starting from , and let be a sequence of request times. Set and for form by removing request from . We induct on to show that where , which suffices to establish that the crossing bound has subsequence overhead at most four. The statement is trivial when . Now suppose . Let and let and respectively be the first and final requests in , so that and . Let , so that and . Note that . Thus, , which by Theorem 7.4 is at most . Therefore, , and the hypothesis holds for request removals. ∎
Our presentation of the crossing bound is based on Iacono’s work [37, 38]. Move-to-Root, introduced by Allen and Munro [3], is the earliest example of a self-adjusting binary search tree algorithm. The crossing bound is not strictly monotone. For example, when , and . We had not realized this when writing our SODA paper, whose treatment of the crossing bound contains several mistakes [46].
8 The Way Forward
Our numerical experiments indicate that Splay’s cost never exceeds four times the sum of an instance’s crossing bound and initial tree size, which would imply dynamic optimality. Moreover, the crossing costs of Splay and Move-to-Root are so tightly coupled that the difference between them may well be at most linear in initial tree size. Additionally, the keys in the crossing nodes of these algorithms’ executions are quite similar, albeit sometimes offset from each other in symmetric order by a small amount. We have tried to prove these statements, to no avail. We believe our failures are not incidental, and that there are structural obstacles in the way of establishing dynamic optimality in this manner. The difficulty arises from temporal spread. Typically, about half of the keys in Move-to-Root’s crossing nodes for a given request appear on the access path for the corresponding request in Splay’s execution. A smaller fraction of these keys appear on the splay path for the next request, and the remaining keys are scattered across subsequent splay paths. The precise extent of this spreading is varied and depends on the particular request sequence.
Lucas remarked that optimal cost does not seem amenable to inductive analysis [49]. The observed temporal mixing is a manifestation of this problem, since it means that showing Splay’s cost obeys the crossing bound requires accounting for many of its preceding transition trees at each request. Fortunately, dynamic optimality’s equivalence to approximate monotonicity provides a means of shattering this barrier. Our conviction is:
Conjecture 8.1.
Splay’s crossing cost is approximately monotone.
Our proof of the crossing bound’s monotonicity is a natural starting point for tackling Conjecture 8.1. However, it requires a crucial modification. Our analysis of Move-to-Root establishes a worst-case bound on how its crossing cost increases after removing a request. By contrast, the increase in Splay’s crossing cost is not bounded by any fixed multiple of the removed key’s crossing depth. For example, if results from splaying the largest key in a right spine comprising the integers , and , then the crossing costs of Splay’s executions of and when starting from are respectively and . Consequently, we must examine how request removals affect Splay’s crossing cost in aggregate, which is equivalent to reversing the order in which the proof of Theorem 7.5 inducts on request removals. This style of induction entails comparing executions of a request sequence starting from progressively divergent trees. The increased complexity of the required new approach is another manifestation of the barriers to standard inductive analysis of optimal algorithms. The advantage of attacking this manifestation of the problem is that Theorem 7.5 assures its achievability.
Adapting such a proof from Move-to-Root to Splay will almost certainly require a potential function in order to smooth out the effects of occasional requests whose removal produces a high increase in Splay’s crossing cost. Potential functions are tools for analyzing algorithms that have individual operations with high cost, but for which the cost per operation, amortized over all operations in a sequence, is low [64]. Each possible configuration of the data structure (e.g. the tree) is assigned a numerical value, called its potential. The cost of an operation is redefined to depend on both the original cost (e.g. the length of the Splay path), and on how the potential changes due to the operation’s effect on the data structure. If carefully constructed, the sum of the redefined costs over a sequence of operations will be an upper bound on the sum of the actual costs, yet no individual operation’s redefined cost will ever be very large. We need a potential that captures how Splay’s executions diverge from Move-to-Root’s.
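The standard bookkeeping behind this description can be written out as follows. The symbols here are assumptions introduced for illustration and are not the paper's notation: $\Phi$ is the potential, $T_{i-1}$ and $T_i$ are the trees before and after operation $i$, $c_i$ is the actual cost, and $\hat{c}_i$ is the redefined (amortized) cost.

```latex
% Redefined cost of operation i: actual cost plus the change in potential.
\hat{c}_i = c_i + \Phi(T_i) - \Phi(T_{i-1})

% Summing over m operations, the potential differences telescope:
\sum_{i=1}^{m} c_i = \sum_{i=1}^{m} \hat{c}_i + \Phi(T_0) - \Phi(T_m)
```

Under this accounting, the sum of the redefined costs bounds the sum of the actual costs up to the additive term $\Phi(T_0) - \Phi(T_m)$, which is at most the difference between the largest and smallest values the potential can take.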
Move-to-Root is both a progenitor and a sub-step of splaying. Move-to-Root splits the access path into a pair of spines. One can view Splay as comprising two phases: the first executes Move-to-Root, the second performs extra rotations, corresponding to the zig-zigs. (See Figure 7.) The extra rotations ensure that a splay operation decreases the depth of every node by about half of the number of its ancestors that were on the access path for the requested key [62].
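To make the relationship concrete, the following is a minimal Python sketch of the two restructuring rules, written in the textbook bottom-up form rather than the two-phase view described above. The names `Node`, `rotate`, `move_to_root`, `splay`, `left_spine`, `inorder`, and `height` are illustrative and do not come from the paper.

```python
class Node:
    def __init__(self, key):
        self.key = key
        self.left = self.right = self.parent = None

def rotate(x):
    """Rotate at x: x moves above its parent; symmetric order is preserved."""
    p, g = x.parent, x.parent.parent
    if p.left is x:
        p.left = x.right
        if x.right is not None:
            x.right.parent = p
        x.right = p
    else:
        p.right = x.left
        if x.left is not None:
            x.left.parent = p
        x.left = p
    p.parent, x.parent = x, g
    if g is not None:
        if g.left is p:
            g.left = x
        else:
            g.right = x

def move_to_root(x):
    """Move-to-Root: rotate at x until it becomes the root."""
    while x.parent is not None:
        rotate(x)
    return x

def splay(x):
    """Bottom-up splay: zig, zig-zig, and zig-zag steps until x is the root."""
    while x.parent is not None:
        p, g = x.parent, x.parent.parent
        if g is None:
            rotate(x)                      # zig
        elif (g.left is p) == (p.left is x):
            rotate(p)                      # zig-zig: rotate at the parent first
            rotate(x)
        else:
            rotate(x)                      # zig-zag: rotate twice at x
            rotate(x)
    return x

def left_spine(n):
    """Build a left spine on keys 1..n (root n); return (root, nodes by key)."""
    nodes = {k: Node(k) for k in range(1, n + 1)}
    for k in range(2, n + 1):
        nodes[k].left, nodes[k - 1].parent = nodes[k - 1], nodes[k]
    return nodes[n], nodes

def inorder(t):
    return [] if t is None else inorder(t.left) + [t.key] + inorder(t.right)

def height(t):
    """Number of nodes on the longest root-to-leaf path."""
    return 0 if t is None else 1 + max(height(t.left), height(t.right))
```

On a left spine, the only difference between the two procedures is the order of rotations inside each zig-zig step. Requesting the deepest key of a seven-node left spine leaves a path of seven nodes under Move-to-Root, while in this sketch Splay leaves a tree whose longest path has five nodes.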
Theorem 8.2 ([14, Proposition 17]).
Executing is equivalent to starting from the tree and rotating at every key held by a child on the access path for in whose parent in is on the same side of in symmetric order for which is odd.
Proof.
By induction on the number of splay steps involved. The theorem is trivially satisfied when splaying at the root. Now suppose the statement is true for splay operations comprising splay steps and let be a node in whose splaying involves steps. Let be the subtree rooted at the ancestor of in whose depth is . Since the first splay steps each decrease the depth of by two, where . Denote by the procedure described in the theorem. Let , let be the left child of in and assume without loss of generality that . (The other case is symmetric.) If then only enacts a single rotation at , which is equivalent to the zig enacted by . If then and are on opposite sides of in symmetric order and rotates twice at , making it identical to the zig-zag step performed by . Finally, if then rotates twice at and then once at , which is equivalent to the rotation at followed by in the zig-zig performed by . In all three cases, . Form by replacing with in . The edges rotated by subsequent to the operation are disjoint, and and have the same parity for every ancestor of in . Therefore, . By the inductive hypothesis , thus . ∎
Each zig-zig can create a violation of the max-heap ordering with respect to the standard priorities of an instance. As the executions of both Splay and Move-to-Root proceed, these zig-zigs will sometimes create further heap order violations. At other times, splay steps will remove some of the heap order violations. The correct potential for analyzing Splay’s crossing cost should in some way bound the rate at which splay operations generate heap order violations with respect to the standard priorities. We speculate on two possible forms. The first simply counts the number of edges in the tree being splayed that violate the heap-order condition with respect to most recent access time. This potential may be too “coarse,” in that it fails to capture heap order violations between nodes not immediately connected by an edge. If so, the likely way to address this shortcoming is weighting each node by some function of the difference between its crossing depth in the splayed tree and in the max-heap order maintained by Move-to-Root. While honing the details of the potential’s construction falls outside this work’s scope, we can infer something important up front.
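The first candidate potential can be sketched in a few lines of Python. This is only an illustration under stated assumptions: trees are nested tuples `(key, left, right)`, the helper names `t` and `count_heap_violations` are hypothetical, and the access times in `last_access` are made up for the example rather than taken from the paper.

```python
def t(key, left=None, right=None):
    """Build a tree node as a (key, left, right) tuple."""
    return (key, left, right)

def count_heap_violations(tree, last_access):
    """Count edges whose child was accessed more recently than its parent.

    In a max-heap ordered by most recent access time, every parent is at
    least as recent as its children, so each such edge is a violation.
    """
    if tree is None:
        return 0
    key, left, right = tree
    count = 0
    for child in (left, right):
        if child is not None:
            if last_access[child[0]] > last_access[key]:
                count += 1
            count += count_heap_violations(child, last_access)
    return count

# Hypothetical instance: a left spine 7, 6, ..., 1 in which key k was last
# requested at time k, followed by a request for key 1 at time 8.
last_access = {1: 8, 2: 2, 3: 3, 4: 4, 5: 5, 6: 6, 7: 7}

# Tree produced by Move-to-Root for that request.
mtr_tree = t(1, None, t(7, t(6, t(5, t(4, t(3, t(2)))))))

# Tree produced by bottom-up splaying for the same request (three zig-zigs).
splay_tree = t(1, None, t(6, t(4, t(2, None, t(3)), t(5)), t(7)))
```

In this example the Move-to-Root tree has no violating edges, while the splayed tree has three, one for each zig-zig step. The count only sees violations across a single edge, which is the coarseness concern raised above.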
A potential function’s design is closely tied to the extent to which its value can increase or decrease. By Theorem 4.3, if Splay is dynamically optimal then its startup overhead is no more than linear in initial tree size. Hence, any potential for proving Conjecture 8.1 should also have maximum value at most linear in the size of the starting tree. This considerably narrows the design space that we might otherwise need to explore.
To prove optimality we must also address Splay’s bookkeeping cost. Here again, we can glean insight from Splay’s progenitor. Move-to-Root is not dynamically optimal. For example, if is a left spine with keys , and then the cost of executing Move-to-Root on the subsequence is proportional to times the cost of its execution on the super-sequence . Because its crossing cost lower bounds a fixed multiple of optimal cost, Move-to-Root’s non-optimality arises from its bookkeeping cost. Splay tweaks Move-to-Root by breaking apart bookkeeping edges via zig-zig steps. Thus, Splay seems to be precisely the modification needed to make Move-to-Root optimal. We believe:
Conjecture 8.3.
Splay’s bookkeeping cost is at most a fixed multiple of the sum of its crossing cost and initial tree size.
Heuristically, splaying in a tree whose access paths comprise mostly bookkeeping nodes increases the average crossing depth of nodes in the tree, and the opposite phenomenon occurs in trees with many nodes of high crossing depth. Precisely tracking this exchange as Splay’s execution progresses quickly becomes unmanageable, indicating the need for an additional potential function that acts as a proxy for the number of bookkeeping nodes in the tree being splayed. It seems likely that a tree entirely comprising a spine should maximize this potential, and that a perfectly balanced binary search tree should minimize it. Conjectures 8.1 and 8.3 together imply dynamic optimality.
Appendix A Rotational Execution
Sleator and Tarjan [61], in their formulation of the dynamic optimality conjecture, use a rotation-based definition of a binary search tree execution. Given an initial tree and a request sequence, a rotational execution fulfills one request at a time, by performing a binary search for the requested key in the current tree, at a cost equal to the number of nodes on the access path. In addition, the execution can include any number of rotations before each request, at a cost of one per rotation. Formally, a rotational execution of starting from comprises a sequence of trees and search times where and results from rotating at some key in for . The cost to execute is . We denote the cost of an optimal rotational execution for this instance by . We shall prove that any transition tree execution can be simulated by a rotational execution of at most twice the cost, and vice-versa. Thus the two cost models are the same to within a factor of two.
Theorem A.1.
.
Proof.
First we observe that any transition tree execution can be simulated by a rotational execution whose cost is larger by a factor of at most two. A -node binary search tree with keys can be transformed into any other binary search tree of the same set of keys by doing at most rotations [21, Theorem 2.1]. Hence each successive after-tree in the transition tree model can be produced from the previous one by doing at most rotations, where is the number of nodes in the rooted hull (and in the corresponding transition tree). Searching for the desired key after doing these rotations costs one. Hence if a transition tree execution fulfills a request with a transition tree of size , then a rotational execution can fulfill this request with cost at most .
Simulating a rotational execution by a transition tree execution is more complicated, because the former allows rotations to be done anywhere in the tree, not just in a rooted hull. The first step toward handling this is to view edges as retaining their identity throughout a rotational execution. Consider a rotation of a left child whose parent is ; let , , and be the left child of , the right child of , and the right child of , respectively. Let , and be the edges connecting , and with their parents, respectively. (If then and if is the root then .) Rotation at swaps the ends of and converts it from a left edge to a right edge, converts from a right edge to a left edge and changes its top end from to , and changes the bottom end of to . (Rotation at does not change edges.) The rotation affects no other edges, and it preserves the set of keys in the subtree rooted at any node other than and , and in particular those rooted at , , and . Right rotations behave symmetrically.
The second step is to modify the rotational execution so that whenever a key is searched for it is at the root of the tree. Before a search occurs, we first rotate on each edge of the access path, bottom-up, which moves the key to be searched for to the root; then we perform the search; then we do the inverse rotations in the opposite order, restoring the original access path. Fulfilling the request in this way costs if the original access path has nodes. Thus we increase the overall cost by at most a factor of two.
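The following Python sketch illustrates this modification under the rotational cost model of this appendix, charging one unit per rotation and one unit per node on the access path at search time. All names (`Node`, `insert`, `rotate`, `shape`, `search_at_root`) are illustrative assumptions, not the paper's notation. For an access path with k nodes, the sketch performs k - 1 rotations on the way up, one search at the root, and k - 1 inverse rotations, so by this count it charges 2k - 1.

```python
class Node:
    def __init__(self, key):
        self.key = key
        self.left = self.right = self.parent = None

def insert(root, key):
    """Plain binary search tree insertion (no rebalancing); returns the root."""
    node = Node(key)
    if root is None:
        return node
    cur = root
    while True:
        if key < cur.key:
            if cur.left is None:
                cur.left, node.parent = node, cur
                return root
            cur = cur.left
        else:
            if cur.right is None:
                cur.right, node.parent = node, cur
                return root
            cur = cur.right

def rotate(x):
    """Rotate at x: x moves above its parent; symmetric order is preserved."""
    p, g = x.parent, x.parent.parent
    if p.left is x:
        p.left = x.right
        if x.right is not None:
            x.right.parent = p
        x.right = p
    else:
        p.right = x.left
        if x.left is not None:
            x.left.parent = p
        x.left = p
    p.parent, x.parent = x, g
    if g is not None:
        if g.left is p:
            g.left = x
        else:
            g.right = x

def shape(node):
    """Nested-tuple snapshot of a tree, used to compare shapes."""
    if node is None:
        return None
    return (node.key, shape(node.left), shape(node.right))

def search_at_root(root, key):
    """Fulfill one request with the key at the root, then restore the tree.

    Assumes key is present. Returns (root, cost, k), where k is the number
    of nodes on the original access path.
    """
    x, k = root, 1
    while x.key != key:
        x = x.left if key < x.key else x.right
        k += 1
    cost, displaced = 0, []
    while x.parent is not None:        # rotate on each access-path edge, bottom-up
        displaced.append(x.parent)
        rotate(x)
        cost += 1
    cost += 1                          # search: the access path is just the root
    for p in reversed(displaced):      # inverse rotations in the opposite order
        rotate(p)
        cost += 1
    return root, cost, k
```

Rotating at the displaced parent undoes the rotation that moved the requested key above it, so replaying the displaced parents in reverse order restores the original access path and leaves the original root in place.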
Finally, assume that a rotational execution moves each requested key to the root before searching for it. We simulate this rotational execution with a transition tree execution while at the same time postponing some rotations. Proceeding in the same order as keys are requested, we modify the subsequence of rotations before the first search, and each subsequence of rotations between successive searches, as follows. Let be such a subsequence, let be the tree in which these rotations begin, and let be the subgraph of comprising the edges in . We partition into a pair of subsequences and . The subsequence comprises rotations in at edges in the same connected component of as the root of . (If no such rotations are present then is empty.) The subsequence is the complementary subsequence to in . We replace with unless is the subsequence of rotations for the final request, in which case we replace with and drop the remaining rotations. Then, we move the search time for the request to occur immediately after the final rotation in .
If is nonempty then its edges comprise a rooted hull in , and the rotations in transform this rooted hull into a tree on the same set of keys whose root contains the requested key. The transformed tree is the transition tree corresponding to the request in the transition tree execution. (If is empty then the transition tree comprises solely the root of .) If the rooted hull (and the transition tree) contain nodes, the number of rotations is at least , making the cost of these rotations plus the cost of the search at least in the rotational execution. The size of the corresponding transition tree is . We conclude that it is possible to simulate a rotational execution whose searches occur at the root with a transition tree execution of the same cost, and at most twice the cost for a general rotational execution. Creating simulations for optimal executions of each type establishes the result. ∎
Wilber was the first to restrict rotational executions to search only at the root [67]. The procedure for partitioning rotations is implicit in Lucas’ work [49]. Our description is based on Koumoutsos’ remarks [41]. Harmon was the first to describe binary search tree executions using transition trees [34].
Appendix B Wilber’s Lower Bound
We show that the crossing bound is at most a fixed multiple of optimum transition tree execution cost. Our proof proceeds in two main steps. First, we express a scoring procedure defined by Wilber in terms of the crossing bound. Then we use Wilber’s proof that this procedure lower bounds optimum rotational cost as a black box in our analysis to obtain the desired result. (Wilber’s proof is quite intricate, and we do not attempt to summarize it.) Unlike the crossing bound, Wilber’s scoring procedure depends only on the request sequence. Accounting for initial trees requires care.
Formally, Wilber’s bound for request sequence , denoted , is , where the score for each request is as follows. If then the score is zero. Otherwise, let and let . If set , otherwise set . Initialize and repeat the following process for as long as and there are keys requested prior to time lying between (inclusive) and (exclusive) in symmetric order. Set to the latest request time preceding for a key lying between (inclusive) and (exclusive) in symmetric order. Set to the key requested at . Set to the key closest in symmetric order to (exclusive) on the same side of in symmetric order as that is requested after and no later than . Finally, increment by one. The score is one less than the terminal value of . We respectively refer to and as the crossing keys and inside keys for the request. Wilber’s bound is nearly the same as the crossing bound for starting from the default tree comprising the keys in min-heap ordered by their first request times.
Theorem B.1.
 whenever .
Proof.
By induction on the number and crossing depths of requests. Since the first request’s score is zero, Wilber’s bound is one for the singleton request sequence. Meanwhile, the first requested key lies at the root of the default tree for the request sequence and the root has crossing depth one. Thus, the formula holds for sequences containing a single request. Now suppose the theorem is true for all request sequences of length up to , let be a nonempty sequence of requests, let , let be the final after-tree in Move-to-Root’s execution of starting from , and set to be one if and zero otherwise. We show that the first crossing nodes for in , ordered increasing by depth, contain the crossing keys for request , and that the respective parents of these nodes contain the inside keys for the request. (If the zeroth inside key is we treat as the left subtree of this key, and otherwise as the right subtree of .)
The last key requested in is the first crossing key for request in . Meanwhile, by Theorem 7.2, the keys in comprise a rooted hull in max-heap ordered by their last request times in . In particular, the root of , which is the first crossing node for in , contains the first crossing key. Now suppose that the first crossing nodes for in contain the first crossing keys for request in for some , and that the parents of these nodes contain the first inside keys. Let and respectively be the deepest among the first and crossing nodes for in , let and be the respective parents of these nodes, and assume without loss of generality that . (The other case is symmetric.)
First consider when . Since is a descendant of in , the former’s final request time in precedes the latter’s. Because is in the right subtree of and either or contains in its right subtree, is greater than and at most . Every key in this interval is a descendant of , making the last among them requested in . Applying the inductive hypothesis that and respectively contain crossing key and inside key for request in establishes that contains crossing key . Since is both the parent of and the deepest node on the left spine of the subtree of rooted at which contains in its left subtree, it has the smallest key greater than whose final request comes after the last request for and no later than the last request for in . Thus, is inside key for request in . Furthermore, if then and there are no further crossing keys for request in .
Otherwise, if then , , and the subtree rooted at in contains every key that is greater than and at most . Since Move-to-Root is a natural algorithm and has no children in , the absence of in ensures that has no children in , making the only key in this interval. Thus, there are only crossing keys for request in when .
By the inductive hypothesis on request sequences of length , , and by the above arguments . Since and , the formula holds for request sequences of length . ∎
Theorem B.2.
.
Proof.
Let and and note that since contains every key in . Let and respectively be optimal executions for and starting from , let be the sequence comprising the first transition trees of , let be the sequence comprising the final transition trees in , and let be the after-tree for request in . The transition tree sequence is an execution for starting from with cost at most . By [47, Theorem 4], . Splay’s cost and initial tree size respectively upper bound and lower bound optimum cost, meaning and . Combining these inequalities establishes . By Theorem 7.2, is the final tree in Move-to-Root’s execution of starting from . Hence, . By Theorem B.1, and . Therefore, . Finally, by [67, Theorem 7] and Theorem A.1, . ∎
Acknowledgments
We thank Luís Russo for suggesting improvements to Figure 3, Kurt Mehlhorn for simplifying our proof of Theorem 4.3, Amit Halevi for comments that clarified the presentation of our execution model, and Siddhartha Sen and Bernard Chazelle for editorial feedback. The high-level presentation of Sections 3 and 4 benefited from informal discussions with Daniel Cooney. We are indebted to John Iacono for his guidance in understanding the equivalence between Wilber’s bound and Move-to-Root’s crossing nodes, along with corroborating our empirical comparisons between the behaviors of Splay and Wilber’s bound. Finally, we found David Galles’ “Data Structure Visualizations” website instrumental for prototyping our proofs [27]. Research at Princeton University partially supported by an innovation research grant from Princeton and a gift from Microsoft.
References
- [1] Georgy Adel’son-Vel’skii and Evgenii Landis “An algorithm for the organization of information” In Soviet Mathematics Doklady 3, 1962, pp. 1259–1263
- [2] Yehuda Afek et al. “The CB tree: a practical concurrent self-adjusting search tree” In Distributed Computing 27.6, 2014, pp. 393–417 DOI: 10.1007/s00446-014-0229-0
- [3] Brian Allen and Ian Munro “Self-organizing binary search trees” In Journal of the ACM 25.4, 1978, pp. 526–535 DOI: 10.1145/322092.322094
- [4] Arne Andersson “General balanced trees” In Journal of Algorithms 30.1, 1999, pp. 1–18 DOI: 10.1006/jagm.1998.0967
- [5] Rudolf Bayer and Edward McCreight “Organization and maintenance of large ordered indexes” In Acta Informatica 1.3, 1972, pp. 173–189 DOI: 10.1007/bf00288683
- [6] Michael Bender, Martín Farach-Colton and William Kuszmaul “What does dynamic optimality mean in external memory?” In Innovations in Theoretical Computer Science Dagstuhl, Germany: Schloss Dagstuhl, 2022, pp. 1–23 DOI: 10.4230/LIPICS.ITCS.2022.18
- [7] Benjamin Berendsohn and László Kozma “Splay trees on trees” In Symposium on Discrete Algorithms Alexandria, Virginia, USA: Society for Industrial and Applied Mathematics, 2022, pp. 1875–1900 DOI: 10.1137/1.9781611977073.75
- [8] Prosenjit Bose et al. “Competitive online search trees on trees” In Symposium on Discrete Algorithms Salt Lake City, Utah, USA: Society for Industrial and Applied Mathematics, 2020, pp. 1878–1891 DOI: 10.1137/1.9781611975994.115
