External Memory Planar Point Location with Fast Updates
John Iacono, Ben Karsin, Grigorios Koumoutsos

TL;DR
This paper introduces a new external memory data structure for dynamic planar point location that significantly improves update times while maintaining efficient query performance, especially for large datasets.
Contribution
It presents a novel data structure achieving faster amortized update times in the external memory model for dynamic planar point location with constant face size.
Findings
Update time reduced by a factor of B^{1-ε}
Query time remains polylogarithmic in N
Supports efficient vertical ray-shooting queries
This work was supported by the Fonds de la Recherche Scientifique-FNRS under Grant no MISU F 6001 1 and by NSF Grant CCF-1533564.
Affiliations: Université libre de Bruxelles, Belgium; New York University, USA. Contact: {johniacono,bkarsin,gregkoumoutsos}@gmail.com
Abstract
We study dynamic planar point location in the External Memory Model or Disk Access Model (DAM). Previous work in this model achieves polylog query and polylog amortized update time. We present a data structure with $O(\log_B^2 N)$ query time and $O(\frac{1}{B^{1-\epsilon}}\log_B N)$ amortized update time, where $N$ is the number of segments, $B$ the block size and $\epsilon$ is a small positive constant, under the assumption that all faces have constant size. This is a $B^{1-\epsilon}$ factor faster for updates than the fastest previous structure, and brings the cost of insertion and deletion down to subconstant amortized time for reasonable choices of $N$ and $B$. Our structure solves the problem of vertical ray-shooting queries among a dynamic set of interior-disjoint line segments; this is well-known to solve dynamic planar point location for a connected subdivision of the plane with faces of constant size.
1 Introduction
The dynamic planar point location problem is one of the most fundamental and extensively studied problems in geometric data structures, and is defined as follows: We are given a connected planar polygonal subdivision $\Pi$ with $N$ edges. For any given query point $q$, the goal is to find the face of $\Pi$ that contains $q$, subject to insertions and deletions of edges. Here we focus on subdivisions such that each face has a constant number of edges. An equivalent formulation, which we use here, is as follows: given a set $S$ of $N$ interior-disjoint line segments in the plane, for any given query point $q$, report the first line segment in $S$ that a vertical upwards-facing ray from $q$ intersects, subject to insertions and deletions of segments.
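To make the ray-shooting formulation concrete, the following brute-force reference is a minimal sketch of the query semantics only, not of the data structure developed in this paper. The function names and the representation of a segment as a pair of integer points are assumptions made for illustration.

```python
from fractions import Fraction

def y_at(seg, x):
    """Exact y-coordinate of a non-vertical segment at abscissa x (integer inputs assumed)."""
    (x1, y1), (x2, y2) = seg
    return y1 + Fraction(y2 - y1, x2 - x1) * (x - x1)

def ray_shoot_up(segments, q):
    """Return the first segment hit by an upward vertical ray from q, or None.

    Linear scan over all segments: a specification of the query,
    with no attempt at I/O efficiency.
    """
    qx, qy = q
    best, best_y = None, None
    for seg in segments:
        (x1, _), (x2, _) = seg
        lo, hi = min(x1, x2), max(x1, x2)
        if lo == hi or not (lo <= qx <= hi):
            continue  # vertical segment, or segment not crossing the line x = qx
        y = y_at(seg, qx)
        if y >= qy and (best_y is None or y < best_y):
            best, best_y = seg, y
    return best

segments = [((0, 1), (4, 1)), ((0, 3), (4, 5)), ((5, 0), (6, 0))]
print(ray_shoot_up(segments, (2, 0)))
```

Any correct dynamic structure must return the same answers as this scan; the difficulty addressed in the paper is doing so with few block transfers under updates.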
Dynamic planar point location has many applications in spatial databases, geographic information systems (GIS), computer graphics, etc. Moreover it is a natural generalization of the dynamic dictionary problem with predecessor queries; this problem can be seen as the one dimensional variant of planar point location.
In this paper we focus on the External Memory model, also known as the Disk Access Model (DAM) [2]. The DAM is the standard model for designing algorithms that execute efficiently on large datasets stored in secondary storage. This model assumes a two-level memory hierarchy, called disk and internal memory, and it is parameterized by values $M$ and $B$; the disk is partitioned into blocks of size $B$, of which $M/B$ can be stored in memory at any given moment. The cost of an algorithm in the DAM is the number of block transfers between memory and disk, called Input-Output operations (I/Os). The quintessential DAM-model data structure is the B-Tree [11]. See [25, 26] for surveys. Many applications of dynamic planar point location, such as GIS problems, must efficiently process datasets that are too massive to fit in internal memory, thus it is of great relevance and interest to consider the problem in the DAM and to devise I/O efficient algorithms.
1.1 Previous Work
RAM Model.
In the RAM model (the leading model for applications where all data fit in the internal memory) the dynamic planar point location problem has been extensively studied [4, 10, 19, 18, 15, 21]. It is a major and long-standing open problem in computational geometry to design a data structure that supports queries and updates in $O(\log N)$ time [16, 17, 24], i.e., to achieve the same bounds as for the dynamic dictionary problem. In a recent breakthrough, Chan and Nekrich in FOCS'15 [15] presented a data structure supporting queries in $O(\log N (\log\log N)^2)$ time and updates in $O(\log N \log\log N)$ time. They also showed the tradeoff of supporting queries in $O(\log N)$ time and updates in $O(\log^{1+\epsilon} N)$ time or vice-versa for $\epsilon > 0$.
Recently Oh and Ahn [23] presented the first data structure for a more general setting where the polygonal subdivision is not necessarily connected; their data structure supports queries in $O(\log N (\log\log N)^2)$ time and updates in $O(\sqrt{N}\log N (\log\log N)^{3/2})$ amortized time.
External Memory model
(See Table 1). Several data structures have been presented over the years which support queries and updates in polylog($N$) I/Os [1, 7, 5]. Table 1 contains a list of results of prior work. The best update bound known is by Arge, Brodal and Rao [5] and achieves $O(\log_B N)$ amortized I/Os. The query time of their data structure is $O(\log_B^2 N)$. Very recently, the first data structure that supports queries in $o(\log_B^2 N)$ I/Os was announced by Munro and Nekrich [22]. In particular they support queries in $O(\log_B N (\log\log_B N)^3)$ I/Os. However their update time is slightly worse than logarithmic, $O(\log_B N (\log\log_B N)^2)$. In all those works the bounds are obtained by solving the problem of vertical ray-shooting.
Fast Updates in External Memory.
One of the most intriguing and practically relevant features of the external memory model is that it allows fast updates. For the dynamic dictionary problem with predecessor queries, the optimal update bound in the RAM model is $O(\log N)$. In external memory, however, B-trees achieve the optimal query time of $O(\log_B N)$ and typical update time of $O(\log_B N)$, although substantially faster update times are possible. Brodal and Fagerberg [14] showed that $O(\frac{1}{\epsilon B^{1-\epsilon}}\log_B N)$ amortized I/Os per update can be supported, for a small positive constant $\epsilon$, while retaining $O(\frac{1}{\epsilon}\log_B N)$-time queries; they further showed that this is an asymptotically optimal tradeoff between updates and queries. Observe that this update bound is a huge speedup from $O(\log_B N)$ and that, for reasonable choices of the parameters $N$, $B$ and $\epsilon$, it yields a subconstant amortized number of I/Os per update. A similar update bound was later achieved for other dynamic problems like three-sided range reporting and top-$k$ queries [13].
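A quick numerical check illustrates how a buffered update bound can fall below one I/O per operation. The sketch below ignores hidden constants and assumes the bound has the form $\log_B N / (\epsilon B^{1-\epsilon})$; the parameter values are hypothetical and chosen only for illustration, they are not taken from the paper.

```python
import math

def btree_update_ios(n, b):
    """Order of magnitude of a standard B-tree update: log_B N (constants ignored)."""
    return math.log(n, b)

def buffered_update_ios(n, b, eps):
    """Order of magnitude of a buffered update: log_B N / (eps * B^(1 - eps))."""
    return math.log(n, b) / (eps * b ** (1 - eps))

# Hypothetical parameters: about a billion items, blocks of 1024 items, eps = 1/2.
n, b, eps = 2 ** 30, 1024, 0.5
print(btree_update_ios(n, b))           # close to 3
print(buffered_update_ios(n, b, eps))   # close to 0.1875, i.e. well below one I/O
```

With these values the buffered bound is smaller by a factor of $\epsilon B^{1-\epsilon} = 16$, which is the kind of gap the text refers to.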
Given this progress and the fact that in the RAM model the bounds achieved for planar point location and the dictionary problem are believed to coincide, it is natural to conjecture that a similar update bound can be achieved for the dynamic planar point location problem. However, to date no result has been presented that achieves sublogarithmic insertion or deletion time.
1.2 Our Results
We consider the dynamic planar point location problem in the external memory model and present the first data structure with sublogarithmic amortized update time of $O(\frac{\log_B N}{B^{1-\epsilon}})$ I/Os. Prior to our work, the best update bound for both insertions and deletions was $O(\log_B N)$, achieved by Arge et al. [5]. Our main result is:
Theorem 1.1 (Main result).
For any constant $\epsilon$, there exists a data structure which uses $O(N)$ space, answers planar point location queries for polygonal subdivisions with faces of constant size in $O(\log_B^2 N)$ I/Os and supports insertions and deletions in $O(\frac{\log_B N}{B^{1-\epsilon}})$ amortized I/Os. The data structure can be constructed in $O(\frac{N}{B}\log_B N)$ I/Os.
To obtain this result, several techniques are used. Our primary data structure is an augmented interval tree [20]. We combine both the primary interval tree and two auxiliary structures described below with the buffering technique [14, 3] to improve insertion and deletion bounds. In Section 2 we prove Theorem 1.1 using our auxiliary structures as black boxes and omit some technical details relating to rebuilding; these details are deferred to Section 5.
Similarly to previous work, we focus on solving the problem of vertical ray-shooting queries. Our first auxiliary structure answers vertical ray-shooting queries among non-intersecting segments whose right (left) endpoints lie on the same vertical line. This is called the left (right) structure (in Section 2 it will be clear why we choose this terminology and not vice-versa). Left/Right structures of Agarwal et al. [1], which support queries and updates in $O(\log_B N)$ I/Os, are used by several prior works [1, 7, 5]. Our structure improves on their result by reducing the update bound by a factor of $B^{1-\epsilon}$. We obtain the following result, the proof of which is the topic of Section 3:
Theorem 1.2 (Left/right structure).
For a set of $N$ non-intersecting segments whose right (left) endpoints lie on the same vertical line and any constant $\epsilon$, we can create a data structure which supports vertical ray-shooting queries in $O(\log_B N)$ I/Os and insertions and deletions in $O(\frac{\log_B N}{B^{1-\epsilon}})$ amortized I/Os. This data structure uses $O(N)$ space and it can be constructed in $O(\frac{N}{B}\log_B N)$ I/Os. If the segments are already sorted, it can be constructed in $O(N/B)$ I/Os.
Our second auxiliary structure answers vertical ray-shooting queries among non-intersecting segments whose endpoints lie in a set of vertical lines. These vertical lines define vertical slabs, hence the structure is called a multislab structure. We obtain the following result, the proof of which is the topic of Section 4:
Theorem 1.3 (Multislab structure).
For any constant and set of non-intersecting segments whose endpoints lie in vertical lines, we can create a data structure which supports vertical ray-shooting queries in I/Os and insertions and deletions in amortized I/Os. This data structure uses space and it can be constructed in I/Os. If the segments are already sorted according to a total order, it can be constructed in I/Os.
A major challenge faced by previous multislab structures is how to efficiently support insertions. At a high level, it is hard to deal with insertions in cases where a total order is maintained: each time a new segment gets inserted we need to determine its position in the total order, which cannot be done quickly. Arge and Vitter [7] developed a deletion-only multislab data structure and then used the so-called logarithmic method [12] which allowed them to handle insertions in $O(\log_B^2 N)$ I/Os. Later Arge, Brodal and Rao [5] developed a more complicated multislab structure supporting insertions in $O(\log_B N)$ amortized I/Os by performing separate case analysis depending on the value of $B$.
Here, we support insertions in a much simpler way by breaking each inserted segment into smaller unit segments whose endpoints lie on two consecutive vertical lines and can be compared easily to the segments already stored. This way, we are able to support insertions easily in $O(\log_B N)$ I/Os. Finally, we add buffering and obtain sublogarithmic update bounds.
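The idea of cutting an inserted segment into unit segments can be sketched as follows. This is an illustrative fragment only: the function name, the sorted list `lines` of vertical-line x-coordinates and the use of integer coordinates are assumptions, and the sketch says nothing about how the pieces are stored or compared inside the multislab structure.

```python
from fractions import Fraction

def split_into_unit_segments(seg, lines):
    """Cut seg into pieces that each span exactly one slab.

    seg:   ((x1, y1), (x2, y2)) with both x-coordinates present in `lines`
           and x1 != x2.
    lines: sorted x-coordinates of the vertical lines.
    """
    (x1, y1), (x2, y2) = sorted(seg)          # order endpoints left to right
    i, j = lines.index(x1), lines.index(x2)   # slab range covered by seg
    slope = Fraction(y2 - y1, x2 - x1)
    pieces = []
    for k in range(i, j):
        xa, xb = lines[k], lines[k + 1]
        pieces.append(((xa, y1 + slope * (xa - x1)),
                       (xb, y1 + slope * (xb - x1))))
    return pieces

print(split_into_unit_segments(((0, 0), (6, 3)), [0, 2, 4, 6]))
```

Each piece lies between two consecutive vertical lines, so it is comparable (above or below) with every stored segment crossing the same slab, which is what makes the position of a new segment easy to determine.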
1.3 Notation and Preliminaries
External Memory Model.
Throughout this paper we focus on the external memory model of computation. $N$ denotes the number of segments in the planar subdivision, $B$ the block size and $M$ the number of elements that fit in internal memory. We assume that and (the tall cache assumption). It is well-known that sorting $N$ elements requires $\Theta(\frac{N}{B}\log_{M/B}\frac{N}{B})$ I/Os [2]. Given that , this bound is $O(\frac{N}{B}\log_B N)$. We use this bound for sorting in many places without further explanation.
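The simplification of the sorting bound can be checked in one line. The derivation below is a sketch that assumes the tall cache condition in its common form $M \ge B^2$, together with $N \ge B$; both are assumptions made here for the calculation.

```latex
M \ge B^2 \;\Rightarrow\; \frac{M}{B} \ge B
\;\Rightarrow\;
\log_{M/B}\frac{N}{B}
  \;=\; \frac{\log (N/B)}{\log (M/B)}
  \;\le\; \frac{\log N}{\log B}
  \;=\; \log_B N ,
\qquad\text{so}\qquad
\frac{N}{B}\log_{M/B}\frac{N}{B} \;=\; O\!\Big(\frac{N}{B}\log_B N\Big).
```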
Ray-shooting Queries.
In the rest of this paper, we focus on answering vertical ray-shooting queries in a dynamic set of non-intersecting line segments. Let $S$ be the set of segments of the polygonal subdivision $\Pi$. Given a query point $q$, the answer to a vertical ray-shooting query is the first segment of $S$ hit by a vertical ray emanating from $q$ in the $+y$ direction. Based on standard techniques (see e.g. [7]), for connected polygonal subdivisions with faces of size $O(1)$, a planar point location query for a point $q$ can be answered in I/Os after answering a vertical ray-shooting query for $q$.
$B^{\epsilon}$-Trees.
All tree structures that we will use are variants of the $B^{\epsilon}$-trees [14], which are B-trees except that the internal nodes have at most $B^{\epsilon}$ (and not $B$) children; the leaves still store data items. For constant $\epsilon$, this does not change the asymptotic height of the tree or the search cost; both remain $O(\log_B N)$.
2 Overall Structure
In this Section we prove Theorem 1.1, using the data structures of Theorems 1.2 and 1.3 (detailed in Sections 3 and 4, respectively). Given non-intersecting segments in the plane and a constant , we construct a -space data structure which answers vertical ray-shooting queries in I/Os and supports updates in amortized I/Os. Throughout this section we let .
The Data Structure.
As in the previous works on planar point location, our primary data structure is based on the interval tree (the external interval tree defined in [9]). Our interval tree is a -tree which stores the -coordinates of segment endpoints in its leaves. Here we assume for clarity of presentation that the interval tree is static, i.e. all new segments inserted share -coordinates with already stored segments; in Section 5 we remove this assumption and extend our data structure to accommodate new -coordinates and achieve the bounds of Theorem 1.1.
Each node of is associated with several secondary structures, as we explain later, and each segment is stored in the secondary structures of exactly one node of . Each node of is associated with a vertical slab . The slab of the root is the whole plane. For an internal node , the slab is divided into vertical slabs corresponding to the children of , separated by vertical lines called slab boundaries, such that each slab contains the same number of vertices of from slab .
Let be the set of segments that compose . Each segment is assigned to a node of . This is the highest node of such that is completely contained in slab and intersects at least one slab boundary partitioning ; if such an internal node does not exist, then is assigned to a leaf such that is completely contained in its slab . Segments assigned to internal nodes are stored in the secondary structures of those nodes, whereas segments assigned to leaves are stored explicitly in the corresponding leaf. By construction of the slab boundaries, each leaf stores segments in blocks.
Consider a segment assigned to a node of . Let and be the children slabs of where the left and right endpoints of lie. We call the segment the left subsegment of , the segment the right subsegment of and the rest of (which spans children slabs ) is its middle subsegment. See Figure 1 for an illustration. In this example, the left subsegment is , the right subsegment is , and the portion of in and is the middle subsegment.
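The decomposition into left, middle and right subsegments can be sketched as below. The function name and the list `boundaries` of slab-boundary x-coordinates are hypothetical, coordinates are assumed to be integers, and the sketch assumes that no endpoint lies exactly on a slab boundary.

```python
from fractions import Fraction

def split_at_boundaries(seg, boundaries):
    """Return (left, middle, right) subsegments of seg at a node.

    boundaries: sorted x-coordinates of the slab boundaries partitioning
                the node's slab; seg must cross at least one of them.
    middle is None when seg crosses a single boundary.
    """
    (x1, y1), (x2, y2) = sorted(seg)          # order endpoints left to right

    def point(x):
        # Exact point of seg at abscissa x.
        return (x, y1 + Fraction(y2 - y1, x2 - x1) * (x - x1))

    crossed = [b for b in boundaries if x1 < b < x2]
    if not crossed:
        raise ValueError("segment crosses no slab boundary of this node")
    first, last = crossed[0], crossed[-1]
    left = ((x1, y1), point(first))           # part inside the leftmost child slab
    right = (point(last), (x2, y2))           # part inside the rightmost child slab
    middle = (point(first), point(last)) if first != last else None
    return left, middle, right

print(split_at_boundaries(((1, 1), (7, 4)), [2, 4, 6]))
```

The left and right pieces each touch one slab boundary, which is why they fit the left/right structures, while the middle piece has both endpoints on slab boundaries and fits the multislab structure.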
Let be the set of segments assigned to a node of . To store segments of , node of contains the following secondary structures:
1. A multislab structure which stores the set of middle segments.
2. left structures , for , storing the left (sub)segments of slab .
3. right structures , for , storing the right (sub)segments of slab .
In addition, each internal node contains an insertion buffer and deletion buffer , each storing up to segments.
Construction and Space Usage.
For every node , the buffers and fit in blocks, since they store at most segments. By Theorems 1.2 and 1.3, a secondary structure storing segments uses space. Since each segment of is stored in at most 3 secondary structures, overall secondary structures of use space. Thus each node uses space. We get that our data structure uses overall space. The interval tree can be constructed in I/Os. This can be done by sorting the segments by their endpoints’ -coordinates and then determining all slab boundaries to create a balanced interval tree. By Theorems 1.2 and 1.3, all secondary structures of a node of can be constructed in I/Os . Thus, all secondary structures of the tree can be constructed in I/Os.
Queries.
To answer a vertical ray-shooting query for a point $q$, we traverse the root-to-leaf path of based on the $x$-coordinate of $q$, while maintaining a segment (initialized to null) which is the answer to the query among segments assigned to nodes we have traversed so far. At each node visited along this path, we first update buffers and by removing from both of them all segments (if any) of . Then, we perform a vertical ray-shooting query on the secondary structures of ; in particular we ray-shoot on the multislab structure and the left and right structures and , for such that the query point is in slab (minor detail: for each secondary structure considered, we first perform insertions/deletions of the corresponding segments from buffers and ). After checking the secondary structures, we update if a closer segment above is found as a result. Next, we ray-shoot among segments stored in and update if necessary. Finally, we determine which child of to visit, and flush any segments of that are contained in the slab of to ; this way we make sure that information about deleted segments is updated throughout the root-to-leaf path and no deleted segment can be considered as an answer to the query. We then continue the process at . Once a leaf node is reached, we simply compare the segments it contains with and return the closest segment above among them and .
Bounding the query cost: Since any root-to-leaf path of has length , each secondary data structure supports ray-shooting queries in I/Os (due to Theorems 1.2 and 1.3) and we check secondary structures per node, we get that a query is answered in I/Os. Note that in each node of the root-to-leaf path visited, the operations involving and require I/Os, thus they increase the total cost by at most a factor.
Insertions.
To handle insertions, we use the insertion buffers stored in nodes of . When a new segment is inserted, we insert it in the insertion buffer of the root. Let be an internal node with children . Whenever becomes full, it is flushed. Segments of that cross at least one slab boundary partitioning are inserted in the secondary structures of ; segments that are contained in the slab of are inserted in , for . In case becomes full for some node whose children are leaves, we insert those segments explicitly at the corresponding leaves. When a leaf becomes full, we restructure the tree using split operations on full nodes.
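The flushing rule for insertion buffers can be sketched with a toy in-memory tree. This is a simplified illustration under several assumptions: the class and attribute names are hypothetical, a plain list stands in for the secondary structures and for leaf contents, the buffer capacity is an arbitrary parameter, and leaf splits are omitted.

```python
class Node:
    """Toy interval-tree node with an insertion buffer (names are illustrative)."""

    def __init__(self, lo, hi, boundaries=(), children=(), capacity=4):
        self.lo, self.hi = lo, hi             # x-extent of this node's slab
        self.boundaries = list(boundaries)    # slab boundaries between children
        self.children = list(children)        # empty for a leaf
        self.capacity = capacity              # insertion buffer capacity
        self.insert_buffer = []
        self.stored = []                      # stand-in for secondary structures / leaf data

    def insert(self, seg):
        if not self.children:                 # leaf: store the segment explicitly
            self.stored.append(seg)
            return
        self.insert_buffer.append(seg)
        if len(self.insert_buffer) >= self.capacity:
            self.flush()

    def flush(self):
        pending, self.insert_buffer = self.insert_buffer, []
        for seg in pending:
            (x1, _), (x2, _) = seg
            left, right = min(x1, x2), max(x1, x2)
            if any(left < b < right for b in self.boundaries):
                self.stored.append(seg)       # crosses a boundary: belongs to this node
            else:
                # Contained in one child slab: forward to that child's buffer.
                child = next(c for c in self.children
                             if c.lo <= left and right <= c.hi)
                child.insert(seg)

a, b = Node(0, 4), Node(4, 8)
root = Node(0, 8, boundaries=[4], children=[a, b], capacity=2)
root.insert(((1, 0), (2, 0)))
root.insert(((3, 1), (5, 1)))                 # second insert fills the buffer and triggers a flush
print(root.stored, a.stored, b.stored)
```

In the external memory setting the point of this scheme is that one flush moves a whole buffer of segments with a number of block transfers proportional to the number of children, so the per-segment cost of moving down one level is small.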
Bounding the insertion cost: We compute the amortized cost of an insertion by considering three components:
(i) The cost for moving segments between insertion buffers. Whenever an insertion buffer gets full, it forwards segments to the buffers of its children performing I/Os. Since a flushing occurs every insertions in , the amortized cost of such operations is . Each segment will move in at most insertion buffers before it is inserted in the secondary structures of a node (or in a leaf). Thus the amortized cost for moving between buffers is .
(ii) The insertion cost in the secondary structures. By Theorems 1.2 and 1.3 we get that insertions in secondary structures require I/Os.
(iii) The cost of restructuring the tree after insertions when a leaf becomes full. We show in Section 5 that the restructuring requires $O\big(\frac{\log_{B}N}{\epsilon^{\prime}\cdot B^{1-\epsilon^{\prime}}}\big)$ amortized I/Os, by slightly modifying our primary interval tree data structure.
We conclude that our data structure supports insertions in amortized I/Os.
Deletions.
To support deletions, we use the deletion buffers stored in all nodes of . To delete a segment , we first check whether is in the insertion buffer of the root and in that case we delete it; otherwise we store it in . Similar to insertions, whenever gets full for some internal node with children , we flush . The segments of crossing at least one slab boundary partitioning are deleted from the corresponding secondary structures associated with ; the other segments of are moved to buffers ; in case a segment inserted in , we delete it from both buffers. In case becomes full for some parent of leaves, we delete those segments explicitly from the corresponding leaves.
Bounding the deletion cost: The deletion cost has three components:
(i) Moving segments between the deletion buffers. Using the same argument as for insertions, we get that this requires I/Os, amortized.
(ii) The cost of deletion in the secondary structures. By Theorems 1.2 and 1.3 we get that deletions in secondary structures require amortized I/Os.
(iii) The cost of restructuring the tree. Every deletions, we rebuild the structure using I/Os, to get an amortized restructuring cost of I/Os.
Overall deletions are supported in amortized I/Os.
3 Left and Right Structures
In this section we prove Theorem 1.2. Given $N$ segments all of whose right (left) endpoints lie on a single vertical line, we construct a data structure which answers vertical ray-shooting queries on those segments in $O(\log_B N)$ I/Os and supports insertions and deletions in $O(\frac{\log_B N}{B^{1-\epsilon}})$ amortized I/Os for a constant $\epsilon$.
We describe the structure for the case where we are given a set of segments whose right endpoints have the same $x$-coordinate (left structure). (Recall from Section 2 that we call left structures the ones storing the left subsegment of a segment; thus all subsegments stored in a left structure have the same $x$-coordinate of right endpoints.) The case where the left endpoints of the segments have the same $x$-coordinate (right structure) is completely symmetric. For a segment $s$, we will refer to the $y$-coordinate of its right endpoint as the $y$-coordinate of $s$. Conversely we define the $x$-coordinate of $s$ to be the $x$-coordinate of its left endpoint.
Total Order.
We assume that the segments in are ordered according to their $y$-coordinates. We can always order the segments according to this total order in $O(\frac{N}{B}\log_B N)$ I/Os.
The Data Structure.
We store all segments of in an augmented -tree which supports vertical ray-shooting queries, insertions and deletions. The degree of each node is between and , except the root which might have degree in the range , and leaves store elements. For a node , let be the subtree rooted at . Since the segments are sorted according to their -coordinates, each subtree corresponds to a range of -coordinates, which we call the -range of node . Let be an internal node of with children . Node stores the following information:
1. A buffer of segments of capacity which contains segments in the $y$-range of whose left endpoints have the smallest $x$-coordinates (i.e., segments that extend the farthest from the vertical line) and are not stored in any buffer for an ancestor of . In other words, together with segments of buffers form an external memory priority search tree [6].
2. An insertion buffer and a deletion buffer , each storing up to segments.
3. A list that contains, for each child , the segment with minimum $x$-coordinate stored in . We call this the minimal segment for child .
The data structure satisfies the following invariants: For each node , either or if , then and are empty and all buffers stored in descendants are empty. Also, for each node , buffers and are disjoint. Finally, for a leaf , and are empty.
Construction and Space Usage.
Overall buffers and lists of each node contain segments, i.e. they can be stored in blocks. Thus can be stored in blocks, i.e. it requires space. Construction of requires I/Os, since we need to sort all segments according to their -coordinates. If the segments are already sorted according to their -coordinate, then can be created in I/Os.
Queries in the static structure.
To get a feel for how our structure supports queries, we first show how to perform queries in the static case, i.e., assuming there are no insertions and deletions and all buffers and are empty. Later we will give a precise description of performing queries in the fully dynamic structure.
Let be the ray emanating from in the direction and the ray emanating from in the direction. We query the structure by finding the first segment hit by both and . We keep two pointers, and , initialized at the root. We also keep the closest segments and seen so far in the and direction respectively (initialized to and ). At each step, we update both and to move from a node of depth to a node of depth . While at level , and might coincide, or one of them might be undefined (set to null).
We now describe the query algorithm. We start at the root of and advance down, while updating , , and ,. When at depth , we find the first segment hit by among and and update if necessary (i.e. if is the first segment hit by among all segments seen so far). Similarly, we ray-shoot on among and and update if necessary. To determine in which nodes of depth to continue the search, we ray-shoot on among and and also ray-shoot on among and (i.e., all minimal segments of children of and ). Let be the first segment in hit by (if such a segment exists) and be the node containing (if exists). If the -range of is higher than the -coordinate of or if does not exist, we leave undefined for level . Otherwise, we set . Similarly, call the first minimal segment of hit by and be the node containing (if such a segment exists). If the -range of is lower than the -coordinate of or if does not exist, we leave undefined for level . Otherwise we set .
If both and are undefined for the next level , we stop the procedure and output as the result to the vertical ray-shooting query. Otherwise we repeat the same procedure in the next level. When we reach a leaf level, we find the first segment hit by among and , update if necessary, and output as the result of the query.
Remark: The reader might wonder why we answer vertical ray-shooting queries in both directions and keep two pointers and . Isn’t it sufficient to answer queries in one direction and keep one pointer at each step? Figure 2 shows an example where this is not true and maintaining only the pointer would result in an incorrect answer.
The formal proof of correctness of this query algorithm is deferred to Appendix A.
Bounding the query cost: To count the cost, observe that in each step we move down the tree by one level and perform operations that require I/Os, as we check segments stored in the current nodes and . Since the height of the tree is , a query is answered in I/Os.
Insertions.
Assume we want to insert a segment into the left structure . If the -value of is smaller than the maximum -value of a segment stored in the buffer of the root , we insert into . Otherwise we store in the insertion buffer of the root . Note that insertion of in might cause to overflow (i.e., ); in that case we move the segment of with the maximum -value into the insertion buffer of the root .
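The insertion rule at the root can be sketched as follows. This is a simplified illustration: a segment of a left structure is represented by the hypothetical pair (x of its left endpoint, y of its right endpoint), the function name and the `capacity` parameter are assumptions, and treating an empty segment buffer as accepting the new segment is a choice made for the sketch rather than something stated in the text.

```python
def insert_at_root(seg_buffer, ins_buffer, seg, capacity):
    """Insert seg = (x_left, y_right) following the root rule described above.

    seg_buffer keeps the segments with the smallest x_left (those reaching
    farthest from the common vertical line); ins_buffer collects the rest
    until it is flushed to the children (flushing is not shown).
    """
    if not seg_buffer or seg[0] < max(s[0] for s in seg_buffer):
        seg_buffer.append(seg)
        if len(seg_buffer) > capacity:        # overflow: evict the max-x segment
            worst = max(seg_buffer, key=lambda s: s[0])
            seg_buffer.remove(worst)
            ins_buffer.append(worst)
    else:
        ins_buffer.append(seg)

seg_buffer, ins_buffer = [], []
for s in [(5, 0), (3, 1), (4, 2), (9, 3)]:
    insert_at_root(seg_buffer, ins_buffer, s, capacity=2)
print(sorted(seg_buffer), ins_buffer)
```

The same rule is then applied at each child when an insertion buffer is flushed, which keeps the segments that extend farthest to the left as high in the tree as possible, in the manner of a priority search tree.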
Let be an internal node with children . Whenever the insertion buffer becomes full, we flush it, moving the segments to buffers of the corresponding children. For a segment that should be stored in child , we repeat the same procedure as in the root: Check whether has smaller -value than the maximum -value of a segment stored in and if yes, store in , otherwise store it in . If overflows, we move its last segment (i.e. the one with maximum -value) into . Also, if gets stored in and its -value is smaller than all previous segments of , we update the minimal segment of , .
When overflows for some leaf , we split into two leaves and , as in standard -trees. Note that this might cause recursive splits of nodes at greater height.
Bounding the insertion cost: To flush a buffer and forward segments to buffers and , for we perform I/Os. Since becomes full after at least insertions, the amortized cost of moving a segment from to buffers of a child of is . Each inserted segment moves between buffers in a root-to-leaf path of length , thus the total amortized cost for moves between buffers is I/Os. The restructuring of due to splitting nodes requires amortized I/Os, as in standard B-trees. Thus, insertions are supported in amortized I/Os.
Deletions.
To delete a segment , we first check whether it is stored in the buffers of the root or ; in this case we delete it. Otherwise, we insert in the deletion buffer of the root .
Let be an internal node with children . Whenever becomes full we flush it and move the segments to the corresponding children and repeat the same procedure: For a segment which moves to child , we check whether it is stored in or : if yes, we delete it and update the minimal segment of in if necessary. Otherwise, we store in the deletion buffer . If segment buffer underflows (i.e., ), we refill it using segments stored in buffers ; the segments moved to are deleted from and all necessary updates in are performed. This might cause underflowing segment buffers for children of ; we handle those in the same way. In case all buffers become empty and , we move the segments from to until either or .
Bounding the deletion cost: Deletion cost consists of three components:
(i) Cost for moving segments between buffers: Using the same analysis as for insertions we get that this requires amortized I/Os.
(ii) Cost due to refilling of buffers : For a node with children , while refilling buffer from we perform I/Os and we move segments one level higher. Thus the amortized cost of moving a segment up by one level is . Since the tree has height , over a sequence of deletions the total number of moves of segments by one level is . Thus the total cost due to refilling is at most , which implies that the amortized cost is .
A corner case that we did not take into account above is when the total number of segments stored in buffers is less than . In this case the amortized cost of updating is no longer . To take care of this, we use a simple amortization trick: we double charge all I/Os performed relating to insertions. This way, for each buffer there is a saved I/O from the time when segments move from to node . We use this additional saved I/O when gets emptied due to the refilling of .
(iii) Restructuring requires amortized I/Os, by rebuilding the structure after deletions.
Overall, the amortized deletion cost is I/Os.
Queries in the dynamic structure.
We now describe how to extend our query algorithm to the dynamic case. In order to ensure that all nodes visited are up-to-date and we do not miss any updates in the insertion/deletion buffers, when moving a pointer from a node to its child , we flush any deletes in to , i.e. delete segments of that are stored in , store the other segments in and update if necessary. We then delete any segments found in both and . Finally, we compare segments in with (recall this is the first segment hit by among segments considered so far) and, if any segment in would be hit by before we replace with it. Clearly this increases the total cost by at most a factor compared to the static case, thus the query cost is I/Os.
4 Multislab Structure
In this section we prove Theorem 1.3. Assume that we are given a set of non-intersecting segments with endpoints on at most vertical lines , for some constant . We show that those segments can be stored in a data structure which uses space, supports vertical ray-shooting queries in I/Os, and updates in amortized I/Os, for . This data structure can be constructed in I/Os. We call this data structure a multislab structure.
For notational convenience we set . This way endpoints of the segments lie on at most vertical lines . For , let denote the vertical slab defined by vertical lines and . We will show that queries are supported in I/Os and updates in I/Os. Theorem 1.3 then follows.
Total Order.
In order to implement the multislab structure we need to maintain an ordering of the segments based on their y-coordinates. Using standard approaches (see e.g. [7, 5]) we can define a partial order for segments that can be intersected by a vertical line. Arge et al. [8] showed how to extend a partial order into a total order on segments (not necessarily all intersecting the same vertical line) in I/Os. We use this total order to create our multislab structure.
The Data Structure.
We store the ordered segments in an augmented B-tree which supports queries, insertions and deletions. The degree of each node is between and , except the root which might have degree in the range . Leaves store elements. For a node , let be the subtree rooted at . Let be the children of an internal node . Node stores the following information:
1. A buffer of capacity which contains the highest (according to the total order) segments stored in which are not stored in any buffer for an ancestor of . In other words, together with segments of buffers form an external memory priority search tree [6].
2. An insertion buffer and a deletion buffer , both storing up to segments.
3. A list which contains, for each slab , , and each child , , the highest segment (according to the total order) crossing slab stored in .
The data structure satisfies the following invariants: i) for each node , either or if , then and are empty and all buffers of descendants of are empty, ii) for each node , buffers and are disjoint, and iii) for every leaf , and are empty.
Construction and Space Usage.
Overall buffers of each node contain segments and list contains at most segments, i.e., they can be stored in blocks. Thus can be stored in blocks, i.e. it requires space. The structure can be constructed in I/Os. If segments are already sorted according to a total order, construction requires I/Os.
Insertions.
To insert a new segment we need to determine its position in the total order. Clearly, we cannot afford to produce a new total order from scratch, as this costs I/Os. Thus, we break into at most unit segments, where each segment crosses exactly one slab. In particular, if crosses slabs , we break it into unit segments , where segment crosses slab . We call all such unit segments stored in new segments. The rest of the segments stored in are called the old segments of . Now we can easily update the total order: segment needs to be compared only with segments crossing slab ; if and are the predecessor and successor of within slab , we locate in an arbitrary position between and in the total order. This way a valid total order is always maintained.
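The splitting step can be illustrated with a minimal sketch. It assumes the vertical lines are given as a sorted list of x-coordinates, that slab `k` is the region between `lines[k]` and `lines[k + 1]` (0-indexed), and that a segment is a pair of `(x, y)` endpoints whose x-coordinates both lie on these lines; the function name and this representation are hypothetical.

```python
from bisect import bisect_left


def split_into_unit_segments(segment, lines):
    """Break a segment into pieces that each cross exactly one slab.

    segment: ((x1, y1), (x2, y2)) with x1 < x2 and both x1, x2 in `lines`.
    lines:   sorted x-coordinates of the vertical lines.
    Returns a list of (slab_index, unit_segment) pairs, left to right.
    """
    (x1, y1), (x2, y2) = segment
    i = bisect_left(lines, x1)
    j = bisect_left(lines, x2)
    if i >= j or j >= len(lines) or lines[i] != x1 or lines[j] != x2:
        raise ValueError("endpoints must lie on two distinct vertical lines")

    slope = (y2 - y1) / (x2 - x1)
    pieces = []
    for k in range(i, j):
        xa, xb = lines[k], lines[k + 1]
        # Evaluate the supporting line at both slab boundaries.
        ya = y1 + slope * (xa - x1)
        yb = y1 + slope * (xb - x1)
        pieces.append((k, ((xa, ya), (xb, yb))))
    return pieces


pieces = split_into_unit_segments(((1, 0), (4, 3)), [0, 1, 2, 4])
```

Because each piece lives in a single slab, its position in the total order can be found by comparing it only with the segments crossing that slab, as described above.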
We now describe the insertion algorithm. When segment needs to be inserted, we first break it into unit segments . For each segment , , we first check whether it should be inserted in the buffer of the root: if this is the case we store it there; otherwise we store it in the insertion buffer of the root . In case overflows (i.e. ) we move its last segment (according to the total order) to . Let be an internal node with children . Each time becomes full, we flush it and move the segments to its children , for . For a segment moving from to , we first check whether it is greater (according to the total order) than the minimum segment stored in and if so we store it in ; otherwise we store it in buffer . In case overflows (i.e. ) we move its last segment to . Also we update information in list if necessary. In case becomes full, we repeat the same procedure recursively.
When overflows for some leaf , we split into two leaves and , as in standard -trees. Note that this might cause recursive splits of nodes at greater height.
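The flow of a segment through segment buffers and insertion buffers can be sketched in memory as follows. This is a toy model under several assumptions: integer keys stand in for positions in the total order, a single hypothetical `cap` is used as the capacity of every buffer, only insertions occur, and list maintenance, leaf splitting and I/O costs are omitted. The names `Node`, `place` and `flush` are illustrative only.

```python
from bisect import bisect_right


class Node:
    def __init__(self, seps=None, children=None, cap=4):
        self.seps = seps or []          # separator keys routing to children
        self.children = children or []  # empty list for a leaf
        self.cap = cap                  # hypothetical capacity of each buffer
        self.seg = []                   # segment buffer, kept in decreasing order
        self.ins = []                   # insertion buffer (unused at leaves)

    def is_leaf(self):
        return not self.children


def place(node, key):
    """Store key in node's segment buffer, else in its insertion buffer."""
    if node.is_leaf():
        node.seg.append(key)            # leaves just store; splitting omitted
        node.seg.sort(reverse=True)
        return
    if len(node.seg) < node.cap or key > min(node.seg):
        node.seg.append(key)
        node.seg.sort(reverse=True)
        if len(node.seg) <= node.cap:
            return
        key = node.seg.pop()            # overflow: evict the last (lowest) key
    node.ins.append(key)
    if len(node.ins) >= node.cap:
        flush(node)


def flush(node):
    """Empty the insertion buffer by forwarding every key to its child."""
    pending, node.ins = node.ins, []
    for key in pending:
        child = node.children[bisect_right(node.seps, key)]
        place(child, key)               # may recursively flush the child


root = Node(seps=[50], children=[Node(cap=2), Node(cap=2)], cap=2)
for key in [10, 60, 20, 70, 30, 40]:
    place(root, key)
```

In this run the root's segment buffer ends up holding the two highest keys, while lower keys travel down in batches only when the insertion buffer fills, which is the source of the amortized savings.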
Bounding the insertion cost: To flush a buffer and move segments to buffers of child nodes and , we need to perform I/Os. Since each segment breaks into at most unit segments, a buffer of size becomes full after at least insertions. Thus the amortized cost of moving a segment from a buffer of depth to depth is . Since each segment will be eventually stored in a node of depth , the amortized cost until it gets inserted is . The restructuring of due to splitting full nodes requires amortized I/Os, as in standard B-trees. Overall insertions require amortized I/Os.
Linear space usage: To avoid increases in space usage due to unit segments, whenever there are new segments, we rebuild the structure. This way the space used is . This rebuilding requires I/Os, i.e., amortized I/Os, thus it does not violate the insertion time bound.
Deletions.
The process of deleting a segment, , is similar to insertion: we break into at most unit segments where and are the leftmost and rightmost slabs spanned by and apply the deletion procedure for each of those unit segments separately.
The deletion algorithm for a unit segment is analogous to the one of the left (right) structure of Section 3. For completeness we describe it here. To delete a unit segment , we first check whether it is stored in the buffers of the root or ; in this case we delete it. Otherwise, we insert in the deletion buffer of the root . Let be an internal node with children . Whenever becomes full we flush it and forward the segments to the corresponding children and repeat the same procedure: For a segment which moves to child , we check whether it is stored in or and if this is the case, we delete it and update list if necessary. Otherwise, we store in the deletion buffer .
In case segment buffer underflows (i.e., ), we refill it using segments from buffers ; segments moved to are deleted from and gets updated (if needed). This might cause underflowing segment buffers ; we handle those in the same way. In case all buffers become empty and , we move to the segments from until either or . After deletions we rebuild our data structure.
Remark: Note that here we split all segments into unit segments . The old segments, however, are not unit segments and are stored manually in the data structure. This does not affect our algorithm: whenever the first unit segment which is a part of reaches the node such that , we delete from and remove from deletion buffers. The remaining segments will eventually reach node and realize that is already deleted from ; at this point gets deleted.
Bounding the deletion cost: The analysis of the deletion cost is identical to the analysis of deletions in the structure of Section 3. Since each segment breaks into at most unit segments, we get an amortized deletion cost of .
Linear space usage: Similar to insertions, we need to make sure that the total space used is not increasing asymptotically due to the use of at most unit segments in deletion buffers for each deleted segment . The total capacity of deletion buffers is . Since we rebuild the structure after deletions, there are at most segments stored in deletion buffers, i.e., deletion buffers never get totally full and total space used is
Queries.
Let be the query point and be the vertical ray emanating from in the direction. Let also be the slab containing . We can find in I/Os by storing all slab boundaries in a block. We perform a root-to-leaf search and we keep the first segment hit by among segments seen so far. While visiting a node we do the following: (i) perform a vertical ray-shooting query from among segments stored in buffers and , and update if necessary (ii) move to the child which contains the successor segment of in list (see Figure 3) and (iii) find in (resp. ) the segments crossing slab that should be stored (according to the total order) in and move them to or (resp. delete them from or store them in ). If a segment inserted in is also stored in , we delete it from both buffers. Once we reach a leaf , we first delete from the segments that are in the deletion buffer of its parent and then we perform a ray-shooting query among the segments stored in and update if necessary.
Bounding the query cost: Since we follow a root-to-leaf path, and at each level we need to perform I/Os, a ray-shooting query is answered in I/Os.
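Two primitives used at every step of the query, locating the slab of the query point and ray-shooting among the segments of a buffer, can be sketched as follows. The sketch assumes the ray points upward (increasing y), that slab boundaries are a sorted list of x-coordinates, and that segments are non-vertical pairs of `(x, y)` endpoints; all function names are hypothetical.

```python
from bisect import bisect_right


def find_slab(lines, qx):
    """Index k of the slab between lines[k] and lines[k + 1] containing qx."""
    k = bisect_right(lines, qx) - 1
    if k < 0 or k >= len(lines) - 1:
        return None                     # qx lies outside all slabs
    return k


def y_at(seg, x):
    """y-coordinate of the supporting line of seg at abscissa x."""
    (x1, y1), (x2, y2) = seg
    return y1 + (y2 - y1) * (x - x1) / (x2 - x1)


def first_hit(q, segments, best=None):
    """Lowest segment at or above q on the vertical line through q.

    `best` is the best segment seen so far; it is kept unless a segment
    in `segments` is hit earlier by the upward ray.
    """
    qx, qy = q
    for seg in segments:
        (x1, _), (x2, _) = seg
        if not (x1 <= qx <= x2):
            continue                    # the ray misses this segment
        y = y_at(seg, qx)
        if y >= qy and (best is None or y < y_at(best, qx)):
            best = seg
    return best


a = ((0, 1), (4, 1))
b = ((0, 3), (4, 3))
hit = first_hit((1.5, 0), [b, a])
```

During the root-to-leaf search, `first_hit` would be called once per visited node with the current best segment passed as `best`, so the answer is refined level by level.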
5 Counting the Restructuring Cost
In Section 2 we proved Theorem 1.1 (query and update bounds of the overall structure) without taking into account the cost of restructuring the interval tree due to insertions that cause leaves to become full. In this section we show that Theorem 1.1 holds while taking into account the restructuring of as well.
When a leaf becomes full we need to split it. This split in turn might cause the split of the parent and possibly continue up the tree, thus causing some part of the tree to need rebalancing. While rebalancing, we need to perform updates in the secondary structures so that they are adjusted with the updated nodes of the interval tree . In this section, we show that we can slightly modify our data structure such that all updates in secondary structures can be performed in amortized I/Os. This implies that Theorem 1.1 holds.
Our Approach.
We use a variant of the weight-balanced -tree of [9]. Each leaf stores at most segment endpoints. Let be a node at height with parent . Node stores elements in its subtree . We will show that if node splits, then we can perform all updates needed in the secondary structures in I/Os. This implies that a split requires amortized I/Os, since after a restructuring, there should be at least insertions in until the next split is needed. Since each insertion can cause splits, we get an amortized restructuring cost of I/Os for insertion.
Splitting a node.
Node splits into two new nodes and . The slab of is divided into two slabs with slab boundary ; see Figure 4. To capture this change and update our data structure, we need to perform updates in the secondary structures of and construct the secondary structures for . We describe these updates in detail and show that they can be performed in I/Os. In our analysis we use the fact that all secondary structures (multislab and left/right) storing segments can be scanned in I/Os.
Updates in secondary structures of .
We begin with the construction of left/right structures for and using the previous left/right structures for . We describe the creation of left structures and for and , respectively, and the right structures are symmetric. Segments that were stored in and do not cross (like segment in Figure 5) are stored in ; segments of that cross (see segment in Figure 5) are stored in . To identify if a segment is stored in or we just need to scan , which takes I/Os. Moreover, there are some additional segments that need to be stored in left/right structures of : the segments that are strictly inside the slab of (i.e. they were stored in secondary structures of ) and cross ; see e.g. segment in Figure 5. For those segments, their left subsegments are stored in and their right subsegments in . To find such segments we need to scan all secondary structures stored at . Since each secondary structure can be scanned in I/Os and there are structures stored in each node, all this takes I/Os.
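The classification of segments relative to the new slab boundary can be sketched as below. The sketch assumes segments are non-vertical pairs of `(x, y)` endpoints and that the new boundary is the vertical line at x-coordinate `s`; the function names are hypothetical, and the scan is done over a plain Python list rather than over an external-memory structure.

```python
def crosses(seg, s):
    """True if the vertical line x = s passes strictly through seg."""
    (x1, _), (x2, _) = seg
    return x1 < s < x2


def cut_at(seg, s):
    """Split seg at x = s into its left and right subsegments."""
    if not crosses(seg, s):
        raise ValueError("segment does not cross the boundary")
    (x1, y1), (x2, y2) = seg
    ys = y1 + (y2 - y1) * (s - x1) / (x2 - x1)
    return ((x1, y1), (s, ys)), ((s, ys), (x2, y2))


def partition_by_boundary(segments, s):
    """One scan: segments that avoid x = s, and segments that cross it."""
    staying = [g for g in segments if not crosses(g, s)]
    crossing = [g for g in segments if crosses(g, s)]
    return staying, crossing


segs = [((0, 0), (1, 0)), ((0, 0), (4, 4))]
staying, crossing = partition_by_boundary(segs, 1)
left_part, right_part = cut_at(crossing[0], 1)
```

A single pass over the stored segments suffices for this classification, which is why the cost of the step is dominated by the cost of scanning the secondary structures.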
We now proceed to the updates of the multislab structure of . Here, we just need to add some segments to the previous multislab structure. The new segments are the segments of that cross which are not already stored in the multislab (and symmetrically, the segments of that cross and are not yet in the multislab). For an example, see segment in Figure 5; before it was not stored in the multislab and now we store its middle subsegment. Note that the middle subsegment is a unit segment (i.e. crosses exactly one slab) thus we don’t need to compute a new total order; we can find its position in the total order by comparing it only with segments that cross slab . All those segments that need to be added can be found by scanning and in I/Os. Insertions in the multislab of require I/Os. Also, all information stored in nodes of the multislab structure can be updated in I/Os. Overall, all updates in the multislab structure of are performed in I/Os.
Construct secondary structures for and .
The left and right structures for each child slab of and will be based on the left/right structure of the same slab in just by removing the segments that cross (which are assigned to as we explained above). Similarly, segments that cross are excluded from the multislab structure.
We start with the construction of left/right structures of and . We describe the left structures; the right ones are symmetric. For each slab of , we scan the left list ; the segments that do not cross remain in and the others are deleted. All this takes I/Os.
Finally we create the multislab structures for and . Again, we need to scan the multislab of and delete the segments that cross , which takes I/Os. Then we need to build the multislabs of and out of the remaining segments. Since all segments are already sorted according to a total order, this can be done in I/Os.
6 Concluding Remarks
We presented the first data structure with sublogarithmic update time for dynamic planar point location in the DAM, matching the update bound achieved by -trees for the dictionary problem. Moreover, until the very recent work of Munro and Nerich [22] in SOCG’19, our query bound was the best known for the problem. Since the authors of [22] achieved the first query bound, a very interesting research direction is to achieve the “best of both worlds”, i.e. to describe a data structure with the query bound of [22] and the update time of the data structure presented in this work. We conjecture that the optimal bounds for dynamic planar point location in external memory are for queries and (the bound we achieved in this work) for updates.
Appendix A Queries in the Left and Right Structures.
In this section we give further details on the left (right) structure which were omitted from Section 3.
Queries.
We begin with the queries and we show the correctness of the query algorithm of the static left (right) structure.
Correctness: The correctness of the query algorithm follows from the next lemma. For a node let be the set of segments stored in buffers in .
Lemma A.1.
Assume that at the end of the th step of the query algorithm, either or is defined. Then is the first segment hit by among the segments of .
Proof.
We prove the lemma by induction.
Induction Base: At the end of the first step, and are children of the root and is the first segment hit by among all segments stored at the root (in and ). By definition of , for any child of the root with higher -range than , is below all segments of . Similarly, for any child of the root with smaller -range than (if exists), there is no segment in hit by (since there exists a segment in hit by ). Finally, for any child of the root whose -range is between the range of and , by definition of , there is no segment in hit by . We conclude that is the first segment hit by among the segments in .
Inductive Step: Assume the lemma holds at the end of step , i.e. we have at least one of and at level and is the first segment hit by among all segments in .
During th step we ray-shoot on among segments stored in and , and update if necessary. Let be the node containing the first segment hit by among and (if such a segment exists). Let also be the node containing the first segment hit by among and (if such a segment exists).
By definition of , for any node which is a child of or with higher -range than , is below all segments of . Similarly, for a node which is a child of or with smaller -range than (if exists), there is no segment of hit by (since there exists a segment in hit by ). Finally, for any child of or whose -range is between the range of and , by definition of , there is no segment in hit by .
Recall that by the induction hypothesis at the end of the previous step was the first segment hit by among segments of . Now we updated and showed that there is no segment hit by before in any subtree other than or . We conclude that is the first segment hit by among the segments in . Since at the end of the th step we set and , the lemma follows.
∎
We now explain how Lemma A.1 implies the correctness of the query algorithm. To see that, let be the last level where either or is defined; at the beginning of the query algorithm at level , is the first segment hit by among segments of . Moreover at the end of this step, both and are not defined, i.e., for each child of or there is no segment in hit by before . Since , we get that is the first segment hit by among segments of . By checking all segments of and updating if necessary, we make sure that is the first segment hit by among segments of .
References
[1] Pankaj K. Agarwal, Lars Arge, Gerth Stølting Brodal, and Jeffrey Scott Vitter. I/O-efficient dynamic point location in monotone planar subdivisions. In Proceedings of the Tenth Annual ACM-SIAM Symposium on Discrete Algorithms (SODA), pages 11–20, 1999.
[2] Alok Aggarwal and Jeffrey Scott Vitter. The input/output complexity of sorting and related problems. Commun. ACM, 31(9):1116–1127, 1988.
[3] Lars Arge. The buffer tree: A technique for designing batched external data structures. Algorithmica, 37(1):1–24, 2003.
[4] Lars Arge, Gerth Stølting Brodal, and Loukas Georgiadis. Improved dynamic planar point location. In 47th Annual IEEE Symposium on Foundations of Computer Science (FOCS), pages 305–314, 2006.
[5] Lars Arge, Gerth Stølting Brodal, and S. Srinivasa Rao. External memory planar point location with logarithmic updates. Algorithmica, 63(1-2):457–475, 2012.
[6] Lars Arge, Vasilis Samoladas, and Jeffrey Scott Vitter. On two-dimensional indexability and optimal range search indexing. In PODS, pages 346–357. ACM Press, 1999.
[7] Lars Arge and Jan Vahrenhold. I/O-efficient dynamic planar point location. Comput. Geom., 29(2):147–162, 2004.
[8] Lars Arge, Darren Erik Vengroff, and Jeffrey Scott Vitter. External-memory algorithms for processing line segments in geographic information systems. Algorithmica, 47(1):1–25, 2007.
