External Memory Planar Point Location with Fast Updates

John Iacono; Ben Karsin; Grigorios Koumoutsos

arXiv:1905.02620·cs.DS·March 31, 2022


TL;DR

This paper introduces a new external memory data structure for dynamic planar point location that significantly improves update times while maintaining efficient query performance, especially for large datasets.

Contribution

It presents a novel data structure achieving faster amortized update times in the external memory model for dynamic planar point location with constant face size.

Findings

01. Update time reduced by a factor of $B^{1-\epsilon}$

02. Query time remains polylogarithmic in $N$

03. Supports efficient vertical ray-shooting queries

Abstract

We study dynamic planar point location in the External Memory Model or Disk Access Model (DAM). Previous work in this model achieves polylog query and polylog amortized update time. We present a data structure with $O(\log_{B}^{2}N)$ query time and $O(\frac{1}{B^{1-\epsilon}}\log_{B}N)$ amortized update time, where $N$ is the number of segments, $B$ the block size and $\epsilon$ is a small positive constant, under the assumption that all faces have constant size. This is a $B^{1-\epsilon}$ factor faster for updates than the fastest previous structure, and brings the cost of insertion and deletion down to subconstant amortized time for reasonable choices of $N$ and $B$. Our structure solves the problem of vertical ray-shooting queries among a dynamic set of interior-disjoint line segments; this is well-known to solve dynamic planar point location for a connected subdivision of the plane…

Table 1: Overview of results on dynamic planar point location in external memory. Results marked with M are for monotone subdivisions and G for general ray-shooting among non-intersecting segments. Query bounds are worst-case and update bounds are amortized. Space usage is measured in words. Here $\epsilon$ is a constant such that $0<\epsilon\leq 1/2$.

Reference	Space	Query Time	Insertion Time	Deletion Time	Type
Agarwal et al. [1]	$O(N)$	$O(\log_{B}^{2}N)$	$O(\log_{B}^{2}N)$	$O(\log_{B}^{2}N)$	M
Arge and Vahrenhold [7]	$O(N)$	$O(\log_{B}^{2}N)$	$O(\log_{B}^{2}N)$	$O(\log_{B}N)$	G
Arge et al. [5]	$O(N)$	$O(\log_{B}^{2}N)$	$O(\log_{B}N)$	$O(\log_{B}N)$	G
Munro and Nekrich [22]	$O(N)$	$O(\log_{B}N(\log\log_{B}N)^{3})$	$O(\log_{B}N(\log\log_{B}N)^{2})$	$O(\log_{B}N(\log\log_{B}N)^{2})$	G
This paper	$O(N)$	$O(\log_{B}^{2}N)$	$O((\log_{B}N)/B^{1-\epsilon})$	$O((\log_{B}N)/B^{1-\epsilon})$	G


External Memory Planar Point Location with Fast Updates

John Iacono, Ben Karsin, Grigorios Koumoutsos

Université libre de Bruxelles, Belgium; New York University, USA. {johniacono,bkarsin,gregkoumoutsos}@gmail.com

This work was supported by the Fonds de la Recherche Scientifique-FNRS under Grant no MISU F 6001 1 and by NSF Grant CCF-1533564.

Abstract

We study dynamic planar point location in the External Memory Model or Disk Access Model (DAM). Previous work in this model achieves polylog query and polylog amortized update time. We present a data structure with $O(\log_{B}^{2}N)$ query time and $O(\frac{1}{B^{1-\epsilon}}\log_{B}N)$ amortized update time, where $N$ is the number of segments, $B$ the block size and $\epsilon$ is a small positive constant, under the assumption that all faces have constant size. This is a $B^{1-\epsilon}$ factor faster for updates than the fastest previous structure, and brings the cost of insertion and deletion down to subconstant amortized time for reasonable choices of $N$ and $B$ . Our structure solves the problem of vertical ray-shooting queries among a dynamic set of interior-disjoint line segments; this is well-known to solve dynamic planar point location for a connected subdivision of the plane with faces of constant size.

1 Introduction

The dynamic planar point location problem is one of the most fundamental and extensively studied problems in geometric data structures, and is defined as follows: We are given a connected planar polygonal subdivision $\Pi$ with $N$ edges. For any given query point $p$, the goal is to find the face of $\Pi$ that contains $p$, subject to insertions and deletions of edges. Here we focus on subdivisions $\Pi$ such that each face has a constant number of edges. An equivalent formulation, which we use here, is as follows: given a set $S$ of $N$ interior-disjoint line segments in the plane, for any given query point $p$, report the first line segment in $S$ that a vertical upwards-facing ray from $p$ intersects, subject to insertions and deletions of segments.

Dynamic planar point location has many applications in spatial databases, geographic information systems (GIS), computer graphics, etc. Moreover it is a natural generalization of the dynamic dictionary problem with predecessor queries; this problem can be seen as the one dimensional variant of planar point location.

In this paper we focus on the External Memory model, also known as the Disk Access Model (DAM) [2]. The DAM is the standard method of designing algorithms that efficiently execute on large datasets stored in secondary storage. This model assumes a two-level memory hierarchy, called disk and internal memory and it is parameterized by values $M$ and $B$ ; the disk is partitioned into blocks of size $B$ , of which $M/B$ can be stored in memory at any given moment. The cost of an algorithm in the DAM is the number of block transfers between memory and disk, called Input-Output operations (I/Os). The quintessential DAM-model data structure is the B-Tree [11]. See [25, 26] for surveys. Many applications of dynamic planar point location, such as GIS problems, must efficiently process datasets that are too massive to fit in internal memory, thus it is of great relevance and interest to consider the problem in the DAM and to devise I/O efficient algorithms.

1.1 Previous Work

RAM Model.

In the RAM model (the leading model for applications where all data fit in the internal memory) the dynamic planar point location problem has been extensively studied [4, 10, 19, 18, 15, 21]. It is a major and long-standing open problem in computational geometry to design a data structure that supports queries and updates in $O(\log N)$ time [16, 17, 24], i.e., to achieve the same bounds as for the dynamic dictionary problem. In a recent breakthrough, Chan and Nekrich in FOCS’15 [15] presented a data structure supporting queries in $O(\log N(\log\log N)^{2})$ time and updates in $O(\log N(\log\log N))$ time. They also showed the tradeoff of supporting queries in $O(\log N)$ time and updates in $O((\log N)^{1+\epsilon})$ time or vice-versa for $\epsilon>0$ .

Recently Oh and Ahn [23] presented the first data structure for a more general setting where the polygonal subdivision $\Pi$ is not necessarily connected; their data structure supports queries in $O(\log N(\log\log N)^{2})$ time and updates in $O(\sqrt{N}\log N(\log\log N)^{3/2})$ amortized time.

External Memory model

(See Table 1 for a list of prior results.) Several data structures have been presented over the years which support queries and updates in polylog($N$) I/Os [1, 7, 5]. The best update bound known is by Arge, Brodal and Rao [5] and achieves $O(\log_{B}N)$ amortized I/Os. The query time of their data structure is $O(\log^{2}_{B}N)$. Very recently, the first data structure that supports queries in $o(\log^{2}_{B}N)$ I/Os was announced by Munro and Nekrich [22]. In particular they support queries in $O(\log_{B}N(\log\log_{B}N)^{3})$ I/Os. However their update time is slightly worse than logarithmic, $O(\log_{B}N(\log\log_{B}N)^{2})$. In all those works the bounds are obtained by solving the problem of vertical ray-shooting.

Fast Updates in External Memory.

One of the most intriguing and practically relevant features of the external memory model is that it allows fast updates. For the dynamic dictionary problem with predecessor queries, the optimal update bound in the RAM model is $O(\log N)$. In external memory, however, $B$-trees achieve the optimal query time of $O(\log_{B}N)$ and typical update time of $O(\log_{B}N)$, although substantially faster update times are possible. Brodal and Fagerberg [14] showed that $O(\frac{1}{B^{1-\epsilon}}\log_{B}N)$ amortized I/Os per update can be supported, for a small positive constant $\epsilon$, while retaining $O(\log_{B}N)$-time queries; they further showed that this is an asymptotically optimal tradeoff between updates and queries. Observe that this update bound is a $B^{1-\epsilon}$-factor speedup over $O(\log_{B}N)$ and that for reasonable choices of parameters, e.g. $B\geq 1000$, $N<10^{93}$, $\epsilon=\frac{1}{2}$, this yields a subconstant amortized number of I/Os per update. A similar update bound was later achieved for other dynamic problems like three-sided range reporting and top-$k$ queries [13].
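As a quick sanity check of the "subconstant" claim, the following sketch evaluates the expression $\log_{B}N/B^{1-\epsilon}$ for the parameters quoted above. The function names are illustrative only, and the constant hidden by the $O$-notation is ignored, so the numbers indicate orders of magnitude rather than actual I/O counts.

```python
import math

def btree_update_ios(N: float, B: float) -> float:
    """log_B(N): the classical B-tree update bound, constants ignored."""
    return math.log(N, B)

def buffered_update_ios(N: float, B: float, eps: float) -> float:
    """log_B(N) / B^(1 - eps): the buffered update bound, constants ignored."""
    return math.log(N, B) / B ** (1.0 - eps)

if __name__ == "__main__":
    # Parameters from the text: B = 1000, N = 10^93, eps = 1/2.
    print(btree_update_ios(10**93, 1000))          # roughly 31
    print(buffered_update_ios(10**93, 1000, 0.5))  # just below 1
```

With $B=1000$ and $\epsilon=1/2$ the divisor is $\sqrt{1000}\approx 31.6$, which is why the bound stays below one I/O per update as long as $\log_{B}N$ is below that value.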

Given this progress and the fact that in the RAM model the bounds achieved for planar point location and the dictionary problem are believed to coincide, it is natural to conjecture that a similar update bound can be achieved for the dynamic planar point location problem. However, to date no result has been presented that achieves sublogarithmic insertion or deletion time.

1.2 Our Results

We consider the dynamic planar point location problem in the external memory model and present the first data structure with sublogarithmic amortized update time of $O(\frac{1}{B^{1-\epsilon}}\log_{B}N)$ I/Os. Prior to our work, the best update bound for both insertions and deletions was $O(\log_{B}N)$ , achieved by Arge et al. [5]. Our main result is:

Theorem 1.1 (Main result).

For any constant $0<\epsilon\leq 1/2$ , there exists a data structure which uses $O(N)$ space, answers planar point location queries for polygonal subdivisions $\Pi$ with faces of constant size in $O((1/\epsilon)^{2}\cdot\log^{2}_{B}N)=O(\log^{2}_{B}N)$ I/Os and supports insertions and deletions in $O(\log_{B}N/(\epsilon\cdot B^{1-\epsilon}))=O((\log_{B}N)/B^{1-\epsilon})$ amortized I/Os. The data structure can be constructed in $O((N/B)\log_{B}N)$ I/Os.

To obtain this result, several techniques are used. Our primary data structure is an augmented interval tree [20]. We combine both the primary interval tree and two auxiliary structures described below with the buffering technique [14, 3] to improve insertion and deletion bounds. In Section 2 we prove Theorem 1.1 using our auxiliary structures as black boxes and omit some technical details relating to rebuilding; these details are deferred to Section 5.

Similarly to previous work, we focus on solving the problem of vertical ray-shooting queries. Our first auxiliary structure answers vertical ray-shooting queries among non-intersecting segments whose right (left) endpoints lie on the same vertical line. This is called the left (right) structure (in Section 2 it will be clear why we choose this terminology and not vice-versa). Left/Right structures of Agarwal et al. [1], which support queries and updates in $O(\log_{B}{K})$ I/Os, are used by several prior works [1, 7, 5]. Our structure improves on their result by reducing the update bound by a factor of $B^{1-\epsilon}$ . We obtain the following result, the proof of which is the topic of Section 3:

Theorem 1.2 (Left/right structure).

For a set of $K$ non-intersecting segments whose right (left) endpoints lie in the same vertical line and any constant $0<\epsilon\leq 1/2$ , we can create a data structure which supports vertical ray-shooting queries in $O((1/\epsilon)\cdot\log_{B}K)=O(\log_{B}K)$ I/Os and insertions and deletions in $O((\log_{B}K)/(\epsilon\cdot B^{1-\epsilon}))=O((\log_{B}K)/B^{1-\epsilon})$ amortized I/Os. This data structure uses $O(K)$ space and it can be constructed in $O((K/B)\log_{B}K)$ I/Os. If the segments are already sorted, it can be constructed in $O(K/B)$ I/Os.

Our second auxiliary structure answers vertical ray-shooting queries among non-intersecting segments whose endpoints lie in a set of $B^{\epsilon/2}+1$ vertical lines. These vertical lines define $B^{\epsilon/2}$ vertical slabs, hence the structure is called a multislab structure. We obtain the following result, the proof of which is the topic of Section 4:

Theorem 1.3 (Multislab structure).

For any constant $0<\epsilon\leq 1/2$ and set of $K$ non-intersecting segments whose endpoints lie in $B^{\epsilon/2}+1$ vertical lines, we can create a data structure which supports vertical ray-shooting queries in $O((1/\epsilon)\cdot\log_{B}K)=O(\log_{B}K)$ I/Os and insertions and deletions in $O((\log_{B}K)/(\epsilon\cdot B^{1-\epsilon}))=O((\log_{B}K)/B^{1-\epsilon})$ amortized I/Os. This data structure uses $O(K)$ space and it can be constructed in $O((K/B)\log_{B}K)$ I/Os. If the segments are already sorted according to a total order, it can be constructed in $O(K/B)$ I/Os.

A major challenge faced by previous multislab structures is how to efficiently support insertions. At a high level, it is hard to deal with insertions in cases where a total order is maintained: each time a new segment gets inserted we need to determine its position in the total order, which cannot be done quickly. Arge and Vahrenhold [7] developed a deletion-only multislab data structure and then used the so-called logarithmic method [12] which allowed them to handle insertions in $O(\log_{B}^{2}K)$ I/Os. Later Arge, Brodal and Rao [5] developed a more complicated multislab structure supporting insertions in amortized $O(\log_{B}K)$ I/Os by performing separate case analysis depending on the value of $B$.

Here, we support insertions in a much simpler way by breaking each inserted segment into smaller unit segments whose endpoints lie on two consecutive vertical lines and can be compared easily to the segments already stored. This way, we are able to support insertions easily in $O(\log_{B}K)$ I/Os. Finally, we add buffering and obtain sublogarithmic update bounds.

1.3 Notation and Preliminaries

External Memory Model.

Throughout this paper we focus on the external memory model of computation. $N$ denotes the number of segments in the planar subdivision, $B$ the block size and $M$ the number of elements that fit in internal memory. We assume that $M\ll N$ and $2\leq B\leq\sqrt{M}$ (the tall cache assumption). It is well-known that sorting $K$ elements requires $\Theta((K/B)\log_{M/B}(K/B))$ I/Os [2]. Given that $B\leq\sqrt{M}$ , this bound is $O((K/B)\log_{B}K)$ . We use this bound for sorting in many places without further explanation.
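The step from the general sorting bound to the form used in this paper can be spelled out as follows. This is a short derivation under the stated assumption $B\leq\sqrt{M}$, and additionally assuming $K\geq B$ so that the logarithms are non-negative.

```latex
\[
B \le \sqrt{M}
\;\Longrightarrow\; \frac{M}{B} \ge B
\;\Longrightarrow\;
\log_{M/B}\frac{K}{B} \;\le\; \log_{B}\frac{K}{B} \;\le\; \log_{B}K,
\qquad\text{hence}\qquad
\frac{K}{B}\,\log_{M/B}\frac{K}{B} \;=\; O\!\Big(\frac{K}{B}\,\log_{B}K\Big).
\]
```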

Ray-shooting Queries.

In the rest of this paper, we focus on answering vertical ray-shooting queries in a dynamic set of non-intersecting line segments. Let $S$ be the set of segments of the polygonal subdivision $\Pi$. Given a query point $p$, the answer to a vertical ray-shooting query is the first segment of $S$ hit by a vertical ray emanating from $p$ in the $(+y)$ direction. Based on standard techniques (see e.g. [7]), for connected polygonal subdivisions $\Pi$ with faces of size $O(1)$, a planar point location query for a point $p$ can be answered in $O(\log_{B}N)$ I/Os after answering a vertical ray-shooting query for $p$.
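To make the query semantics concrete, here is a brute-force in-memory reference for vertical ray-shooting. It is only a specification aid, not the data structure of this paper: it scans all segments, assumes non-vertical segments given as (left endpoint, right endpoint), and the names `y_at` and `ray_shoot_up` are hypothetical.

```python
from math import inf
from typing import Optional, Sequence, Tuple

Point = Tuple[float, float]
Segment = Tuple[Point, Point]  # (left endpoint, right endpoint), left.x < right.x

def y_at(seg: Segment, x: float) -> float:
    """y-coordinate of the segment's supporting line at abscissa x."""
    (x1, y1), (x2, y2) = seg
    return y1 + (x - x1) * (y2 - y1) / (x2 - x1)

def ray_shoot_up(segments: Sequence[Segment], p: Point) -> Optional[Segment]:
    """First segment hit by the vertical ray from p in the (+y) direction, or None."""
    px, py = p
    best, best_y = None, inf
    for seg in segments:
        (x1, _), (x2, _) = seg
        if x1 <= px <= x2:            # the segment spans the query abscissa
            y = y_at(seg, px)
            if py <= y < best_y:      # at or above p, and lower than the best so far
                best, best_y = seg, y
    return best
```

A linear scan like this costs $\Theta(K/B)$ I/Os per query on $K$ segments, which is what the structures in the following sections are designed to avoid.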

$B^{\epsilon}$ -Trees.

All tree structures that we will use are variants of the $B^{\epsilon}$ -Trees [14] which are $B$ -trees except that the internal nodes have at most $B^{\epsilon}$ (and not $B$ ) children; the leaves still store $\Theta(B)$ data items. For constant $\epsilon$ , this does not change the asymptotic height of the tree or the search cost, both remain $O((1/\epsilon)\cdot\log_{B}N)=O(\log_{B}N)$ .
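The height claim follows from a change of base of the logarithm. The line below is a short derivation, assuming that internal nodes have fanout $\Theta(B^{\epsilon})$:

```latex
\[
\text{height}
\;=\; O\big(\log_{B^{\epsilon}} N\big)
\;=\; O\!\Big(\frac{\log N}{\epsilon\,\log B}\Big)
\;=\; O\!\Big(\frac{1}{\epsilon}\,\log_{B} N\Big),
\]
```

which is $O(\log_{B}N)$ whenever $\epsilon$ is a constant.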

2 Overall Structure

In this Section we prove Theorem 1.1, using the data structures of Theorems 1.2 and 1.3 (detailed in Sections 3 and 4, respectively). Given $N$ non-intersecting segments in the plane and a constant $0<\epsilon\leq 1/2$ , we construct a $O(N)$ -space data structure which answers vertical ray-shooting queries in $O((1/\epsilon)^{2}\cdot\log^{2}_{B}N)=O(\log^{2}_{B}N)$ I/Os and supports updates in $O((\log_{B}N)/(\epsilon\cdot B^{1-\epsilon}))=O((\log_{B}N)/B^{1-\epsilon})$ amortized I/Os. Throughout this section we let $\epsilon^{\prime}=\epsilon/2$ .

The Data Structure.

As in the previous works on planar point location, our primary data structure is based on the interval tree (the external interval tree defined in [9]). Our interval tree $\mathcal{I}$ is a $B^{\epsilon^{\prime}}$ -tree which stores the $x$ -coordinates of segment endpoints in its leaves. Here we assume for clarity of presentation that the interval tree is static, i.e. all new segments inserted share $x$ -coordinates with already stored segments; in Section 5 we remove this assumption and extend our data structure to accommodate new $x$ -coordinates and achieve the bounds of Theorem 1.1.

Each node of $\mathcal{I}$ is associated with several secondary structures, as we explain later, and each segment is stored in the secondary structures of exactly one node of $\mathcal{I}$ . Each node $v$ of $\mathcal{I}$ is associated with a vertical slab $s_{v}$ . The slab of the root is the whole plane. For an internal node $v$ , the slab $s_{v}$ is divided into $B^{\epsilon^{\prime}}$ vertical slabs $s_{1},\dotsc,s_{B^{\epsilon^{\prime}}}$ corresponding to the children of $v$ , separated by vertical lines called slab boundaries, such that each slab $s_{i}$ contains the same number of vertices of $\Pi$ from slab $s_{v}$ .

Let $S$ be the set of segments that compose $\Pi$ . Each segment $\sigma\in S$ is assigned to a node $v$ of $\mathcal{I}$ . This is the highest node $v$ of $\mathcal{I}$ such that $\sigma$ is completely contained in slab $s_{v}$ and intersects at least one slab boundary partitioning $s_{v}$ ; if such an internal node $v$ does not exist, then $\sigma$ is assigned to a leaf $v$ such that $\sigma$ is completely contained in its slab $s_{v}$ . Segments assigned to internal nodes are stored in the secondary structures of those nodes, whereas segments assigned to leaves are stored explicitly in the corresponding leaf. By construction of the slab boundaries, each leaf stores $O(B)$ segments in $O(1)$ blocks.
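The assignment rule can be sketched in a few lines. The following is a minimal in-memory illustration, not the external-memory implementation: the class `Node`, its fields, and the function `assigned_node` are hypothetical names, a segment is reduced to the $x$-coordinates of its endpoints, and a segment is taken to intersect a boundary $b$ when $x_{\text{left}}\leq b\leq x_{\text{right}}$.

```python
from bisect import bisect_left
from dataclasses import dataclass, field
from typing import List

@dataclass
class Node:
    # Sorted x-coordinates of the slab boundaries that partition this node's slab.
    # A node with k boundaries has k + 1 children; a leaf has neither.
    boundaries: List[float] = field(default_factory=list)
    children: List["Node"] = field(default_factory=list)

def assigned_node(root: Node, x_left: float, x_right: float) -> Node:
    """Highest node whose slab contains the segment and one of whose slab
    boundaries the segment intersects; a leaf if no such internal node exists."""
    v = root
    while v.children:
        i = bisect_left(v.boundaries, x_left)       # first boundary >= x_left
        if i < len(v.boundaries) and v.boundaries[i] <= x_right:
            return v                                # the segment crosses a boundary of v
        v = v.children[i]                           # the segment lies inside child slab i
    return v
```

Descending from the root guarantees that the returned node is the highest one satisfying the rule, since the search stops at the first node where a boundary is crossed.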

Consider a segment $\sigma$ assigned to a node $v$ of $\mathcal{I}$. Let $s_{\ell}$ and $s_{r}$ be the children slabs of $s_{v}$ where the left and right endpoints of $\sigma$ lie. We call the segment $\sigma\cap s_{\ell}$ the left subsegment of $\sigma$, the segment $\sigma\cap s_{r}$ the right subsegment of $\sigma$ and the rest of $\sigma$ (which spans children slabs $s_{\ell+1},\dotsc,s_{r-1}$) is its middle subsegment. See Figure 1 for an illustration. In this example, the left subsegment is $\sigma\cap s_{2}$, the right subsegment is $\sigma\cap s_{5}$, and the portion of $\sigma$ in $s_{3}$ and $s_{4}$ is the middle subsegment.
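The decomposition into left, middle and right subsegments amounts to clipping $\sigma$ at the first and last slab boundary it crosses. The sketch below illustrates this in memory; `split_at_node` is a hypothetical helper, segments are assumed non-vertical, and a segment whose endpoint lies exactly on a boundary yields a degenerate (single-point) left or right subsegment.

```python
from typing import List, Optional, Tuple

Point = Tuple[float, float]
Segment = Tuple[Point, Point]  # (left endpoint, right endpoint), left.x < right.x

def split_at_node(seg: Segment, boundaries: List[float]
                  ) -> Tuple[Segment, Optional[Segment], Segment]:
    """Split a segment assigned to a node into (left, middle, right) subsegments.

    `boundaries` holds the sorted x-coordinates of the slab boundaries that
    partition the node's slab; the segment must cross at least one of them.
    """
    (x1, y1), (x2, y2) = seg

    def y_at(x: float) -> float:
        return y1 + (x - x1) * (y2 - y1) / (x2 - x1)

    crossed = [b for b in boundaries if x1 <= b <= x2]
    if not crossed:
        raise ValueError("segment crosses no slab boundary of this node")
    b_left, b_right = crossed[0], crossed[-1]
    left = ((x1, y1), (b_left, y_at(b_left)))        # part inside slab s_l
    right = ((b_right, y_at(b_right)), (x2, y2))     # part inside slab s_r
    # The middle part exists only if the segment spans at least one whole child slab.
    middle = ((b_left, y_at(b_left)), (b_right, y_at(b_right))) if b_left < b_right else None
    return left, middle, right
```

Note that the left subsegment ends on a slab boundary and the right subsegment starts on one, which matches the inputs expected by the left and right structures of Theorem 1.2.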

Let $S_{v}$ be the set of segments assigned to a node $v$ of $\mathcal{I}$ . To store segments of $S_{v}$ , node $v$ of $\mathcal{I}$ contains the following secondary structures:

1. A multislab structure $\mathcal{M}$ which stores the set of middle subsegments.

2. $B^{\epsilon^{\prime}}$ left structures $L_{i}$, for $1\leq i\leq B^{\epsilon^{\prime}}$, storing the left (sub)segments of slab $s_{i}$.

3. $B^{\epsilon^{\prime}}$ right structures $R_{i}$, for $1\leq i\leq B^{\epsilon^{\prime}}$, storing the right (sub)segments of slab $s_{i}$.

In addition, each internal node $v$ contains an insertion buffer $I_{v}$ and deletion buffer $D_{v}$ , each storing up to $B$ segments.

Construction and Space Usage.

For every node $v$ , the buffers $I_{v}$ and $D_{v}$ fit in $O(1)$ blocks, since they store at most $B$ segments. By Theorems 1.2 and 1.3, a secondary structure storing $K$ segments uses $O(K)$ space. Since each segment of $S_{v}$ is stored in at most 3 secondary structures, overall secondary structures of $v$ use $O(|S_{v}|)$ space. Thus each node $v$ uses $O(|S_{v}|)$ space. We get that our data structure uses overall $O(\sum_{v\in\mathcal{I}}|S_{v}|)=O(N)$ space. The interval tree can be constructed in $O((N/B)\log_{B}N)$ I/Os. This can be done by sorting the segments by their endpoints’ $x$ -coordinates and then determining all slab boundaries to create a balanced interval tree. By Theorems 1.2 and 1.3, all secondary structures of a node $v$ of $\mathcal{I}$ can be constructed in $O((|S_{v}|/B)\log_{B}|S_{v}|)$ I/Os . Thus, all secondary structures of the tree can be constructed in $O((\sum_{v\in\mathcal{I}}|S_{v}|/B)\cdot\log_{B}N)=O((N/B)\log_{B}N)$ I/Os.

Queries.

To answer a vertical ray-shooting query for a point $p$, we traverse the root-to-leaf path of $\mathcal{I}$ based on the $x$-coordinate of $p$, while maintaining a segment $\sigma$ (initialized to null) which is the answer to the query among segments assigned to nodes we have traversed so far. At each node $v$ visited along this path, we first update buffers $I_{v}$ and $D_{v}$ by removing from both of them all segments (if any) of $I_{v}\cap D_{v}$. Then, we perform a vertical ray-shooting on the secondary structures of $v$; in particular we ray-shoot on the multislab structure and the left and right structures $L_{i}$ and $R_{i}$, for $i$ such that the query point $p$ is in slab $s_{i}$. (As a minor detail, for each secondary structure considered, we first perform the insertions/deletions of the corresponding segments from buffers $I_{v}$ and $D_{v}$.) After checking the secondary structures, we update $\sigma$ if a closer segment above $p$ is found as a result. Next, we ray-shoot among segments stored in $I_{v}$ and update $\sigma$ if necessary. Finally, we determine which child $v_{i}$ of $v$ to visit, and flush any segments of $D_{v}$ that are contained in the slab of $v_{i}$ to $D_{v_{i}}$; this way we make sure that information about deleted segments is updated throughout the root-to-leaf path and no deleted segment can be considered as an answer to the query. We then continue the process at $v_{i}$. Once a leaf node is reached, we simply compare the $O(B)$ segments it contains with $p$ and return the closest segment above $p$ among them and $\sigma$.

Bounding the query cost: Since any root-to-leaf path of $\mathcal{I}$ has length $O((1/\epsilon^{\prime})\cdot\log_{B}N)$ , each secondary data structure supports ray-shooting queries in $O((1/\epsilon^{\prime})\cdot\log_{B}N)$ I/Os (due to Theorems 1.2 and 1.3) and we check $O(1)$ secondary structures per node, we get that a query is answered in $O((1/\epsilon^{\prime})^{2}\cdot\log^{2}_{B}N)=O(\log^{2}_{B}N)$ I/Os. Note that in each node $v$ of the root-to-leaf path visited, the operations involving $I_{v}$ and $D_{v}$ require $O(1)$ I/Os, thus they increase the total cost by at most a $O(1)$ factor.

Insertions.

To handle insertions, we use the insertion buffers stored in nodes of $\mathcal{I}$ . When a new segment $\sigma$ is inserted, we insert it in the insertion buffer of the root. Let $v$ be an internal node with children $v_{1},\dotsc,v_{B^{\epsilon^{\prime}}}$ . Whenever $I_{v}$ becomes full, it is flushed. Segments of $I_{v}$ that cross at least one slab boundary partitioning $s_{v}$ are inserted in the secondary structures of $v$ ; segments that are contained in the slab $s_{i}$ of $v_{i}$ are inserted in $I_{v_{i}}$ , for $1\leq i\leq B^{\epsilon^{\prime}}$ . In case $I_{v}$ becomes full for some node $v$ whose children are leaves, we insert those segments explicitly at the corresponding leaves. When a leaf becomes full, we restructure the tree using split operations on full nodes.
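The flushing rule can be illustrated with a small in-memory simulation. This is a sketch under simplifying assumptions, not the structure itself: `BufferedNode` and its fields are hypothetical names, `capacity` plays the role of $B$, a segment is reduced to the $x$-coordinates of its endpoints, the list `stored` stands in for the secondary structures (or for the contents of a leaf), and leaf splitting is omitted.

```python
from bisect import bisect_left
from typing import List, Sequence, Tuple

XRange = Tuple[float, float]  # (x of left endpoint, x of right endpoint)

class BufferedNode:
    def __init__(self, boundaries: Sequence[float] = (),
                 children: Sequence["BufferedNode"] = (), capacity: int = 4):
        self.boundaries = list(boundaries)      # slab boundaries partitioning this slab
        self.children = list(children)          # len(boundaries) + 1 children, or none
        self.capacity = capacity                # plays the role of B
        self.insert_buffer: List[XRange] = []   # I_v
        self.stored: List[XRange] = []          # secondary structures / leaf contents

    def insert(self, seg: XRange) -> None:
        if not self.children:                   # leaf: store the segment explicitly
            self.stored.append(seg)
            return
        self.insert_buffer.append(seg)
        if len(self.insert_buffer) >= self.capacity:
            self.flush()

    def flush(self) -> None:
        pending, self.insert_buffer = self.insert_buffer, []
        for x_left, x_right in pending:
            i = bisect_left(self.boundaries, x_left)
            if i < len(self.boundaries) and self.boundaries[i] <= x_right:
                self.stored.append((x_left, x_right))        # crosses a boundary: stays at v
            else:
                self.children[i].insert((x_left, x_right))   # forwarded to child buffer
```

Each flush touches every child buffer at most once, which is the source of the $O(B^{\epsilon^{\prime}})$ I/Os per $B$ insertions used in the analysis that follows.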

Bounding the insertion cost: We compute the amortized cost of an insertion by considering three components:

(i) The cost for moving segments between insertion buffers. Whenever an insertion buffer $I_{v}$ gets full, it forwards segments to the buffers of its $B^{\epsilon^{\prime}}$ children performing $O(B^{\epsilon^{\prime}})$ I/Os. Since a flushing occurs every $B$ insertions in $I_{v}$, the amortized cost of such operations is $O(B^{\epsilon^{\prime}}/B)=O(1/(B^{1-\epsilon^{\prime}}))$. Each segment will move in at most $O((1/\epsilon^{\prime})\log_{B}N)$ insertion buffers before it is inserted in the secondary structures of a node (or in a leaf). Thus the amortized cost for moving between buffers is $O((\log_{B}N)/(\epsilon^{\prime}\cdot B^{1-\epsilon^{\prime}}))$.

(ii) The insertion cost in the secondary structures. By Theorems 1.2 and 1.3 we get that insertions in secondary structures require $O((\log_{B}N)/(\epsilon\cdot B^{1-2\epsilon^{\prime}}))$ I/Os.

(iii) The cost of restructuring the tree after insertions when a leaf becomes full. We show in Section 5 that the restructuring requires $O\big{(}\frac{\log_{B}N}{\epsilon^{\prime}\cdot B^{1-\epsilon^{\prime}}}\big{)}$ amortized I/Os, by slightly modifying our primary interval tree data structure.

We conclude that our data structure supports insertions in amortized $O(\log_{B}N/(\epsilon^{\prime}\cdot B^{1-2\epsilon^{\prime}}))=O(\log_{B}N/B^{1-\epsilon})$ I/Os.
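For completeness, the three components can be combined explicitly. The following is a short derivation using only the bounds stated above and $\epsilon^{\prime}=\epsilon/2$:

```latex
\begin{align*}
\underbrace{O\!\Big(\frac{\log_{B}N}{\epsilon' B^{1-\epsilon'}}\Big)}_{\text{(i) buffers}}
+\underbrace{O\!\Big(\frac{\log_{B}N}{\epsilon' B^{1-2\epsilon'}}\Big)}_{\text{(ii) secondary structures}}
+\underbrace{O\!\Big(\frac{\log_{B}N}{\epsilon' B^{1-\epsilon'}}\Big)}_{\text{(iii) restructuring}}
&= O\!\Big(\frac{\log_{B}N}{\epsilon' B^{1-2\epsilon'}}\Big)
&& \text{since } B^{1-2\epsilon'}\le B^{1-\epsilon'} \\
&= O\!\Big(\frac{\log_{B}N}{\epsilon\, B^{1-\epsilon}}\Big)
&& \text{since } \epsilon'=\epsilon/2 .
\end{align*}
```

The secondary structures therefore dominate the insertion cost, and for constant $\epsilon$ the bound simplifies to $O((\log_{B}N)/B^{1-\epsilon})$.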

Deletions.

To support deletions, we use the deletion buffers stored in all nodes of $\mathcal{I}$. To delete a segment $\sigma$, we first check whether $\sigma$ is in the insertion buffer $I_{r}$ of the root $r$ and in that case we delete it; otherwise we store it in $D_{r}$. Similar to insertions, whenever $D_{v}$ gets full for some internal node $v$ with children $v_{1},\dotsc,v_{B^{\epsilon^{\prime}}}$, we flush $D_{v}$. The segments of $D_{v}$ crossing at least one slab boundary partitioning $s_{v}$ are deleted from the corresponding secondary structures associated with $v$; the other segments of $D_{v}$ are moved to the buffers $D_{v_{i}}$ of the children whose slabs contain them. If a segment $\sigma$ moved to $D_{v_{i}}$ is also present in $I_{v_{i}}$, we delete it from both buffers. In case $D_{v}$ becomes full for some node $v$ whose children are leaves, we delete those segments explicitly from the corresponding leaves.

Bounding the deletion cost: The deletion cost has three components:

(i) Moving segments between the deletion buffers. Using the same argument as for insertions, we get that this requires $O(\log_{B}N/(\epsilon^{\prime}\cdot B^{1-\epsilon^{\prime}}))$ I/Os, amortized.

(ii) The cost of deletion in the secondary structures. By Theorems 1.2 and 1.3 we get that deletions in secondary structures require amortized $O(\log_{B}N/(\epsilon^{\prime}\cdot B^{1-2\epsilon^{\prime}}))$ I/Os.

(iii) The cost of restructuring the tree. Every $N/2$ deletions, we rebuild the structure using $O((N/B)\log_{B}N)$ I/Os, to get an amortized restructuring cost of $O((\log_{B}N)/B)$ I/Os.

Overall deletions are supported in amortized $O(\log_{B}N/(\epsilon^{\prime}\cdot B^{1-2\epsilon^{\prime}}))=O(\log_{B}N/(B^{1-\epsilon}))$ I/Os.
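The amortized rebuilding cost in component (iii) is obtained by spreading one rebuild over the $N/2$ deletions that trigger it. As a one-line check, using only the quantities stated above:

```latex
\[
\frac{O\big((N/B)\log_{B}N\big)}{N/2}
\;=\; O\!\Big(\frac{\log_{B}N}{B}\Big)
\;=\; O\!\Big(\frac{\log_{B}N}{\epsilon' B^{1-2\epsilon'}}\Big),
\]
```

so the rebuilding cost is dominated by the cost of deletions in the secondary structures.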

3 Left and Right Structures

In this section we prove Theorem 1.2. Given $K$ segments all of whose right (left) endpoints lie on a single vertical line, we construct a data structure which answers vertical ray-shooting queries on those segments in $O(\log_{B}K)$ I/Os and supports insertions and deletions in $O((\log_{B}K)/B^{1-\epsilon})$ amortized I/Os for a constant $0<\epsilon\leq 1/2$.

We describe the structure for the case where we are given a set $\mathcal{L}$ of $K$ segments whose right endpoints have the same $x$-coordinate (left structure). Recall from Section 2 that we call left structures the ones storing the left subsegment of a segment $\sigma$; thus all subsegments stored in a left structure have right endpoints with the same $x$-coordinate. The case where the left endpoints of the segments have the same $x$-coordinate (right structure) is completely symmetric. For a segment $\sigma$, we will refer to the $y$-coordinate of its right endpoint as the $y$-coordinate of $\sigma$. Conversely we define the $x$-coordinate of $\sigma$ to be the $x$-coordinate of its left endpoint.

Total Order.

We assume that the segments in $\mathcal{L}$ are ordered according to their $y$ -coordinates. We can always order the segments according to this total order in $O((K/B)\log_{B}K)$ I/Os.

The Data Structure.

We store all segments of $\mathcal{L}$ in an augmented $B^{\epsilon}$ -tree $\mathcal{T}$ which supports vertical ray-shooting queries, insertions and deletions. The degree of each node is between $B^{\epsilon}/2$ and $B^{\epsilon}$ , except the root which might have degree in the range $[2,B^{\epsilon}]$ , and leaves store $\Theta(B)$ elements. For a node $v\in\mathcal{T}$ , let $\mathcal{T}_{v}$ be the subtree rooted at $v$ . Since the segments are sorted according to their $y$ -coordinates, each subtree $\mathcal{T}_{v}$ corresponds to a range of $y$ -coordinates, which we call the $y$ -range of node $v$ . Let $v$ be an internal node of $\mathcal{T}$ with children $v_{1},\dotsc,v_{B^{\epsilon}}$ . Node $v$ stores the following information:

1. A buffer of segments $\mathcal{S}_{v}$ of capacity $B$ which contains segments in the $y$-range of $v$ whose left endpoints have the smallest $x$-coordinates (i.e., segments that extend the farthest from the vertical line) and are not stored in any buffer $\mathcal{S}_{w}$ for an ancestor $w$ of $v$. In other words, $\mathcal{T}$ together with segments of buffers $\mathcal{S}_{v}$ form an external memory priority search tree [6].

2. An insertion buffer $I_{v}$ and a deletion buffer $D_{v}$, each storing up to $B$ segments.

3. A list $\mathcal{M}_{v}$ that contains, for each child $v_{i}$, the segment with minimum $x$-coordinate stored in $\mathcal{S}_{v_{i}}$. We call this the minimal segment for child $v_{i}$.

The data structure satisfies the following invariants: For each node $v\in\mathcal{T}$, either $|\mathcal{S}_{v}|\geq B/2$, or $|\mathcal{S}_{v}|<B/2$, in which case $I_{v}$ and $D_{v}$ are empty and all buffers stored in descendants of $v$ are empty. Also, for each node $v$, buffers $\mathcal{S}_{v},I_{v}$ and $D_{v}$ are disjoint. Finally, for a leaf $v$, $I_{v}$ and $D_{v}$ are empty.
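The way the buffers $\mathcal{S}_{v}$ and the lists $\mathcal{M}_{v}$ relate to each other can be illustrated with a simplified static construction. This is only a sketch in the spirit of a priority search tree, under loud assumptions: `build_buffers` is a hypothetical name, a segment is reduced to the pair ($x$-coordinate, $y$-coordinate) as defined above, segments are distinct, and the $y$-ranges of the children are obtained by splitting the remaining segments evenly, which simplifies the $B^{\epsilon}$-tree described in the text. Insertion and deletion buffers are omitted.

```python
from typing import Dict, List, Tuple

Seg = Tuple[float, float]  # (x-coordinate of left endpoint, y-coordinate of right endpoint)

def build_buffers(segments: List[Seg], B: int = 4, fanout: int = 2) -> Dict:
    """Build a static tree of buffers from distinct segments sorted by y.

    Each node keeps in "S" the (at most) B segments of its y-range with the
    smallest x-coordinate that are not stored at an ancestor, and in "M" the
    minimal segment of each child.
    """
    buffer = sorted(segments)[:B]                   # smallest x first: extend farthest left
    chosen = set(buffer)
    rest = [s for s in segments if s not in chosen] # remains sorted by y
    node = {"S": buffer, "M": [], "children": []}
    if rest:
        size = -(-len(rest) // fanout)              # ceiling division: child y-range size
        for i in range(0, len(rest), size):
            child = build_buffers(rest[i:i + size], B, fanout)
            node["children"].append(child)
            node["M"].append(child["S"][0])         # child's segment of minimum x
    return node
```

By construction every segment in $\mathcal{S}_{v}$ extends at least as far from the vertical line as any segment stored below $v$, which is the heap-like property that lets a query decide from $\mathcal{M}_{v}$ alone which children can contain a relevant segment.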

Construction and Space Usage.

Overall buffers and lists of each node contain $O(B)$ segments, i.e. they can be stored in $O(1)$ blocks. Thus $\mathcal{T}$ can be stored in $O(K/B)$ blocks, i.e. it requires $O(K)$ space. Construction of $\mathcal{T}$ requires $O(\frac{K}{B}\log_{B}K)$ I/Os, since we need to sort all $K$ segments according to their $y$ -coordinates. If the segments are already sorted according to their $y$ -coordinate, then $\mathcal{T}$ can be created in $O(K/B)$ I/Os.

Queries in the static structure.

To get a feel for how our structure supports queries, we first show how to perform queries in the static case, i.e., assuming there are no insertions and deletions and all buffers $I_{v}$ and $D_{v}$ are empty. Later we will give a precise description of performing queries in the fully dynamic structure.

Let $\rho^{+}$ be the ray emanating from $p$ in the $(+y)$ direction and $\rho^{-}$ the ray emanating from $p$ in the $(-y)$ direction. We query the structure by finding the first segment hit by both $\rho^{+}$ and $\rho^{-}$ . We keep two pointers, $v_{+}$ and $v_{-}$ , initialized at the root. We also keep the closest segments $\sigma_{+}$ and $\sigma_{-}$ seen so far in the $(+y)$ and $(-y)$ direction respectively (initialized to $+\infty$ and $-\infty$ ). At each step, we update both $v_{+}$ and $v_{-}$ to move from a node of depth $i$ to a node of depth $i+1$ . While at level $i$ , $v_{-}$ and $v_{+}$ might coincide, or one of them might be undefined (set to null).

We now describe the query algorithm. We start at the root of $\mathcal{T}$ and advance down, while updating $v_{+}$, $v_{-}$, $\sigma_{+}$ and $\sigma_{-}$. When at depth $i$, we find the first segment $\sigma_{i}$ hit by $\rho^{+}$ among $\mathcal{S}_{v_{-}}$ and $\mathcal{S}_{v_{+}}$ and update $\sigma_{+}$ if necessary (i.e. if $\sigma_{i}$ is the first segment hit by $\rho^{+}$ among all segments seen so far). Similarly, we ray-shoot on $\rho^{-}$ among $\mathcal{S}_{v_{-}}$ and $\mathcal{S}_{v_{+}}$ and update $\sigma_{-}$ if necessary. To determine in which nodes of depth $i+1$ to continue the search, we ray-shoot on both $\rho^{+}$ and $\rho^{-}$ among $\mathcal{M}_{v_{-}}$ and $\mathcal{M}_{v_{+}}$ (i.e., all minimal segments of children of $v_{-}$ and $v_{+}$). Let $\sigma_{m+}$ be the first segment in $\mathcal{M}_{v_{+}}\cup\mathcal{M}_{v_{-}}$ hit by $\rho^{+}$ (if such a segment exists) and let $v_{s}$ be the node containing $\sigma_{m+}$. If $\sigma_{m+}$ does not exist or the $y$-range of $v_{s}$ is higher than the $y$-coordinate of $\sigma_{+}$, we leave $v_{+}$ undefined for level $i+1$. Otherwise, we set $v_{+}=v_{s}$. Similarly, let $\sigma_{m-}$ be the first segment of $\mathcal{M}_{v_{+}}\cup\mathcal{M}_{v_{-}}$ hit by $\rho^{-}$ (if such a segment exists) and let $v_{p}$ be the node containing $\sigma_{m-}$. If $\sigma_{m-}$ does not exist or the $y$-range of $v_{p}$ is lower than the $y$-coordinate of $\sigma_{-}$, we leave $v_{-}$ undefined for level $i+1$. Otherwise we set $v_{-}=v_{p}$.

If both $v_{+}$ and $v_{-}$ are undefined for the next level $i+1$, we stop the procedure and output $\sigma_{+}$ as the result of the vertical ray-shooting query. Otherwise we repeat the same procedure at the next level. When we reach the leaf level, we find the first segment hit by $\rho^{+}$ among $\mathcal{S}_{v_{-}}$ and $\mathcal{S}_{v_{+}}$, update $\sigma_{+}$ if necessary, and output $\sigma_{+}$ as the result of the query.
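The per-node primitive used throughout this algorithm is a ray-shooting scan over the $O(B)$ segments of a buffer. The following is a minimal in-memory sketch of that scan only, not of the full two-pointer descent. The representation of a segment as a pair of `(x, y)` endpoints, the names `y_at` and `first_hit`, and the convention that a segment passing exactly through $p$ counts as hit are all assumptions made for illustration.

```python
import math

def y_at(seg, x):
    """y-coordinate of a non-vertical segment at abscissa x."""
    (x1, y1), (x2, y2) = seg
    return y1 + (y2 - y1) * (x - x1) / (x2 - x1)

def first_hit(p, segments, direction):
    """First segment hit by the vertical ray from p.

    direction is +1 for the ray rho^+ and -1 for the ray rho^-.
    Returns None if no segment in the buffer is hit.
    """
    px, py = p
    best, best_dist = None, math.inf
    for seg in segments:
        (x1, _), (x2, _) = seg
        if x1 == x2 or not (min(x1, x2) <= px <= max(x1, x2)):
            continue  # vertical segment, or segment does not span the ray
        dist = (y_at(seg, px) - py) * direction
        if 0 <= dist < best_dist:
            best, best_dist = seg, dist
    return best

# A hypothetical buffer with four segments.
buffer_segments = [((0, 3), (4, 3)), ((0, 0), (4, 4)),
                   ((0, -1), (4, -1)), ((5, 1.5), (6, 1.5))]
up = first_hit((2, 1), buffer_segments, +1)
down = first_hit((2, 1), buffer_segments, -1)
```

In the structure itself this scan would be applied to $\mathcal{S}_{v_{-}}$, $\mathcal{S}_{v_{+}}$, $\mathcal{M}_{v_{-}}$ and $\mathcal{M}_{v_{+}}$ at each level, each of which fits in $O(1)$ blocks.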

Remark: The reader might wonder why we answer vertical ray-shooting queries in both directions and keep two pointers $v_{-}$ and $v_{+}$ . Isn’t it sufficient to answer queries in one direction and keep one pointer at each step? Figure 2 shows an example where this is not true and maintaining only the $v_{+}$ pointer would result in an incorrect answer.

The formal proof of correctness of this query algorithm is deferred to Appendix A.

Bounding the query cost: To count the cost, observe that in each step we move down the tree by one level and perform operations that require $O(1)$ I/Os, as we check $O(B)$ segments stored in the current nodes $v_{-}$ and $v_{+}$. Since the height of the tree is $O((1/\epsilon)\log_{B}K)$, a query is answered in $O((1/\epsilon)\log_{B}K)=O(\log_{B}K)$ I/Os.
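As a short worked step behind the height bound, assuming (as in the description of the structure) that internal nodes have fan-out $\Theta(B^{\epsilon})$ and that $\epsilon$ is a constant:

$$\log_{B^{\epsilon}}K=\frac{\log_{B}K}{\log_{B}B^{\epsilon}}=\frac{1}{\epsilon}\log_{B}K,\qquad\text{hence}\qquad O(1)\cdot O\Big(\frac{1}{\epsilon}\log_{B}K\Big)=O(\log_{B}K)\ \text{I/Os per query.}$$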

Insertions.

Assume we want to insert a segment $\sigma$ into the left structure $\mathcal{L}$ . If the $x$ -value of $\sigma$ is smaller than the maximum $x$ -value of a segment stored in the buffer of the root $\mathcal{S}_{r}$ , we insert $\sigma$ into $\mathcal{S}_{r}$ . Otherwise we store $\sigma$ in the insertion buffer of the root $I_{r}$ . Note that insertion of $\sigma$ in $\mathcal{S}_{r}$ might cause $\mathcal{S}_{r}$ to overflow (i.e., $|\mathcal{S}_{r}|=B+1$ ); in that case we move the segment of $\mathcal{S}_{r}$ with the maximum $x$ -value into the insertion buffer of the root $I_{r}$ .

Let $v$ be an internal node with children $v_{1},\dotsc,v_{B^{\epsilon}}$. Whenever the insertion buffer $I_{v}$ becomes full, we flush it, moving the segments to buffers of the corresponding children. For a segment $\sigma$ that should be stored in child $v_{i}$, we repeat the same procedure as in the root: Check whether $\sigma$ has smaller $x$-value than the maximum $x$-value of a segment stored in $\mathcal{S}_{v_{i}}$ and if yes, store $\sigma$ in $\mathcal{S}_{v_{i}}$, otherwise store it in $I_{v_{i}}$. If $\mathcal{S}_{v_{i}}$ overflows, we move its last segment (i.e. the one with maximum $x$-value) into $I_{v_{i}}$. Also, if $\sigma$ gets stored in $\mathcal{S}_{v_{i}}$ and its $x$-value is smaller than that of all previous segments of $\mathcal{S}_{v_{i}}$, we update the minimal segment of $v_{i}$ in $\mathcal{M}_{v}$.

When $\mathcal{S}_{v}$ overflows for some leaf $v$ , we split $v$ into two leaves $v_{1}$ and $v_{2}$ , as in standard $B$ -trees. Note that this might cause recursive splits of nodes at greater height.
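The rule applied at a single node when a segment arrives can be sketched as follows. This is an illustrative in-memory model only: the `Node` class, the `(x_value, name)` representation of a segment, and the choice to place a segment directly in an empty segment buffer are assumptions, and flushing, updates of $\mathcal{M}_{v}$ and leaf splits are omitted.

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    capacity: int                                    # B in the text
    seg_buffer: list = field(default_factory=list)   # S_v, holds (x_value, name)
    ins_buffer: list = field(default_factory=list)   # I_v

def insert_at_node(node, seg):
    """Place seg in S_v if its x-value beats the current maximum, else in I_v."""
    # Assumption: an empty S_v accepts the segment directly.
    if not node.seg_buffer or seg[0] < max(s[0] for s in node.seg_buffer):
        node.seg_buffer.append(seg)
        if len(node.seg_buffer) > node.capacity:      # |S_v| = B + 1
            last = max(node.seg_buffer, key=lambda s: s[0])
            node.seg_buffer.remove(last)              # evict the maximum x-value
            node.ins_buffer.append(last)
    else:
        node.ins_buffer.append(seg)

root = Node(capacity=3, seg_buffer=[(1, "a"), (4, "b"), (7, "c")])
insert_at_node(root, (5, "d"))   # enters S_r and evicts (7, "c") into I_r
insert_at_node(root, (9, "e"))   # larger than every x-value in S_r, goes to I_r
```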

Bounding the insertion cost: To flush a buffer $I_{v}$ and forward segments to buffers $\mathcal{S}_{v_{i}}$ and $I_{v_{i}}$ , for $1\leq i\leq B^{\epsilon}$ we perform $O(B^{\epsilon})$ I/Os. Since $I_{v}$ becomes full after at least $B$ insertions, the amortized cost of moving a segment from $I_{v}$ to buffers of a child of $v$ is $O(B^{\epsilon}/B)=O(1/B^{1-\epsilon})$ . Each inserted segment moves between buffers in a root-to-leaf path of length $O((1/\epsilon)\log_{B}K)$ , thus the total amortized cost for moves between buffers is $O(\log_{B}K/(\epsilon\cdot B^{1-\epsilon}))$ I/Os. The restructuring of $\mathcal{T}$ due to splitting nodes requires amortized $O(1/B)$ I/Os, as in standard B-trees. Thus, insertions are supported in $O(\log_{B}K/(\epsilon\cdot B^{1-\epsilon}))$ amortized I/Os.
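To get a feel for the magnitude of this bound, the following sketch evaluates it numerically with all constants hidden by the $O(\cdot)$ notation set to 1, which is an assumption made purely for illustration. The function name and the sample values of $K$, $B$ and $\epsilon$ are hypothetical.

```python
import math

def amortized_insert_ios(K, B, eps):
    """Evaluate log_B(K) / (eps * B^(1 - eps)) with all hidden constants set to 1."""
    fanout = B ** eps                 # children per internal node
    per_level = fanout / B            # one flush costs B^eps I/Os and moves >= B segments
    height = (1 / eps) * math.log(K, B)
    return per_level * height

# With B = 1024, eps = 1/2 and K = 2^30 the estimate is roughly 0.19 I/Os,
# i.e. well below one I/O per insertion.
estimate = amortized_insert_ios(K=2**30, B=1024, eps=0.5)
```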

Deletions.

To delete a segment $\sigma$ , we first check whether it is stored in the buffers of the root $\mathcal{S}_{r}$ or $I_{r}$ ; in this case we delete it. Otherwise, we insert $\sigma$ in the deletion buffer of the root $D_{r}$ .

Let $v$ be an internal node with children $v_{1},\dotsc,v_{B^{\epsilon}}$. Whenever $D_{v}$ becomes full we flush it, move the segments to the corresponding children and repeat the same procedure: For a segment $\sigma$ which moves to child $v_{i}$, we check whether it is stored in $\mathcal{S}_{v_{i}}$ or $I_{v_{i}}$: if yes, we delete it and update the minimal segment of $v_{i}$ in $\mathcal{M}_{v}$ if necessary. Otherwise, we store $\sigma$ in the deletion buffer $D_{v_{i}}$. If segment buffer $\mathcal{S}_{v}$ underflows (i.e., $|\mathcal{S}_{v}|<B/2$), we refill it using segments stored in buffers $\mathcal{S}_{v_{i}}$; the segments moved to $\mathcal{S}_{v}$ are deleted from $\mathcal{S}_{v_{i}}$ and all necessary updates in $\mathcal{M}_{v}$ are performed. This might cause segment buffers $\mathcal{S}_{v_{i}}$ of children $v_{i}$ of $v$ to underflow; we handle those in the same way. In case all buffers $\mathcal{S}_{v_{i}}$ become empty and $|\mathcal{S}_{v}|<B$, we move segments from $I_{v}$ to $\mathcal{S}_{v}$ until either $|\mathcal{S}_{v}|=B$ or $|I_{v}|=0$.
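The flush of a deletion buffer can be sketched as follows. This is a simplified in-memory model under stated assumptions: each child is a dictionary of three sets keyed `"S"`, `"I"` and `"D"`, segments are hashable identifiers, and `route` is a hypothetical function mapping a segment to the index of the child whose $y$-range contains it. Updates of $\mathcal{M}_{v}$ and the refilling of underflowing buffers are left out.

```python
def flush_deletions(pending, children, route):
    """Push every pending deletion in D_v one level down."""
    for seg in pending:
        child = children[route(seg)]
        if seg in child["S"]:
            child["S"].discard(seg)      # found in S_{v_i}: delete it there
        elif seg in child["I"]:
            child["I"].discard(seg)      # found in I_{v_i}: cancel the insertion
        else:
            child["D"].add(seg)          # not found yet: keep pushing it down
    pending.clear()

# Two hypothetical children; segments are named by strings and routed by prefix.
children = [
    {"S": {"lo1", "lo2"}, "I": {"lo3"}, "D": set()},
    {"S": {"hi1"}, "I": set(), "D": set()},
]
pending = ["lo1", "lo3", "hi9"]
flush_deletions(pending, children, route=lambda s: 0 if s.startswith("lo") else 1)
```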

Bounding the deletion cost: Deletion cost consists of three components:

(i) Cost for moving segments between buffers: Using the same analysis as for insertions we get that this requires $O(\log_{B}K/(\epsilon\cdot B^{1-\epsilon}))$ amortized I/Os.

(ii) Cost due to refilling of buffers $\mathcal{S}_{v}$: For a node $v$ with children $v_{i}$, while refilling buffer $\mathcal{S}_{v}$ from $\mathcal{S}_{v_{i}}$ we perform $O(B^{\epsilon})$ I/Os and we move $\Theta(B)$ segments one level higher. Thus the amortized cost of moving a segment up by one level is $O(1/B^{1-\epsilon})$. Since the tree has height $O((1/\epsilon)\cdot\log_{B}K)$, over a sequence of $K$ deletions the total number of moves of segments by one level is $O((1/\epsilon)\cdot K\cdot\log_{B}K)$. Thus the total cost due to refilling is at most $O((1/(\epsilon\cdot B^{1-\epsilon}))\cdot K\cdot\log_{B}K)$, which implies that the amortized cost is $O(\log_{B}K/(\epsilon\cdot B^{1-\epsilon}))$.

A corner case that we did not take into account above is when the total number of segments stored in buffers $\mathcal{S}_{v_{i}}$ is less than $B/2$. In this case it is not valid that the amortized cost of updating $\mathcal{S}_{v}$ is $O(B^{\epsilon}/B)$. To take care of this, we use a simple amortization trick: we double charge all I/Os performed relating to insertions. This way, for each buffer $\mathcal{S}_{v_{i}}$ there is a saved I/O from the time when segments move from $I_{v}$ to node $v_{i}$. We use this additional saved I/O when $\mathcal{S}_{v_{i}}$ gets emptied due to the refilling of $\mathcal{S}_{v}$.

(iii) Restructuring requires $O(\frac{\log_{B}K}{B})$ amortized I/Os, by rebuilding the structure after $K/2$ deletions.

Overall, the amortized deletion cost is $O(\log_{B}K/(\epsilon\cdot B^{1-\epsilon}))=O(\log_{B}K/B^{1-\epsilon})$ I/Os.

Queries in the dynamic structure.

We now describe how to extend our query algorithm to the dynamic case. In order to ensure that all nodes visited are up-to-date and we do not miss any updates in the insertion/deletion buffers, when moving a pointer from a node $u$ to its child $v_{i}$, we flush any deletions in $D_{u}$ to $v_{i}$, i.e. delete segments of $D_{u}$ that are stored in $\mathcal{S}_{v_{i}}$, store the other segments in $D_{v_{i}}$ and update $\mathcal{M}_{u}$ if necessary. We then delete any segments found in both $I_{v_{i}}$ and $D_{v_{i}}$. Finally, we compare segments in $I_{v_{i}}$ with $\sigma_{+}$ (recall this is the first segment hit by $\rho^{+}$ among segments considered so far) and, if any segment in $I_{v_{i}}$ would be hit by $\rho^{+}$ before $\sigma_{+}$, we replace $\sigma_{+}$ with it. Clearly this increases the total cost by at most an $O(1)$ factor compared to the static case, thus the query cost is $O((1/\epsilon)\log_{B}K)$ I/Os.

4 Multislab Structure

In this section we prove Theorem 1.3. Assume that we are given a set of $K$ non-intersecting segments with endpoints on at most $B^{\epsilon/2}+1$ vertical lines $l_{1},\dotsc,l_{B^{\epsilon/2}+1}$, for some constant $0<\epsilon\leq 1/2$. We show that those segments can be stored in a data structure which uses $O(K)$ space, supports vertical ray-shooting queries in $O(\log_{B}K)$ I/Os, and updates in $O(\log_{B}K/B^{1-\epsilon})$ amortized I/Os. This data structure can be constructed in $O((K/B)\log_{B}K)$ I/Os. We call this data structure a multislab structure.

For notational convenience we set $\epsilon^{\prime}=\epsilon/2$ . This way endpoints of the segments lie on at most $B^{\epsilon^{\prime}}+1$ vertical lines $l_{1},\dotsc,l_{B^{\epsilon^{\prime}}+1}$ . For $1\leq i\leq B^{\epsilon^{\prime}}$ , let $s_{i}$ denote the vertical slab defined by vertical lines $l_{i}$ and $l_{i+1}$ . We will show that queries are supported in $O(\log_{B}K)$ I/Os and updates in $O((\log_{B}K)/B^{1-2\epsilon^{\prime}})$ I/Os. Theorem 1.3 then follows.

Total Order.

In order to implement the multislab structure we need to maintain an ordering of the segments based on their $y$-coordinates. Using standard approaches (see e.g. [7, 5]) we can define a partial order for segments that can be intersected by a vertical line. Arge et al. [8] showed how to extend a partial order into a total order on $K$ segments (not necessarily all intersecting the same vertical line) in $O((K/B)\log_{M/B}\frac{K}{B})=O((K/B)\log_{B}K)$ I/Os. We use this total order to create our multislab structure.
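To illustrate what such an order looks like, the sketch below builds one in memory by topologically sorting the "below" relation. This is not the I/O-efficient algorithm of [8]: it uses a quadratic number of comparisons and Python's standard `graphlib.TopologicalSorter` (Python 3.9 or later). It assumes non-vertical, interior-disjoint segments given as `((x1, y1), (x2, y2))` with `x1 < x2`; the function names are hypothetical.

```python
from graphlib import TopologicalSorter

def y_at(seg, x):
    """y-coordinate of a non-vertical segment at abscissa x."""
    (x1, y1), (x2, y2) = seg
    return y1 + (y2 - y1) * (x - x1) / (x2 - x1)

def below(s, t):
    """True if some vertical line crosses both segments with s below t."""
    lo = max(s[0][0], t[0][0])
    hi = min(s[1][0], t[1][0])
    if lo >= hi:                  # x-ranges share at most one point
        return False
    mid = (lo + hi) / 2           # disjoint interiors: any common x gives the same answer
    return y_at(s, mid) < y_at(t, mid)

def total_order(segments):
    """Return the segments in an order consistent with the partial order, lowest first."""
    sorter = TopologicalSorter()
    for i, s in enumerate(segments):
        sorter.add(i)
        for j, t in enumerate(segments):
            if i != j and below(t, s):
                sorter.add(i, j)  # t (index j) must precede s (index i)
    return [segments[i] for i in sorter.static_order()]

a = ((0, 0), (2, 0))
b = ((1, 1), (3, 1))
c = ((2.5, 0.5), (4, 0.5))
order = total_order([b, c, a])
```

Here `a` and `c` are incomparable (no vertical line crosses both), so either may come first; both must precede `b`.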

The Data Structure.

We store the ordered segments in an augmented B-tree $\mathcal{T}$ which supports queries, insertions and deletions. The degree of each node is between $B^{\epsilon^{\prime}}/2$ and $B^{\epsilon^{\prime}}$, except the root which might have degree in the range $[2,B^{\epsilon^{\prime}}]$. Leaves store $\Theta(B)$ elements. For a node $v\in\mathcal{T}$, let $\mathcal{T}_{v}$ be the subtree rooted at $v$. Let $v_{1},\dotsc,v_{B^{\epsilon^{\prime}}}$ be the children of an internal node $v$. Node $v$ stores the following information:

1. A buffer $\mathcal{S}_{v}$ of capacity $B$ which contains the highest (according to the total order) segments stored in $\mathcal{T}_{v}$ which are not stored in any buffer $\mathcal{S}_{w}$ for an ancestor $w$ of $v$. In other words, $\mathcal{T}$ together with segments of buffers $\mathcal{S}_{v}$ form an external memory priority search tree [6].

2. An insertion buffer $I_{v}$ and a deletion buffer $D_{v}$, both storing up to $B$ segments.

3. A list $L_{v}$ which contains, for each slab $s_{i}$, $1\leq i\leq B^{\epsilon^{\prime}}$, and each child $v_{j}$, $1\leq j\leq B^{\epsilon^{\prime}}$, the highest segment (according to the total order) $t_{i,j}$ crossing slab $s_{i}$ stored in $\mathcal{T}_{v_{j}}$.

The data structure satisfies the following invariants: i) for each node $v\in\mathcal{T}$ , either $|\mathcal{S}_{v}|\geq B/2$ or if $|\mathcal{S}_{v}|<B/2$ , then $I_{v}$ and $D_{v}$ are empty and all buffers of descendants $w$ of $v$ are empty, ii) for each node $v$ , buffers $\mathcal{S}_{v},I_{v}$ and $D_{v}$ are disjoint, and iii) for every leaf $v$ , $I_{v}$ and $D_{v}$ are empty.

Construction and Space Usage.

Overall buffers of each node contain $O(B)$ segments and list $L_{v}$ contains at most $B^{2\epsilon^{\prime}}=O(B)$ segments, i.e., they can be stored in $O(1)$ blocks. Thus $\mathcal{T}$ can be stored in $O(K/B)$ blocks, i.e. it requires $O(K)$ space. The structure can be constructed in $O(\frac{K}{B}\log_{B}K)$ I/Os. If segments are already sorted according to a total order, construction requires $O(K/B)$ I/Os.

Insertions.

To insert a new segment $\sigma$ we need to determine its position in the total order. Clearly, we cannot afford to produce a new total order from scratch, as this costs $O((K/B)\log_{B}K)$ I/Os. Thus, we break $\sigma$ into at most $B^{\epsilon^{\prime}}$ unit segments, where each unit segment crosses exactly one slab. In particular, if $\sigma$ crosses slabs $s_{\ell},\dotsc,s_{r}$, we break it into unit segments $\sigma_{\ell},\dotsc,\sigma_{r}$, where segment $\sigma_{i}$ crosses slab $s_{i}$. We call all such unit segments stored in $\mathcal{T}$ new segments. The rest of the segments stored in $\mathcal{T}$ are called the old segments of $\mathcal{T}$. Now we can easily update the total order: segment $\sigma_{i}$ needs to be compared only with segments crossing slab $s_{i}$; if $\sigma_{p}$ and $\sigma_{s}$ are the predecessor and successor of $\sigma_{i}$ within slab $s_{i}$, we place $\sigma_{i}$ in an arbitrary position between $\sigma_{p}$ and $\sigma_{s}$ in the total order. This way a valid total order is always maintained.
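Breaking a segment into unit segments can be sketched as follows. The representation is an assumption: `lines` is the sorted list of $x$-coordinates of the vertical lines, slabs are indexed from 0 (slab `k` lies between `lines[k]` and `lines[k + 1]`), and the segment is non-vertical with both endpoints on lines from that list. The function name is hypothetical.

```python
def split_into_unit_segments(seg, lines):
    """Cut seg at every vertical line it crosses; return (slab_index, piece) pairs."""
    (x1, y1), (x2, y2) = seg
    first, last = lines.index(x1), lines.index(x2)

    def y_on_segment(x):
        # Linear interpolation along the original segment.
        return y1 + (y2 - y1) * (x - x1) / (x2 - x1)

    pieces = []
    for k in range(first, last):
        xa, xb = lines[k], lines[k + 1]
        pieces.append((k, ((xa, y_on_segment(xa)), (xb, y_on_segment(xb)))))
    return pieces

lines = [0, 1, 2, 3]
pieces = split_into_unit_segments(((1, 0), (3, 4)), lines)
# pieces holds one unit segment for slab 1 and one for slab 2
```

Each returned piece crosses exactly one slab, so its position in the total order can be fixed by comparing it only with the segments crossing that slab.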

We now describe the insertion algorithm. When segment $\sigma$ needs to be inserted, we first break it into unit segments $\sigma_{\ell},\dotsc,\sigma_{r}$ . For each segment $\sigma_{j}$ , $\ell\leq j\leq r$ , we first check whether it should be inserted in the buffer $\mathcal{S}_{r}$ of the root: if this is the case we store it there; otherwise we store it in the insertion buffer of the root $I_{r}$ . In case $\mathcal{S}_{r}$ overflows (i.e. $|\mathcal{S}_{r}|=B+1$ ) we move its last segment (according to the total order) to $I_{r}$ . Let $v$ be an internal node with children $v_{1},\dotsc,v_{B^{\epsilon^{\prime}}}$ . Each time $I_{v}$ becomes full, we flush it and move the segments to its children $v_{i}$ , for $1\leq i\leq B^{\epsilon^{\prime}}$ . For a segment moving from $v$ to $v_{i}$ , we first check whether it is greater (according to the total order) than the minimum segment stored in $\mathcal{S}_{v_{i}}$ and if so we store it in $\mathcal{S}_{v_{i}}$ ; otherwise we store it in buffer $I_{v_{i}}$ . In case $\mathcal{S}_{v_{i}}$ overflows (i.e. $|\mathcal{S}_{v_{i}}|=B+1$ ) we move its last segment to $I_{v_{i}}$ . Also we update information in list $L_{v}$ if necessary. In case $I_{v_{i}}$ becomes full, we repeat the same procedure recursively.

When $\mathcal{S}_{v}$ overflows for some leaf $v$ , we split $v$ into two leaves $v_{1}$ and $v_{2}$ , as in standard $B$ -trees. Note that this might cause recursive splits of nodes at greater height.

Bounding the insertion cost: To flush a buffer $I_{v}$ and move segments to buffers of child nodes $\mathcal{S}_{v_{i}}$ and $I_{v_{i}}$, we need to perform $O(B^{\epsilon^{\prime}})$ I/Os. Since each segment breaks into at most $B^{\epsilon^{\prime}}$ unit segments, a buffer of size $B$ becomes full after at least $B/B^{\epsilon^{\prime}}=B^{1-\epsilon^{\prime}}$ insertions. Thus the amortized cost of moving a segment from a buffer of depth $i$ to depth $i+1$ is $O(B^{\epsilon^{\prime}}/{B^{1-\epsilon^{\prime}}})=O(1/B^{1-2\epsilon^{\prime}})$. Since each segment will be eventually stored in a node of depth $O((1/\epsilon^{\prime})\cdot\log_{B}K)$, the amortized cost until it gets inserted is $O(\log_{B}K/(\epsilon^{\prime}\cdot B^{1-2\epsilon^{\prime}}))$. The restructuring of $\mathcal{T}$ due to splitting full nodes requires amortized $O(1)$ I/Os, as in standard B-trees. Overall, insertions require $O(\log_{B}K/(\epsilon^{\prime}\cdot B^{1-2\epsilon^{\prime}}))=O((\log_{B}K)/B^{1-\epsilon})$ amortized I/Os.
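Written out as one chain, using $\epsilon=2\epsilon^{\prime}$ and treating $\epsilon^{\prime}$ as a constant, the buffer-moving part of the bound is:

$$\underbrace{\frac{O(B^{\epsilon^{\prime}})}{B^{1-\epsilon^{\prime}}}}_{\text{cost per level}}\cdot\underbrace{O\Big(\frac{1}{\epsilon^{\prime}}\log_{B}K\Big)}_{\text{number of levels}}=O\Big(\frac{\log_{B}K}{\epsilon^{\prime}\cdot B^{1-2\epsilon^{\prime}}}\Big)=O\Big(\frac{\log_{B}K}{B^{1-\epsilon}}\Big).$$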

Linear space usage: To avoid increases in space usage due to unit segments, whenever there are $K/B^{\epsilon^{\prime}}$ new segments, we rebuild the structure. This way the space used is $O(K+(K/B^{\epsilon^{\prime}})\cdot B^{\epsilon^{\prime}})=O(K)$ . This rebuilding requires $O((K/B)\log_{B}K)$ I/Os, i.e., $O(\log_{B}K/B^{1-\epsilon^{\prime}})$ amortized I/Os, thus it does not violate the insertion time bound.

Deletions.

The process of deleting a segment, $\sigma$ , is similar to insertion: we break $\sigma$ into at most $B^{\epsilon^{\prime}}$ unit segments $\sigma_{\ell},\dotsc,\sigma_{r}$ where $s_{\ell}$ and $s_{r}$ are the leftmost and rightmost slabs spanned by $\sigma$ and apply the deletion procedure for each of those unit segments separately.

The deletion algorithm for a unit segment $\sigma_{i}$ is analogous to the one of the left (right) structure of Section 3. For completeness we describe it here. To delete a unit segment $\sigma_{i}$, we first check whether it is stored in the buffers of the root $\mathcal{S}_{r}$ or $I_{r}$; in this case we delete it. Otherwise, we insert $\sigma_{i}$ in the deletion buffer of the root $D_{r}$. Let $v$ be an internal node with children $v_{1},\dotsc,v_{B^{\epsilon^{\prime}}}$. Whenever $D_{v}$ becomes full we flush it, forward the segments to the corresponding children and repeat the same procedure: For a segment $\sigma$ which moves to child $v_{i}$, we check whether it is stored in $\mathcal{S}_{v_{i}}$ or $I_{v_{i}}$ and if this is the case, we delete it and update list $L_{v}$ if necessary. Otherwise, we store $\sigma$ in the deletion buffer $D_{v_{i}}$.

In case segment buffer $\mathcal{S}_{v}$ underflows (i.e., $|\mathcal{S}_{v}|<B/2$ ), we refill it using segments from buffers $\mathcal{S}_{v_{i}}$ ; segments moved to $\mathcal{S}_{v}$ are deleted from $\mathcal{S}_{v_{i}}$ and $L_{v}$ gets updated (if needed). This might cause underflowing segment buffers $\mathcal{S}_{v_{i}}$ ; we handle those in the same way. In case all buffers $\mathcal{S}_{v_{i}}$ become empty and $|\mathcal{S}_{v}|<B$ , we move to $\mathcal{S}_{v}$ the segments from $I_{v}$ until either $|\mathcal{S}_{v}|=B$ or $|I_{v}|=0$ . After $K/B^{\epsilon^{\prime}}$ deletions we rebuild our data structure.

Remark: Note that here we split all segments $\sigma$ into unit segments $\sigma_{\ell},\dotsc,\sigma_{r}$. The old segments $\sigma$, however, are not unit segments and are stored whole (unsplit) in the data structure. This does not affect our algorithm: whenever the first unit segment $\sigma_{i}$ which is a part of $\sigma$ reaches the node $v$ such that $\sigma\in\mathcal{S}_{v}$, we delete $\sigma$ from $\mathcal{S}_{v}$ and remove $\sigma_{i}$ from deletion buffers. The remaining segments $\sigma_{j}$ will eventually reach node $v$ and find that $\sigma$ is already deleted from $\mathcal{S}_{v}$; at this point $\sigma_{j}$ gets deleted.

Bounding the deletion cost: The analysis of the deletion cost is identical to the analysis of deletions in the structure of Section 3. Since each segment breaks into at most $B^{\epsilon^{\prime}}$ unit segments, we get an amortized deletion cost of $O(\log_{B}K/B^{1-2\epsilon^{\prime}})=O(\log_{B}K/B^{1-\epsilon})$ .

Linear space usage: Similar to insertions, we need to make sure that the total space used is not increasing asymptotically due to the use of at most $B^{\epsilon^{\prime}}$ unit segments in deletion buffers for each deleted segment $\sigma$. The total capacity of deletion buffers is $O(K)$. Since we rebuild the structure after $K/B^{\epsilon^{\prime}}$ deletions, there are at most $O(K)$ segments stored in deletion buffers, i.e., deletion buffers never get totally full and the total space used is $O(K)$.

Queries.

Let $p$ be the query point and $\rho^{+}$ be the vertical ray emanating from $p$ in the $(+y)$ direction. Let also $s_{p}$ be the slab containing $p$. We can find $s_{p}$ in $O(1)$ I/Os by storing all slab boundaries in a block. We perform a root-to-leaf search and we keep the first segment $\sigma$ hit by $\rho^{+}$ among segments seen so far. While visiting a node $v$ we do the following: (i) perform a vertical ray-shooting query from $p$ among segments stored in buffers $\mathcal{S}_{v}$ and $I_{v}$, and update $\sigma$ if necessary; (ii) move to the child $v_{i}$ which contains the successor segment $t_{p,j}$ of $p$ in list $L_{v}$ (see Figure 3); and (iii) find in $I_{v}$ (resp. $D_{v}$) the segments that cross slab $s_{p}$ and should be stored (according to the total order) in $\mathcal{T}_{v_{i}}$, and move them to $\mathcal{S}_{v_{i}}$ or $I_{v_{i}}$ (resp. delete them from $\mathcal{S}_{v_{i}}$ or store them in $D_{v_{i}}$). If a segment inserted in $D_{v_{i}}$ is also stored in $I_{v_{i}}$, we delete it from both buffers. Once we reach a leaf $v$, we first delete from $\mathcal{S}_{v}$ the segments that are in the deletion buffer of its parent and then we perform a ray-shooting query among the segments stored in $\mathcal{S}_{v}$ and update $\sigma$ if necessary.
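Steps that depend on the slab of $p$, namely locating $s_{p}$ and choosing the child via $L_{v}$, can be sketched in memory as follows. The layout is an assumption: `boundaries` is the sorted list of slab boundaries, slabs are indexed from 0, and `tops[j]` holds the entry $t_{p,j}$ for child `j` in slab $s_{p}$ (or `None` if no segment of that child crosses the slab). Buffer flushing along the path is omitted and the function names are hypothetical.

```python
from bisect import bisect_right

def locate_slab(px, boundaries):
    """Index of the slab containing px, assuming boundaries[0] <= px < boundaries[-1]."""
    return bisect_right(boundaries, px) - 1

def y_at(seg, x):
    """y-coordinate of a non-vertical segment at abscissa x."""
    (x1, y1), (x2, y2) = seg
    return y1 + (y2 - y1) * (x - x1) / (x2 - x1)

def child_to_visit(p, tops):
    """Index of the child whose top segment is the lowest one at or above p."""
    px, py = p
    best_child, best_y = None, float("inf")
    for j, seg in enumerate(tops):
        if seg is None:
            continue
        y = y_at(seg, px)
        if py <= y < best_y:
            best_child, best_y = j, y
    return best_child

boundaries = [0, 10, 20, 30]
p = (12, 5)
slab = locate_slab(p[0], boundaries)          # p lies in the slab between x=10 and x=20
tops_in_slab = [((10, 2), (20, 2)), ((10, 8), (20, 6)), ((10, 9), (20, 9))]
child = child_to_visit(p, tops_in_slab)       # child 1: its top segment is first above p
```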

Bounding the query cost: Since we follow a root-to-leaf path, and at each level we need to perform $O(1)$ I/Os, a ray-shooting query is answered in $O((1/\epsilon^{\prime})\cdot\log_{B}K)$ I/Os.

5 Counting the Restructuring Cost

In Section 2 we proved Theorem 1.1 (query and update bounds of the overall structure) without taking into account the cost of restructuring the interval tree $\mathcal{I}$ due to insertions that cause leaves to become full. In this section we show that Theorem 1.1 holds while taking into account the restructuring of $\mathcal{I}$ as well.

When a leaf becomes full we need to split it. This split in turn might cause the split of the parent and possibly continue up the tree, thus causing some part of the tree $\mathcal{I}$ to need rebalancing. While rebalancing, we need to perform updates in the secondary structures so that they are adjusted with the updated nodes of the interval tree $\mathcal{I}$ . In this section, we show that we can slightly modify our data structure such that all updates in secondary structures can be performed in $O(\frac{\log_{B}N}{B^{1-\epsilon}})$ amortized I/Os. This implies that Theorem 1.1 holds.

Our Approach.

We use a variant of the weight-balanced $B^{\epsilon}$ -tree of [9]. Each leaf stores at most $B$ segment endpoints. Let $v$ be a node at height $h-1$ with parent $p(v)$ . Node $p(v)$ stores $w_{v}=\Theta(B\cdot B^{\epsilon h})$ elements in its subtree $\mathcal{I}_{p(v)}$ . We will show that if node $v$ splits, then we can perform all updates needed in the secondary structures in $O(w_{v}/B^{1-\epsilon})$ I/Os. This implies that a split requires amortized $O(1/B^{1-\epsilon})$ I/Os, since after a restructuring, there should be at least $\Omega(w_{v})$ insertions in $\mathcal{I}_{p(v)}$ until the next split is needed. Since each insertion can cause $O(\log_{B}N)$ splits, we get an amortized restructuring cost of $O(\frac{\log_{B}N}{B^{1-\epsilon}})$ I/Os for insertion.
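The amortization argument in this paragraph can be written as a two-step calculation, under the stated assumption that a split of $v$ is preceded by $\Omega(w_{v})$ insertions into $\mathcal{I}_{p(v)}$:

$$\frac{O(w_{v}/B^{1-\epsilon})}{\Omega(w_{v})}=O\Big(\frac{1}{B^{1-\epsilon}}\Big)\ \text{per insertion and level},\qquad O(\log_{B}N)\cdot O\Big(\frac{1}{B^{1-\epsilon}}\Big)=O\Big(\frac{\log_{B}N}{B^{1-\epsilon}}\Big).$$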

Splitting a node.

Node $v$ splits into two new nodes $v_{1}$ and $v_{2}$ . The slab $s_{v}$ of $v$ is divided into two slabs $s_{v_{1}},s_{v_{2}}$ with slab boundary $b$ ; see Figure 4. To capture this change and update our data structure, we need to perform updates in the secondary structures of $p(v)$ and construct the secondary structures for $v_{1},v_{2}$ . We describe these updates in detail and show that they can be performed in $O(w_{v}/B^{1-\epsilon})$ I/Os. In our analysis we use the fact that all secondary structures (multislab and left/right) storing $K$ segments can be scanned in $O(K/B)$ I/Os.

Updates in secondary structures of $p(v)$ .

We begin with the construction of left/right structures for $v_{1}$ and $v_{2}$ using the previous left/right structures for $v$ . We describe the creation of left structures $L_{v_{1}}$ and $L_{v_{2}}$ for $v_{1}$ and $v_{2}$ , respectively, and the right structures are symmetric. Segments that were stored in $L_{v}$ and do not cross $b$ (like segment $\sigma_{1}$ in Figure 5) are stored in $L_{v_{2}}$ ; segments of $L_{v}$ that cross $b$ (see segment $\sigma_{2}$ in Figure 5) are stored in $L_{v_{1}}$ . To identify if a segment is stored in $L_{v_{1}}$ or $L_{v_{2}}$ we just need to scan $L_{v}$ , which takes $O(w_{v}/B)$ I/Os. Moreover, there are some additional segments that need to be stored in left/right structures of $p(v)$ : the segments that are strictly inside the slab of $v$ (i.e. they were stored in secondary structures of $v$ ) and cross $b$ ; see e.g. segment $\sigma_{3}$ in Figure 5. For those segments, their left subsegments are stored in $L_{v_{1}}$ and their right subsegments in $R_{v_{2}}$ . To find such segments we need to scan all secondary structures stored at $v$ . Since each secondary structure can be scanned in $O(w_{v}/B)$ I/Os and there are $O(B^{\epsilon})$ structures stored in each node, all this takes $O((w_{v}/B)\cdot B^{\epsilon})=O(w_{v}/B^{1-\epsilon})$ I/Os.
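The assignment rule for the segments of $L_{v}$ amounts to one linear scan, which can be sketched as follows. The segment representation as a pair of `(x, y)` endpoints and the function names are assumptions; in external memory the scan would read $L_{v}$ block by block rather than as a Python list, and the additional segments coming from the secondary structures of $v$ are not handled here.

```python
def crosses(seg, b):
    """True if the segment extends strictly to both sides of the vertical line x = b."""
    (x1, _), (x2, _) = seg
    return min(x1, x2) < b < max(x1, x2)

def split_left_structure(left_segments, b):
    """Scan L_v once: segments crossing b go to L_{v_1}, the rest go to L_{v_2}."""
    to_v1, to_v2 = [], []
    for seg in left_segments:
        (to_v1 if crosses(seg, b) else to_v2).append(seg)
    return to_v1, to_v2

# Hypothetical contents of L_v and a new slab boundary at x = 5.
left_segments = [((6, 1), (10, 1)), ((3, 2), (10, 2)), ((5, 3), (10, 3))]
l_v1, l_v2 = split_left_structure(left_segments, b=5)
```

A segment with an endpoint exactly on $b$ is treated as not crossing it, which is a convention chosen for this sketch.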

We now proceed to the updates of the multislab structure of $p(v)$. Here, we just need to add some segments to the previous multislab structure. The new segments are the segments of $L_{v}$ that cross $b$ which are not already stored in the multislab (and symmetrically, the segments of $R_{v}$ that cross $b$ and are not yet in the multislab). For an example, see segment $\sigma_{2}$ in Figure 5; before it was not stored in the multislab and now we store its middle subsegment. Note that the middle subsegment is a unit segment (i.e. crosses exactly one slab), thus we don’t need to compute a new total order; we can find its position in the total order by comparing it only with segments that cross slab $s_{v_{2}}$. All those segments that need to be added can be found by scanning $L_{v}$ and $R_{v}$ in $O(w_{v}/B)$ I/Os. Insertions in the multislab of $p(v)$ require $O((\log_{B}w_{v})/B^{1-\epsilon})=O(w_{v}/B)$ I/Os. Also, all information stored in nodes of the multislab structure can be updated in $O(w_{v}/B)$ I/Os. Overall, all updates in the multislab structure of $p(v)$ are performed in $O(w_{v}/B)$ I/Os.

Construct secondary structures for $v_{1}$ and $v_{2}$ .

The left and right structures for each child slab of $v_{1}$ and $v_{2}$ will be based on the left/right structure of the same slab in $v$ just by removing the segments that cross $b$ (which are assigned to $p(v)$ as we explained above). Similarly, segments that cross $b$ are excluded from the multislab structure.

We start with the construction of the left/right structures of $v_{1}$ and $v_{2}$. We describe the left structures; the right ones are symmetric. For each slab $s_{k}$ of $v$, $1\leq k\leq B^{\epsilon}$, we scan the left list $L_{k}$; the segments that do not cross $b$ remain in $L_{k}$ and the others are deleted. All this takes $O((w_{v}/B)\cdot B^{\epsilon})=O(w_{v}/B^{1-\epsilon})$ I/Os.

Finally we create the multislab structures for $v_{1}$ and $v_{2}$ . Again, we need to scan the multislab of $v$ and delete the segments that cross $b$ , which takes $O(w_{v}/B)$ I/Os. Then we need to build the multislabs of $v_{1}$ and $v_{2}$ out of the remaining segments. Since all segments are already sorted according to a total order, this can be done in $O(w_{v}/B)$ I/Os.

6 Concluding Remarks

We presented the first data structure with sublogarithmic update time for dynamic planar point location in the DAM, matching the update bound achieved by $B^{\epsilon}$-trees for the dictionary problem. Moreover, until the very recent work of Munro and Nerich [22] in SoCG’19, our query bound $O(\log^{2}_{B}N)$ was the best known for the problem. Since the authors of [22] achieved the first $o(\log^{2}_{B}N)$ query bound, a very interesting research direction is to achieve the “best of both worlds”, i.e. to describe a data structure with the query bound of [22] and the update time of the data structure presented in this work. We conjecture that the optimal bounds for dynamic planar point location in external memory are $O(\log_{B}N)$ for queries and $O(\log_{B}N/B^{1-\epsilon})$ (the bound we achieved in this work) for updates.

Appendix A Queries in the Left and Right Structures.

In this appendix we give further details on the left (right) structure that were omitted from Section 3.

Queries.

We begin with the queries and we show the correctness of the query algorithm of the static left (right) structure.

Correctness: The correctness of the query algorithm follows from the next lemma. For a node $v\in\mathcal{T}$ let $S_{v}$ be the set of segments stored in buffers $\mathcal{S}$ in $\mathcal{T}_{v}$ .

Lemma A.1.

Assume that at the end of the $i$ th step of the query algorithm, either $v_{+}$ or $v_{-}$ is defined. Then $\sigma_{+}$ is the first segment hit by $\rho^{+}$ among the segments of $\mathcal{L}-(S_{v_{-}}\cup S_{v_{+}})$ .

Proof.

We prove the lemma by induction.

Induction Base: At the end of the first step, $v_{+}$ and $v_{-}$ are children of the root $r$ and $\sigma_{+}$ is the first segment hit by $\rho^{+}$ among all segments stored at the root (in $\mathcal{S}_{r}$ and $\mathcal{M}_{r}$). By definition of $v_{s}=v_{+}$, for any child of the root $v$ with higher $y$-range than $v_{+}$, $\sigma_{+}$ is below all segments of $S_{v}$. Similarly, for any child of the root $v^{\prime}$ with smaller $y$-range than $v_{-}$ (if $v_{-}$ exists), there is no segment in $S_{v^{\prime}}$ hit by $\rho^{+}$ (since there exists a segment in $\mathcal{S}_{v_{-}}$ hit by $\rho^{-}$). Finally, for any child $v^{\prime\prime}$ of the root whose $y$-range is between the range of $v_{-}$ and $v_{+}$, by definition of $v_{+}$, there is no segment in $S_{v^{\prime\prime}}$ hit by $\rho^{+}$. We conclude that $\sigma_{+}$ is the first segment hit by $\rho^{+}$ among the segments in $\mathcal{L}-(S_{v_{-}}\cup S_{v_{+}})$.

Inductive Step: Assume the lemma holds at the end of step $i$, i.e. we have at least one of $v_{+}$ and $v_{-}$ at level $i$ and $\sigma_{+}$ is the first segment hit by $\rho^{+}$ among all segments in $\mathcal{L}-(S_{v_{-}}\cup S_{v_{+}})$.

During $(i+1)$ th step we ray-shoot on $\rho^{+}$ among segments stored in $\mathcal{S}_{v_{+}},\mathcal{S}_{v_{-}},\mathcal{M}_{v_{+}}$ and $\mathcal{M}_{v_{-}}$ , and update $\sigma_{+}$ if necessary. Let $v_{s}$ be the node containing the first segment hit by $\rho^{+}$ among $\mathcal{M}_{v_{+}}$ and $\mathcal{M}_{v_{-}}$ (if such a segment exists). Let also $v_{p}$ be the node containing the first segment hit by $\rho^{-}$ among $\mathcal{M}_{v_{+}}$ and $\mathcal{M}_{v_{-}}$ (if such a segment exists).

By definition of $v_{s}$, for any node $v$ which is a child of $v_{-}$ or $v_{+}$ with higher $y$-range than $v_{s}$, $\sigma_{+}$ is below all segments of $S_{v}$. Similarly, for a node $v^{\prime}$ which is a child of $v_{-}$ or $v_{+}$ with smaller $y$-range than $v_{p}$ (if $v_{p}$ exists), there is no segment of $S_{v^{\prime}}$ hit by $\rho^{+}$ (since there exists a segment in $\mathcal{S}_{v_{p}}$ hit by $\rho^{-}$). Finally, for any child $v^{\prime\prime}$ of $v_{-}$ or $v_{+}$ whose $y$-range is between the range of $v_{-}$ and $v_{+}$, by definition of $v_{+}$, there is no segment in $S_{v^{\prime\prime}}$ hit by $\rho^{+}$.

Recall that, by the induction hypothesis, at the end of the previous step $\sigma_{+}$ was the first segment hit by $\rho^{+}$ among the segments of $\mathcal{L}-(S_{v_{-}}\cup S_{v_{+}})$. We have now updated $\sigma_{+}$ and shown that there is no segment hit by $\rho^{+}$ before $\sigma_{+}$ in any subtree other than $\mathcal{T}_{v_{s}}$ or $\mathcal{T}_{v_{p}}$. We conclude that $\sigma_{+}$ is the first segment hit by $\rho^{+}$ among the segments in $\mathcal{L}-(S_{v_{s}}\cup S_{v_{p}})$. Since at the end of the $(i+1)$th step we set $v_{-}=v_{p}$ and $v_{+}=v_{s}$, the lemma follows.

∎

We now explain how Lemma A.1 implies the correctness of the query algorithm. Let $i$ be the last level where either $v_{+}$ or $v_{-}$ is defined; at the beginning of the query algorithm at level $i$, $\sigma_{+}$ is the first segment hit by $\rho^{+}$ among the segments of $\mathcal{L}-(S_{v_{-}}\cup S_{v_{+}})$. Moreover, at the end of this step neither $v_{s}$ nor $v_{p}$ is defined, i.e., for each child $v$ of $v_{-}$ or $v_{+}$ there is no segment in $S_{v}$ hit by $\rho^{+}$ before $\sigma_{+}$. Since $S_{v_{-}}\cup S_{v_{+}}=\mathcal{S}_{v_{-}}\cup\mathcal{S}_{v_{+}}\cup(\cup_{v}S_{v})$, we get that $\sigma_{+}$ is the first segment hit by $\rho^{+}$ among the segments of $\mathcal{L}-(\mathcal{S}_{v_{-}}\cup\mathcal{S}_{v_{+}})$. By checking all segments of $\mathcal{S}_{v_{-}}\cup\mathcal{S}_{v_{+}}$ and updating $\sigma_{+}$ if necessary, we make sure that $\sigma_{+}$ is the first segment hit by $\rho^{+}$ among the segments of $\mathcal{L}$.
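The final step of this argument (keep a running candidate $\sigma_{+}$, then check the remaining segments and update if necessary) can be illustrated with a small brute-force sketch. This is not the paper's external-memory structure: the function names `y_at` and `first_hit_up`, the tuple representation of segments, and the sample coordinates are all assumptions made for illustration, and segments are assumed to be non-vertical and interior-disjoint.

```python
from typing import Iterable, Optional, Tuple

# Hypothetical representation: (x1, y1, x2, y2) with x1 < x2 (non-vertical).
Segment = Tuple[float, float, float, float]


def y_at(seg: Segment, x: float) -> float:
    """Height of the segment's supporting line at abscissa x."""
    x1, y1, x2, y2 = seg
    t = (x - x1) / (x2 - x1)
    return y1 + t * (y2 - y1)


def first_hit_up(
    segments: Iterable[Segment],
    qx: float,
    qy: float,
    best: Optional[Segment] = None,
) -> Optional[Segment]:
    """First segment hit by the upward vertical ray from (qx, qy).

    `best` is a candidate found earlier (the running sigma_+); it is
    replaced only if some segment in `segments` is hit strictly before it.
    """
    best_y = y_at(best, qx) if best is not None else float("inf")
    for seg in segments:
        x1, _, x2, _ = seg
        if x1 <= qx <= x2:              # segment spans the query abscissa
            y = y_at(seg, qx)
            if qy <= y < best_y:        # on or above the query, below the candidate
                best, best_y = seg, y
    return best


# `rest` stands in for L - (S_{v-} ∪ S_{v+}) and `at_nodes` for the segments
# of S_{v-} ∪ S_{v+} that are checked last (illustrative data only).
rest = [(0.0, 5.0, 10.0, 5.0), (0.0, 1.0, 10.0, 2.0)]
at_nodes = [(2.0, 3.0, 8.0, 3.0), (4.0, 1.0, 6.0, 1.0)]
qx, qy = 5.0, 0.0

sigma = first_hit_up(rest, qx, qy)                   # candidate over `rest`
sigma = first_hit_up(at_nodes, qx, qy, best=sigma)   # check the remainder, update
assert sigma == first_hit_up(rest + at_nodes, qx, qy)
```

The closing assertion mirrors the conclusion of the paragraph above: once the candidate is correct for one part of $\mathcal{L}$, scanning the remaining part and updating yields the first segment hit among all of $\mathcal{L}$.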


This work was supported by the Fonds de la Recherche Scientifique-FNRS under Grant no MISU F 6001 1 and by NSF Grant CCF-1533564.
