Succinct Partial Sums and Fenwick Trees
Philip Bille, Anders Roy Christiansen, Nicola Prezza, Frederik Rye Skjoldjensen

TL;DR
This paper introduces two space-efficient Fenwick Tree variants that support partial sums and updates with near-optimal space and time complexity, leveraging bit-packing and sampling techniques for practicality and parallelization.
Contribution
The paper presents the first succinct Fenwick Tree implementations with nk + o(n) bits space and efficient query/update times, improving space efficiency while maintaining performance.
Findings
Achieved nk + o(n) bits of space with O(log_b n) sum time and O(b log_b n) update time, for 2 <= b <= log^O(1) n.
Achieved optimal sum/update time while only slightly increasing the space usage to nk + o(nk) bits.
Designed methods are practical, based on bit-packing and sampling, enabling simple parallelization.
Abstract
We consider the well-studied partial sums problem in succinct space, where one is to maintain an array of n k-bit integers subject to updates such that partial sums queries can be answered efficiently. We present two succinct versions of the Fenwick Tree, which is known for its simplicity and practicality. Our results hold in the encoding model, where one is allowed to reuse the space from the input data. Our main result is the first that only requires nk + o(n) bits of space while still supporting sum/update in O(log_b n) / O(b log_b n) time, where 2 <= b <= log^O(1) n. The second result shows how optimal time for sum/update can be achieved while only slightly increasing the space usage to nk + o(nk) bits. Beyond Fenwick Trees, the results are primarily based on bit-packing and sampling, which makes them very practical, and they also allow for simple optimal parallelization.
Technical University of Denmark, DTU Compute, Kgs. Lyngby, Denmark
{phbi,aroy,npre,fskj}@dtu.dk
Keywords:
Partial sums, Fenwick tree, succinct, parallel
1 Introduction
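Let $A$ be an array of $k$-bit integers, with $|A| = n$. The partial sums problem is to build a data structure maintaining $A$ under the following operations.
- $\mathrm{sum}(i)$: return the value $\sum_{t=1}^{i} A[t]$.
- $\mathrm{search}(j)$: return the smallest $i$ such that $\mathrm{sum}(i) \geq j$.
- $\mathrm{update}(i, \Delta)$: set $A[i] \leftarrow A[i] + \Delta$, for some $\Delta$ such that $0 \leq A[i] + \Delta < 2^k$.
- $\mathrm{access}(i)$: return $A[i]$.
Note that $\mathrm{access}(i)$ can be implemented as $\mathrm{sum}(i) - \mathrm{sum}(i-1)$, and we therefore often do not mention it explicitly.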
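As a point of reference, the four operations can be sketched with a deliberately naive structure that stores the array verbatim. This is only an illustration of the interface: the class name `NaivePartialSums` is hypothetical, indices are 1-based as in the text, and search is assumed to return the smallest index whose prefix sum is at least the target.

```python
class NaivePartialSums:
    """Naive reference: O(n) sum/search, O(1) update, 1-based indices."""

    def __init__(self, values, k):
        self.k = k                      # every entry must fit in k bits
        self.a = list(values)
        assert all(0 <= v < 2 ** k for v in self.a)

    def sum(self, i):
        # A[1] + ... + A[i]; sum(0) is the empty sum
        return sum(self.a[:i])

    def search(self, j):
        # smallest i such that sum(i) >= j, or None if j exceeds the total
        running = 0
        for idx, v in enumerate(self.a, start=1):
            running += v
            if running >= j:
                return idx
        return None

    def update(self, i, delta):
        new_value = self.a[i - 1] + delta
        assert 0 <= new_value < 2 ** self.k   # result must stay a k-bit integer
        self.a[i - 1] = new_value

    def access(self, i):
        # access(i) expressed through sum, as noted in the text
        return self.sum(i) - self.sum(i - 1)
```

The structures discussed in the rest of the paper support the same interface with far better sum and search times.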
The partial sums problem is one of the most well-studied data structure problems [4, 9, 3, 1, 2, 8, 6, 7]. In this paper, we consider solutions to the partial sums problem that are succinct, that is, we are interested in data structures that use space close to the information-theoretic lower bound of $nk$ bits. We distinguish between encoding data structures and indexing data structures. Indexing data structures are required to store the input array $A$ verbatim along with additional information to support the queries, whereas encoding data structures have to support operations without consulting the input array.
In the indexing model, Raman et al. [8] gave a data structure that supports sum, update, and search in $O(\log n / \log\log n)$ time while using $nk + o(nk)$ bits of space. This was improved and generalized by Hon et al. [6]. Both of these papers have the constraint $\Delta \leq \log^{O(1)} n$. The above time complexity is nearly optimal by a lower bound of Patrascu and Demaine [7], who showed that sum, search, and update operations take $\Omega(\log_{w/\delta} n)$ time per operation, where $w \geq \log n$ is the word size and $\delta$ is the number of bits needed to represent $\Delta$. In particular, whenever $\Delta = \log^{O(1)} n$ this bound matches the $O(\log n / \log\log n)$ bound of Raman et al. [8].
Fenwick [2] presented a simple, elegant, and very practical encoding data structure. The idea is to replace entries in the input array $A$ with partial sums that cover $A$ in an implicit complete binary tree structure. The operations are then implemented by accessing at most $O(\log n)$ entries in the array. The Fenwick tree uses $nk + n\log n$ bits and supports all operations in $O(\log n)$ time. In this paper we show two succinct $b$-ary versions of the Fenwick tree. In the first version we reduce the size of the Fenwick tree while improving the sum and update time. In the second version we obtain optimal times for sum and update without using more space than the previous best succinct solutions [8, 6]. All results in this paper are in the RAM model.
Our results
We show two encoding data structures that give the following results.
Theorem 1.1
We can replace $A$ with a succinct Fenwick tree of $nk + o(n)$ bits supporting sum, update, and search queries in $O(\log_b n)$, $O(b\log_b n)$, and $O(\log n)$ time, respectively, for any $2 \leq b \leq \log^{O(1)} n$.
Theorem 1.2
We can replace $A$ with a succinct Fenwick tree of $nk + o(nk)$ bits supporting sum and update queries in optimal $O(\log_{w/\delta} n)$ time and search queries in $O(\log n)$ time.
2 Data structure
For simplicity, assume that $n$ is a power of $2$. The Fenwick tree is an implicit data structure replacing a word-array $A[1, \dots, n]$ as follows:
Definition 1
Fenwick tree of $A$ [2]. If $n = 1$, then leave $A$ unchanged. Otherwise, divide $A$ in consecutive non-overlapping blocks of two elements each and replace the second element $A[2i]$ of each block with $A[2i-1] + A[2i]$, for $i = 1, \dots, n/2$. Then, recurse on the sub-array $A[2, 4, \dots, 2i, \dots, n]$.
To answer $\mathrm{sum}(i)$, the idea is to write $i$ in binary as $i = 2^{j_1} + 2^{j_2} + \dots + 2^{j_t}$ for some $j_1 > j_2 > \dots > j_t$. Then there are $t = O(\log n)$ entries in the Fenwick tree, which can be easily computed from $i$, whose values added together yield $\mathrm{sum}(i)$. In Section 2.1 we describe in detail how to perform such accesses. As per the above definition, the Fenwick tree is an array of $n$ indices. If represented compactly, this array can be stored in $nk + n\log n$ bits. In this section we present a generalization of Fenwick trees taking only succinct space.
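The construction in Definition 1 and the binary decomposition used by sum can be sketched as follows. This is a minimal illustration of the classic binary Fenwick tree, not of the succinct variants; the function names are hypothetical, and the array is kept 1-based with a dummy entry at index 0.

```python
def build_fenwick(values):
    """Build the tree in place, level by level, as in Definition 1."""
    n = len(values)
    tree = [0] + list(values)            # tree[1..n], tree[0] is unused
    stride = 1
    while stride < n:
        # second element of every block of two (at the current level)
        # absorbs the first element of its block
        for pos in range(2 * stride, n + 1, 2 * stride):
            tree[pos] += tree[pos - stride]
        stride *= 2
    return tree


def fenwick_sum(tree, i):
    """Add one entry per 1-bit of i: A[1] + ... + A[i]."""
    total = 0
    while i > 0:
        total += tree[i]
        i -= i & -i                      # clear the lowest set bit
    return total


def fenwick_update(tree, i, delta):
    """Add delta to every stored partial sum that covers position i."""
    n = len(tree) - 1
    while i <= n:
        tree[i] += delta
        i += i & -i                      # move to the next covering entry
    return tree
```

For the input `[3, 1, 4, 1, 5, 9, 2, 6]` the stored array becomes `[3, 4, 4, 9, 5, 14, 2, 31]`: even positions hold sums of progressively larger blocks, and the last position holds the total.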
2.1 Layered b-ary structure
We first observe that it is easy to generalize Fenwick trees to be $b$-ary, for $b \geq 2$: we divide $A$ in blocks of $b$ integers each, replace the first $b-1$ elements in each block with their partial sums, and fill the remaining $n/b$ entries of $A$ by recursing on the array $A'$ of size $n/b$ that stores the sums of each block. This generalization gives an array of $n$ indices supporting sum, update, and search queries on the original array in $O(\log_b n)$, $O(b\log_b n)$, and $O(\log n)$ time, respectively. We now show how to reduce the space of this array.
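Let $\ell = \log_b n$. We represent our $b$-ary Fenwick tree $T_b(A)$ using $\ell + 1$ arrays (layers) $T_b^1(A), \dots, T_b^{\ell+1}(A)$. For simplicity, we assume that $n = b^e$ for some $e \geq 0$ (the general case is then straightforward to derive). To improve readability, we define our layered structure for the special case $b = 2$, and then sketch how to extend it to the general case $b \geq 2$. Our layered structure is defined as follows. If $n = 1$, then $T_2^1(A) = A$. Otherwise:
- $T_2^1(A)[i] = A[2(i-1)+1]$, for all $i = 1, \dots, n/2$. Note that $T_2^1(A)$ contains $n/2$ elements.
- Divide $A$ in blocks of $2$ elements each, and build an array $A'$ containing the sums of each block, i.e. $A'[j] = A[2(j-1)+1] + A[2(j-1)+2]$, for $j = 1, \dots, n/2$. Then, the next layers are recursively defined as $T_2^{i+1}(A) \leftarrow T_2^{i}(A')$.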
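For general $b$, $T_b^1(A)$ is an array of $n(b-1)/b$ elements that stores the $b-1$ partial sums of each block of $b$ consecutive elements in $A$, while $A'$ is an array of size $n/b$ containing the complete sums of each block. In Figure 1 we report an example of our layered structure. It follows that elements of $T_b^i(A)$, for $i \geq 1$, take at most $k + i\log b$ bits each. Note that arrays $T_b^1(A), \dots, T_b^{\ell+1}(A)$ can easily be packed contiguously in a word array while preserving constant-time access to each of them. This saves us $O(\ell)$ words that would otherwise be needed to store pointers to the arrays. Let $S_b(n,k)$ be the space (in bits) taken by our layered structure. This function satisfies the recurrence
$$S_b(1,k) = k, \qquad S_b(n,k) = \frac{n(b-1)}{b}\,(k + \log b) + S_b\!\left(\frac{n}{b},\, k + \log b\right),$$
which unfolds to $S_b(n,k) = \sum_{i=1}^{\log_b n} \frac{n(b-1)}{b^i}\,(k + i\log b) + k + \log n$. Using the identities $\sum_{i=1}^{\infty} 1/b^i = 1/(b-1)$ and $\sum_{i=1}^{\infty} i/b^i = b/(b-1)^2$, one can easily derive that $S_b(n,k) \leq nk + 2n\log b$.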
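The space bound can be checked numerically. The sketch below assumes the recurrence $S_b(1,k) = k$, $S_b(n,k) = \frac{n(b-1)}{b}(k + \log b) + S_b(n/b, k + \log b)$, which is a reconstruction consistent with the $nk + 2n\log b$ bound rather than a formula quoted from the paper; the function name is hypothetical and $n$ is assumed to be a power of $b$.

```python
import math


def layered_space_bits(n, k, b):
    """Evaluate the assumed recurrence for S_b(n, k) iteratively."""
    log_b = math.log2(b)
    bits = 0.0
    while n > 1:
        # (b - 1) partial sums per block of b, each of at most k + log b bits
        bits += n * (b - 1) / b * (k + log_b)
        n //= b                 # recurse on the n / b block sums ...
        k += log_b              # ... which are log b bits wider
    return bits + k             # base case: a single entry of k bits


if __name__ == "__main__":
    for b in (2, 4, 16):
        n, k = b ** 5, 8
        used = layered_space_bits(n, k, b)
        bound = n * k + 2 * n * math.log2(b)
        print(f"b={b:2d}  n={n:7d}  raw={n * k}  layered={used:.0f}  bound={bound:.0f}")
```

Under this recurrence the overhead over the raw $nk$ bits works out to $(n-1)\frac{b}{b-1}\log b$ bits, which is where the factor 2 in the bound comes from when $b = 2$.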
We now show how to obtain the time bounds stated in Theorem 1.1. In the next section, we reduce the space of the structure without affecting query times.
Answering sum
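Let the notation $(x_1 x_2 \dots x_t)_b$, with $0 \leq x_i < b$ for $i = 1, \dots, t$, represent the number $\sum_{i=1}^{t} b^{t-i} x_i$ in base $b$. $\mathrm{sum}(i)$ queries on our structure are a generalization (in base $b$) of $\mathrm{sum}(i)$ queries on standard Fenwick trees. Consider the base-$b$ representation of $i$, i.e. $i = (x_1 x_2 \dots x_{\ell+1})_b$ (note that we have at most $\ell + 1$ digits since we enumerate indexes starting from 1). Consider now all the positions $1 \leq i_1 < i_2 < \dots < i_t \leq \ell + 1$ such that $x_{i_j} \neq 0$, for $j = 1, \dots, t$. The idea is that each of these positions $i_j$ can be used to compute an offset $o_j$ in $T_b^{\ell - i_j + 2}(A)$. Then, $\mathrm{sum}(i) = \sum_{j=1}^{t} T_b^{\ell - i_j + 2}(A)[o_j]$. The offset $o_j$ relative to the $j$-th most significant (nonzero) digit of $i$ is defined as follows. If $j = 1$, then $o_j = x_{i_1}$. Otherwise, $o_j = (b-1) \cdot (x_1 \dots x_{i_j - 1})_b + x_{i_j}$. Note that we scale by a factor of $b-1$ (and not $b$) in the first term of this formula, as each level $T_b^{j}(A)$ stores only $b-1$ out of $b$ partial sums (the remaining $n/b^j$ sums are passed to level $j+1$). Note moreover that each $o_j$ can be easily computed in constant time and independently from the other offsets with the aid of modular arithmetic. It follows that sum queries are answered in $O(\log_b n)$ time. See Figure 1 for a concrete example of sum.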
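The layered structure and the digit-by-digit sum query can be sketched as follows. This is an illustrative, non-succinct sketch: layers are plain Python lists rather than bit-packed arrays, the function names are hypothetical, and the input length is assumed to be a power of $b$. Level 1 corresponds to the least significant base-$b$ digit of $i$ and the top level holds the single total.

```python
def build_layers(values, b):
    """Return layers[0..ell]: layer L keeps b-1 partial sums per block of b."""
    layers = []
    current = list(values)
    while len(current) > 1:
        layer, block_sums = [], []
        for start in range(0, len(current), b):
            block = current[start:start + b]
            running = 0
            for x in block[:-1]:          # first b-1 partial sums stay here
                running += x
                layer.append(running)
            block_sums.append(running + block[-1])   # full sum moves up
        layers.append(layer)
        current = block_sums
    layers.append(current)                # top layer: one entry, the total
    return layers


def layered_sum(layers, b, i):
    """A[1] + ... + A[i], reading one entry per nonzero base-b digit of i."""
    total = 0
    for level in range(1, len(layers) + 1):
        digit = (i // b ** (level - 1)) % b
        if digit:
            prefix = i // b ** level      # value of the more significant digits
            offset = (b - 1) * prefix + digit        # 1-based offset in layer
            total += layers[level - 1][offset - 1]
    return total
```

For example, with `values = [1, 2, ..., 9]` and `b = 3` the layers are `[[1, 3, 4, 9, 7, 15], [6, 21], [45]]`, and `layered_sum(layers, 3, 5)` reads the entries 6 and 9 to return 15.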
Answering update
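The idea for performing $\mathrm{update}(i, \Delta)$ is analogous to that of $\mathrm{sum}(i)$. We access all levels that contain a partial sum covering position $i$ and update at most $b-1$ sums per level. Using the same notation as above, at each level we add $\Delta$ to every partial sum in the block containing position $i$ that covers $A[i]$. This procedure takes $O(b\log_b n)$ time.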
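A possible way to carry out the update on a layered structure is sketched below. It assumes layers stored as plain lists, with level $L$ holding $b-1$ partial sums per block of $b$ and a single total at the top; the function names are hypothetical, and the covering entries are located through the base-$b$ digits of the 0-based position $i-1$, which is a choice made for this sketch.

```python
def empty_layers(n, b):
    """All-zero layered structure for an array of n = b**ell zeros."""
    layers = []
    while n > 1:
        layers.append([0] * (n // b * (b - 1)))   # b-1 partial sums per block
        n //= b
    layers.append([0])                             # top layer: the total
    return layers


def layered_update(layers, b, i, delta):
    """Add delta to A[i] (1-based) by patching at most b-1 sums per level."""
    pos = i - 1                                    # 0-based position
    for level in range(1, len(layers)):
        block = pos // b ** level                  # block containing pos
        digit = (pos // b ** (level - 1)) % b      # child index inside block
        base = (b - 1) * block
        # the p-th partial sum of the block covers children 0..p-1,
        # so it covers pos exactly when p > digit
        for p in range(digit + 1, b):
            layers[level - 1][base + p - 1] += delta
    layers[-1][0] += delta                         # the total always changes
    return layers
```

Starting from `empty_layers(9, 3)` and applying `layered_update(layers, 3, i, i)` for `i = 1, ..., 9` yields `[[1, 3, 4, 9, 7, 15], [6, 21], [45]]`, the same layers a direct construction over `[1, ..., 9]` would produce.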
Answering search
To answer $\mathrm{search}(j)$ we start from $T_b^{\ell+1}(A)$ and simply perform a top-down traversal of the implicit B-tree of degree $b$ defined by the layered structure. At each level, we perform $O(\log b)$ steps of binary search to find the new offset in the next level. There are $\log_b n$ levels, so search takes overall $O(\log n)$ time.
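The top-down traversal can be sketched as follows, again on plain Python lists rather than packed arrays. The function names are hypothetical, the input length is assumed to be a power of $b$, entries are assumed non-negative, and search is assumed to return the smallest $i$ with $\mathrm{sum}(i) \geq j$ for $j \geq 1$.

```python
from bisect import bisect_left


def build_layers(values, b):
    """Layer L keeps b-1 partial sums per block of b; the top holds the total."""
    layers, current = [], list(values)
    while len(current) > 1:
        layer, block_sums = [], []
        for start in range(0, len(current), b):
            running = 0
            for x in current[start:start + b - 1]:
                running += x
                layer.append(running)
            block_sums.append(running + current[start + b - 1])
        layers.append(layer)
        current = block_sums
    layers.append(current)
    return layers


def layered_search(layers, b, j):
    """Smallest 1-based i with A[1] + ... + A[i] >= j, or None if none exists."""
    if j > layers[-1][0]:
        return None
    block, remaining = 0, j
    for level in range(len(layers) - 1, 0, -1):    # from the top level down
        layer = layers[level - 1]
        lo = (b - 1) * block
        # binary search among the b-1 partial sums of the current block
        idx = bisect_left(layer, remaining, lo, lo + b - 1)
        child = idx - lo
        if child > 0:
            remaining -= layer[idx - 1]            # skip the children before it
        block = block * b + child                  # descend into that child
    return block + 1
```

Each level costs one binary search over $b-1$ sorted entries, which matches the $O(\log b)$ steps per level described above.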
2.2 Sampling
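Let $0 < d \leq n$ be a sample rate, where for simplicity we assume that $d$ divides $n$. Given our input array $A$, we derive an array $A'$ of $n/d$ elements containing the sums of groups of $d$ adjacent elements in $A$, i.e. $A'[i] = \sum_{j=1}^{d} A[(i-1) \cdot d + j]$, $i = 1, \dots, n/d$. We then compact $A$ by removing $A[j \cdot d]$ for $j = 1, \dots, n/d$, and by packing the remaining integers in at most $nk(1 - 1/d)$ bits. We build our layered $b$-ary Fenwick tree $T_b(A')$ over $A'$. It is clear that queries on $A$ can be solved with a query on $T_b(A')$ followed by at most $d - 1$ accesses on (the compacted) $A$. The space of the resulting data structure is $nk(1 - 1/d) + S_b(n/d,\, k + \log d) \leq nk + \frac{n}{d}(\log d + 2\log b)$ bits. In order to retain the same query times of our basic layered structure, we choose $d = (1/\epsilon)\log_b n$ for any constant $\epsilon > 0$ and obtain a space occupancy of $nk + O\!\left(\frac{n(\log\log_b n + \log b)}{\log_b n}\right)$ bits. For $b \leq \log^{O(1)} n$, this space is $nk + o(n)$ bits. Note that, as opposed to existing succinct solutions, the low-order term does not depend on $k$.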
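The sampling idea can be sketched as follows. The class name `SampledPartialSums` is hypothetical, and a plain binary Fenwick tree over the group sums stands in for the layered $b$-ary structure to keep the example short. The point illustrated is the compaction: the last element of every group of $d$ is not stored and is recovered from the group sum when needed.

```python
class SampledPartialSums:
    """Group sums in a Fenwick tree plus a compacted copy of the array."""

    def __init__(self, values, d):
        values = list(values)
        assert len(values) % d == 0       # d is assumed to divide n
        self.d = d
        groups = [values[s:s + d] for s in range(0, len(values), d)]
        self.tree = [0] * (len(groups) + 1)          # stand-in for T_b(A')
        for g, group in enumerate(groups, start=1):
            self._tree_add(g, sum(group))
        # compacted A: the last element of every group is dropped
        self.kept = [x for group in groups for x in group[:-1]]

    def _tree_add(self, g, delta):
        while g < len(self.tree):
            self.tree[g] += delta
            g += g & -g

    def _tree_sum(self, g):
        total = 0
        while g > 0:
            total += self.tree[g]
            g -= g & -g
        return total

    def sum(self, i):
        q, r = divmod(i, self.d)          # q full groups, then r <= d-1 elements
        start = q * (self.d - 1)
        return self._tree_sum(q) + sum(self.kept[start:start + r])

    def update(self, i, delta):
        g, r = divmod(i - 1, self.d)      # group index, position inside group
        self._tree_add(g + 1, delta)
        if r < self.d - 1:                # the last element is implicit
            self.kept[g * (self.d - 1) + r] += delta

    def access(self, i):
        g, r = divmod(i - 1, self.d)
        start = g * (self.d - 1)
        if r < self.d - 1:
            return self.kept[start + r]
        group_total = self._tree_sum(g + 1) - self._tree_sum(g)
        return group_total - sum(self.kept[start:start + self.d - 1])
```

A sum query touches the tree once and then at most $d-1$ entries of the compacted array, which is why $d$ can be as large as the tree's own query time without changing the overall bound.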
3 Optimal-time sum and update
In this section we show how to obtain optimal running times for sum and update queries in the RAM model. We can directly apply the word-packing techniques described in [7] to speed up queries; here we only sketch this strategy, see [7] for full details. Let us describe the idea on the structure of Section 2.1, and then plug in sampling to reduce space usage. We divide arrays $T_b^i(A)$ in blocks of $b-1$ entries, and store one word ($w$ bits) for each such block. We can pack $b-1$ integers of at most $w/(b-1)$ bits each (for a suitable $b$, see below) in the word associated with each block. Since blocks of $b-1$ integers fit in a single word, we can easily answer sum and update queries on them in constant time. sum queries on our overall structure can be answered as described in Section 2.1, except that now we also need to access one of the packed integers at each level to correct the value read from $T_b^i(A)$. To answer update queries, the idea is to perform update operations on the packed blocks of integers in constant time exploiting bit-parallelism, instead of updating at most $b-1$ values of $T_b^i(A)$. At each update operation, we transfer one of these integers to $T_b^i(A)$ (in a cyclic fashion) to avoid overflowing and to achieve worst-case performance. Note that each packed integer is increased by at most $\Delta$ for at most $b-1$ times before being transferred to $T_b^i(A)$, so we get the constraint $\log((b-1)\Delta) \leq w/(b-1)$. We choose $b - 1 = w/(\log w + \delta)$. Then, it is easy to show that the above constraint is satisfied. The number of levels becomes $\log_b n = O(\log_{w/\delta} n)$. Since we spend constant time per level, this is also the worst-case time needed to answer sum and update queries on our structure. To analyze space usage we use the corrected formula
$$S_b(1,k) = k, \qquad S_b(n,k) = \frac{n(b-1)}{b}\,(k + \log b) + \frac{nw}{b} + S_b\!\left(\frac{n}{b},\, k + \log b\right),$$
yielding $S_b(n,k) \leq nk + 2n\log b + 2nw/b$. Replacing $b = w/(\log w + \delta) + 1$ we achieve $nk + O(n\delta + n\log w)$ bits of space.
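The claim that the packing constraint holds can be spelled out in a few lines. The derivation below is a sketch under stated assumptions: the constraint is taken to be $\log((b-1)\Delta) \leq w/(b-1)$, the choice of arity is taken to be $b - 1 = w/(\log w + \delta)$, and $\Delta < 2^{\delta}$ by definition of $\delta$.

```latex
\begin{align*}
\log\bigl((b-1)\,\Delta\bigr)
  &\le \log(b-1) + \delta
     && \text{since } \Delta < 2^{\delta} \\
  &\le \log w + \delta
     && \text{since } b - 1 \le w \\
  &= \frac{w}{b-1}
     && \text{since } b - 1 = \frac{w}{\log w + \delta}.
\end{align*}
```

Each packed field therefore has exactly $\log w + \delta$ bits, enough to absorb $b-1$ increments of magnitude at most $\Delta$ before the value is flushed to the layer.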
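We now apply the sampling technique of Section 2.2 with a slight variation. In order to get the claimed space/time bounds, we need to further apply bit-parallelism techniques on the packed integers stored in $A$: using techniques from [5], we can answer sum, search, and update queries in $O(1)$ time on blocks of $w/k$ integers. It follows that we can now use sample rate $d = \frac{w\log n}{k\log(w/\delta)}$ without affecting query times. After sampling $A$ and building the Fenwick tree described above over the sums of size-$d$ blocks of $A$, the overall space is $nk + O\!\left(\frac{n\log d}{d} + \frac{n\delta}{d} + \frac{n\log w}{d}\right)$. Note that $d \leq w^2/k \leq w^2$, so $\log d \in O(\log w)$ and space simplifies to $nk + O\!\left(\frac{n\delta}{d} + \frac{n\log w}{d}\right)$. The term $\frac{n\delta}{d}$ equals $\frac{n\delta k\log(w/\delta)}{w\log n}$. Since $\delta \leq w$, then $\delta\log(w/\delta) \leq w$, and this term therefore simplifies to $O(nk/\log n) \subseteq o(nk)$. Finally, the term $\frac{n\log w}{d}$ equals $\frac{nk\log w\,\log(w/\delta)}{w\log n} \in o(nk)$. The bounds of Theorem 1.2 follow.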
Parallelism
Note that sum and update queries on our succinct Fenwick trees can be naturally parallelized, as all accesses/updates on the levels can be performed independently from each other. For sum, we need further $O(\log\log_b n)$ time to perform a parallel sum of the partial results. It is not hard to show that, on architectures with $\log_b n$ processors, this reduces sum/update times to $O(\log\log_b n)$ / $O(b)$ and $O(\log\log_{w/\delta} n)$ / $O(1)$ in Theorems 1.1 and 1.2, respectively.
References
- [1] Dietz, P.F.: Optimal algorithms for list indexing and subset rank. In: Proc. 1st WADS. pp. 39–46 (1989)
- [2] Fenwick, P.M.: A new data structure for cumulative frequency tables. Software: Practice and Experience 24(3), 327–336 (1994)
- [3] Fredman, M., Saks, M.: The cell probe complexity of dynamic data structures. In: Proc. 21st STOC. pp. 345–354 (1989)
- [4] Fredman, M.L.: The complexity of maintaining an array and computing its partial sums. Journal of the ACM (JACM) 29(1), 250–260 (1982)
- [5] Hagerup, T.: Sorting and searching on the word RAM. In: STACS 98. pp. 366–398. Springer (1998)
- [6] Hon, W.K., Sadakane, K., Sung, W.K.: Succinct data structures for searchable partial sums with optimal worst-case performance. Theoretical Computer Science 412(39), 5176–5186 (2011)
- [7] Patrascu, M., Demaine, E.D.: Logarithmic lower bounds in the cell-probe model. SIAM Journal on Computing 35(4), 932–963 (2006)
- [8] Raman, R., Raman, V., Rao, S.S.: Succinct dynamic data structures. In: Workshop on Algorithms and Data Structures. pp. 426–437. Springer (2001)
