On-Line Balancing of Random Inputs
Nikhil Bansal, Joel H. Spencer

TL;DR
This paper introduces an online strategy for vector balancing with random inputs that achieves an $O(\sqrt{n})$ bound on the maximum coordinate sum, matching the best possible even with full knowledge of the vectors.
Contribution
The authors develop an online sign assignment method for random vectors that attains near-optimal bounds, advancing understanding of online vector balancing.
Findings
Achieves $O(\sqrt{n})$ bound with high probability
Optimal up to constant factors for random vectors
Provides a strategy matching offline best possible bounds
Abstract
We consider an online vector balancing game where vectors $v_t$, chosen uniformly at random in $\{-1,+1\}^n$, arrive over time and a sign $x_t \in \{-1,+1\}$ must be picked immediately upon the arrival of $v_t$. The goal is to minimize the $L^\infty$ norm of the signed sum $\sum_t x_t v_t$. We give an online strategy for picking the signs $x_t$ that has value $O(n^{1/2})$ with high probability. Up to constants, this is the best possible even when the vectors are given in advance.
Nikhil Bansal, CWI and TU Eindhoven, Netherlands. Supported by a NWO Vici grant 639.023.812 and an ERC consolidator grant 617951.
Joel H. Spencer, Courant Institute, New York University.
1 Introduction
A random set of vectors $v_1, \ldots, v_n \in \{-1,+1\}^n$ is sent to our hero, Carole. The vectors are each uniform among the $2^n$ vectors with coordinates $\pm 1$, and they are mutually independent. Carole’s mission is to balance the vectors into two nearly equal groups. To that end she assigns to each vector $v_t$ a sign $x_t \in \{-1,+1\}$. Critically, the signs have to be determined on-line: Carole has seen only vectors $v_1, \ldots, v_t$ when she determines sign $x_t$. Set
$$P = x_1 v_1 + \cdots + x_n v_n. \tag{1.1}$$
Carole’s goal is to keep all of the coordinates of $P$ small in absolute value. We set $V = |P|_\infty$, the $L^\infty$ norm of $P$. We consider $V$ the value of this (solitaire) game, which Carole tries to minimize.
As our main result, we give a simple algorithm for Carole (with somewhat less simple analysis!) such that $V \le K\sqrt{n}$ with high probability. Here $K$ is an absolute constant which we do not attempt to optimize.
To give a feeling, imagine Carole simply selected $x_t \in \{-1,+1\}$ uniformly and independently, not looking at $v_t$. Then each coordinate of $P$ would be distributed as a sum of $n$ independent uniform $\pm 1$ signs, roughly $\sqrt{n}N$, with $N$ a standard normal. For a large constant $K$, the great preponderance of the coordinates would lie in $[-K\sqrt{n}, +K\sqrt{n}]$. However, there would be a small but positive proportion of outliers, coordinates not lying in that interval. Indeed, the largest coordinate, with high probability, would be $\Theta(\sqrt{n \ln n})$. Carole’s task, from this vantage point, is to avoid outliers.
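The outlier effect described above is easy to observe numerically. The following is a minimal sketch, not part of the paper: the function name `random_signing_max`, the choice $n = 400$ and the seed are illustrative assumptions. It plays the game with blind random signs and compares the largest coordinate with $\sqrt{n}$ and with $\sqrt{2 n \ln n}$, the usual estimate for the maximum of $n$ roughly Gaussian coordinates.

```python
import math
import random

def random_signing_max(n, seed=0):
    """Largest |coordinate| of sum_t x_t v_t when the signs x_t are blind coin flips."""
    rng = random.Random(seed)          # fixed seed (an assumption) for repeatability
    position = [0] * n
    for _ in range(n):
        sign = rng.choice((-1, 1))     # Carole ignores v_t entirely
        for i in range(n):
            position[i] += sign * rng.choice((-1, 1))   # coordinate i of a uniform v_t
    return max(abs(p) for p in position)

n = 400
value = random_signing_max(n)
# Compare the observed maximum with the two scales discussed in the text.
print(value, math.sqrt(n), math.sqrt(2 * n * math.log(n)))
```

One would expect the printed value to sit well above $\sqrt{n}$ and near the $\sqrt{2 n \ln n}$ scale, which is exactly the gap Carole's algorithm has to close.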
More generally, we define $V = V(n,T)$ where the vectors are in $\{-1,+1\}^n$ and there are $T$ rounds. Let $T$ be arbitrary. In particular, think of $T$ as very large. Again, if Carole simply selected $x_t$ uniformly and independently, then each coordinate would be distributed as roughly $\sqrt{T}$ times the standard normal. So the largest coordinate, with high probability, would be $\Theta(\sqrt{T \ln n})$. We extend our algorithm above to give an algorithm for the arbitrary time horizon, which guarantees that for any time $T$, $V(n,T) = O(\sqrt{n})$ with probability exponentially close to $1$. This is considered in Section 3.3.
1.1 Four Discrepancies
Paul, our villain, sends $v_1, \ldots, v_n \in \{-1,+1\}^n$ to Carole. Carole balances with signs $x_1, \ldots, x_n \in \{-1,+1\}$. The value of this now two-player game is $V = |P|_\infty$ with $P$ as above. There are four variants. Paul can be an adversary (trying to make $V$ large) or can play randomly (as above). Carole can play on-line (as above) or off-line, waiting to see all $v_1, \ldots, v_n$ before deciding on the signs $x_1, \ldots, x_n$. All of the variants are interesting.
Paul adversarial, Carole offline. Here $V = \Theta(\sqrt{n})$. This was first shown by the senior author [8] and the first algorithmic strategy (for Carole) was given by the junior author [2].
Paul random, Carole offline. Here $V = \Theta(\sqrt{n})$. In recent work [1], a value $c$ such that $V \sim c\sqrt{n}$ (with high probability) was conjectured with strong partial results.
Paul adversarial, Carole online. Here $V = \Theta(\sqrt{n \ln n})$. These results may be found in the senior author’s monograph [9]. Up to constants, Carole can do no better than playing randomly. It was this result that made our current result a bit surprising.
Paul random, Carole online. $V = \Theta(\sqrt{n})$, the object of our current work.
The $T$ round setting is also very interesting. If Paul picks vectors adversarially, and Carole plays online, then no better bound is possible than exponential in $n$ [4]. Basically, all Carole can do is alternate signs when one of the $2^n$ possible vectors is repeated.
1.2 Alternate Formulations
We return to our focus, the random online case. We find it useful to consider the problem in a variety of guises.
Consider an $n$-round (solitaire) game with a position vector $P \in \mathbb{Z}^n$. Initially $P = 0$. On each round a random $v \in \{-1,+1\}^n$ is given. Carole must then reset either $P \leftarrow P + v$ or $P \leftarrow P - v$. The value of the game is $|P|_\infty$ with $P$ the position vector after the $n$ rounds have been completed.
Chip game. Consider $n$ chips on $\mathbb{Z}$, all initially at $0$. Each round each chip selects a random direction. Carole then either moves all of the chips in their selected direction or moves all of the chips in the opposite of their selected direction. After $n$ rounds the value $V$ is the longest distance from the origin to a chip. (Here chip $i$ at position $j$ represents that the $i$-th coordinate of $P$ is $j$.)
Folded chip game. Consider $n$ chips on the non-negative integers, initially all at $0$. The rules are as above except that a chip at position $0$ can only go to $1$ in the next step. Here the chip position is the absolute value of its position in the previous formulation. Even though the folded chip game is not exactly the same as the chip game above, the distributions produced on the absolute value of the positions in the two games are identical, which is all that we will need.
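To make the chip and folded chip views concrete, here is a small sketch. It is an illustration only: the helper names `play_round` and `folded`, the blind coin-flip strategy and the parameters are assumptions, not the paper's algorithm. It plays the chip game and records the folded positions, so one can see that a folded chip at the origin always steps to distance one while every other chip moves by exactly one.

```python
import random

def play_round(positions, directions, sign):
    """Move every chip by sign * direction (Carole picks one sign for all chips)."""
    return [p + sign * d for p, d in zip(positions, directions)]

def folded(positions):
    """Folded view: each chip is recorded by its distance from the origin."""
    return [abs(p) for p in positions]

rng = random.Random(1)                 # fixed seed, an arbitrary choice
n = 8
positions = [0] * n
history = [folded(positions)]
for _ in range(n):
    directions = [rng.choice((-1, 1)) for _ in range(n)]
    sign = rng.choice((-1, 1))         # placeholder strategy: a blind coin flip
    positions = play_round(positions, directions, sign)
    history.append(folded(positions))

value = max(history[-1])               # longest distance from the origin to a chip
```

Any online strategy can be plugged in where the coin flip is; the folded history only depends on the absolute values of the coordinates.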
1.3 Erdős
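Historically, discrepancy was examined for families of sets. Let $(\Omega, \mathcal{S})$ be a set system with $\Omega = \{1, \ldots, n\}$ and $\mathcal{S} = \{S_1, \ldots, S_n\}$ a collection of subsets of $\Omega$. For a two-coloring $\chi : \Omega \to \{-1,+1\}$, the discrepancy of a set $S$ is defined as $\chi(S) = |\sum_{j \in S} \chi(j)|$, and measures the imbalance from an even split of $S$. The discrepancy of the system is defined as
$$\mathrm{disc}(\mathcal{S}) = \min_{\chi} \max_{S \in \mathcal{S}} \chi(S). \tag{1.2}$$
That is, it is the minimum imbalance of all sets in $\mathcal{S}$ over all possible two-colorings $\chi$. Erdős famously asked for the maximal possible $\mathrm{disc}(\mathcal{S})$ over all such set systems. It was in this formulation that the senior author first showed that $\mathrm{disc}(\mathcal{S}) = O(\sqrt{n})$.
Consider the incidence matrix $A$ for the set system $\mathcal{S}$. That is, set $a_{ij} = 1$ if $j \in S_i$, otherwise $a_{ij} = 0$. Let $v_1, \ldots, v_n$ denote the column vectors of $A$. The coloring $\chi$ corresponds to the choice of $x_j = \chi(j)$. Then $|x_1 v_1 + \cdots + x_n v_n|_\infty$ measures the maximal imbalance of the coloring. The set-system problem is then essentially the Adversarial, Off-Line Paul/Carole game. The distinction is only that the coordinates of the $v_j$ are $0,1$ instead of $-1,+1$.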
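The correspondence between colorings of a set system and signings of the columns of its incidence matrix can be checked by brute force on a tiny instance. The sketch below is illustrative only: the function names and the example system (the three pairs on three points, indexed from zero) are assumptions. It computes the discrepancy both ways.

```python
from itertools import product

def discrepancy(sets, n):
    """Brute-force disc: min over colorings of the max |sum of colors| over the sets."""
    return min(
        max(abs(sum(chi[j] for j in s)) for s in sets)
        for chi in product((-1, 1), repeat=n)
    )

def discrepancy_via_columns(sets, n):
    """Same quantity, as the min over signs x of |x_1 v_1 + ... + x_n v_n|_inf."""
    rows = [[1 if j in s else 0 for j in range(n)] for s in sets]   # incidence matrix
    columns = [[row[j] for row in rows] for j in range(n)]          # v_1, ..., v_n
    best = None
    for x in product((-1, 1), repeat=n):
        signed_sum = [sum(x[j] * columns[j][i] for j in range(n))
                      for i in range(len(sets))]
        norm = max(abs(c) for c in signed_sum)
        best = norm if best is None else min(best, norm)
    return best

sets = [{0, 1}, {1, 2}, {0, 2}]   # hypothetical example: all pairs on three points
n = 3
```

For this example both functions give discrepancy 2, since any two-coloring of three points gives two points the same color. The enumeration is exponential in $n$, so this is only meant for very small systems.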
2 Carole’s Algorithm
The time will be indexed . Initially . In round , a random arrives and Carole resets . Let denote the vector after the -th round. Let denote the -th coordinate of .
The algorithm will be based on a potential function and depend on variables . We shall want with high probability, and the potential will penalize coordinates with discrepancy close to . Here will be a large constant as specified later, and will be a positive integer central to the algorithm. We may take and to be specific. However, we use the variables and in the analysis until the end to understand the various dependencies among the parameters.
Define the gap for coordinate as
[TABLE]
The algorithm will, with high probability, keep all so that the gaps are positive. Let
[TABLE]
and define the potential function
[TABLE]
As for all , . Note that the potential blows up whenever the discrepancy for any coordinate approaches . The factor provides a convenient normalization. When all , .
The algorithm is simple. On the -th round, seeing , Carole selects the sign , that minimizes the increase in the potential .
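The greedy rule can be sketched in a few lines. The potential used below is a simplified stand-in and not the paper's exact definition: it assumes a gap of the form `bound - |d_i|` with `bound = K * sqrt(n)` and a potential equal to the sum of `gap ** (-power)`. The values `K = 5`, `power = 4` and all function names are likewise assumptions chosen for illustration. Since the current potential is the same for both candidate signs, minimizing the increase is the same as minimizing the new potential.

```python
import math
import random

def potential(position, bound, power):
    """Assumed stand-in potential: sum of gap**(-power) with gap = bound - |d_i|."""
    total = 0.0
    for d in position:
        gap = bound - abs(d)
        if gap <= 0:
            return math.inf            # a coordinate left the allowed window
        total += gap ** (-power)
    return total

def greedy_sign(position, v, bound, power):
    """Pick the sign whose resulting potential is smaller (ties go to +1)."""
    plus = potential([d + x for d, x in zip(position, v)], bound, power)
    minus = potential([d - x for d, x in zip(position, v)], bound, power)
    return 1 if plus <= minus else -1

def play(n, K=5.0, power=4, seed=0):
    """Run n rounds against uniform random vectors and return the final value."""
    rng = random.Random(seed)
    bound = K * math.sqrt(n)
    position = [0] * n
    for _ in range(n):
        v = [rng.choice((-1, 1)) for _ in range(n)]
        s = greedy_sign(position, v, bound, power)
        position = [d + s * x for d, x in zip(position, v)]
    return max(abs(d) for d in position)

n = 100
value = play(n)
```

Comparing `value` with the outcome of blind random signs on the same $n$ gives a quick feel for how strongly the potential discourages outliers.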
We remark that while potential function analyses are widely used in the design and analysis of random processes and algorithms, the inverse polynomial potential function considered above is motivated by the work of Batson, Spielman and Srivastava on graph sparsification [5]. In the context of discrepancy, a similar potential was used by the authors [3], and in an unpublished work of Yin Tat Lee and Mohit Singh to design offline algorithms.
2.1 Rough Analysis
Let’s imagine all the as positive and near the boundary . The gap basically acts like
[TABLE]
Let be the potential values using this cleaner gap function. Suppose all . Then and . Set and consider the change ( large) when is incremented or decremented by one. From Taylor Series we approximate
[TABLE]
ignoring the higher order terms. Consider the change in when a random vector is added. We break it into a linear part and a quadratic part . We compare their sizes using (2.7). The quadratic part is always positive, for each term , adding up to . The linear part is for each term . As the vector (critically!) is random the signs are random and so add to distribution roughly , standard normal. Thus . Carole’s sign selection, effectively, replaces with . The change in is then proportional to . With probability at least , say, . After fixing and , will be of the order of while will be on the order of . For large enough, the linear term will be much bigger than the positive quadratic term .
Now let’s keep the total potential fixed but suppose that some of the gaps were smaller and the other gaps had zero effect on the total potential. Say, giving a good parametrization, that for values of (As the potential takes to power , the total potential will remain the same.) Again we break the change in into and . We think of as fixed and consider the effect of . The quadratic terms are now for each term, an extra factor of . But the number of terms is so the new value is . The linear terms are now for each term, an extra factor of . Now, however, we sum random signs, giving . Compared to the base case the quadratic term has been multiplied by while the linear term has been multiplied by . We’ve taken so these factors are and respectively. As gets bigger the domination of over becomes stronger. This gives us “extra room” and works even if only a proportion of the potential function came from these .
In the actual analysis the total potential is in a prescribed moderate range. However, we cannot assume that all of the potential comes from some coordinates with the same gaps. We split the coordinates into classes, those in the same class having roughly the same value. We find some class that has so much of the total potential that will dominate over . Making all this precise is the object of Lemma 2.2 below.
2.2 Analysis
We will show the following result.
Theorem 2.1.
The strategy above achieves value , with probability at least , where .
The potential starts initially at . Let . We consider the situation when the potential lies between and . (The value could be any sufficiently large constant.) We will show that if , then at any step the potential can increase by at most . More importantly, whenever , the sign for the vector at time can be chosen so that there is a strong negative drift that more than offsets the increase. More formally, we can decompose the rise in potential into a linear part and some quadratic part , satisfying the following properties.
Lemma 2.2.
Consider time . The increase in potential is a random variable (depending on the randomness in column ) that can be written as , where
1. with probability , whenever .
2. with probability at least , whenever .
Lemma 2.2 will directly imply Theorem 2.1. Note that the algorithm and the random arrival process define a Markov chain whose state space consists of integer-valued vectors. Moreover, the potential defines a Lyapunov function that maps each state to some real number. For our purposes, it suffices to consider the following simplified version of a much more general result due to Hajek [7] on hitting probability for Markov processes with a suitable Lyapunov function.
Theorem 2.3.
Let be a Lyapunov function for a Markov chain defined on a countable state space. For an interval , suppose the following holds: (i) the positive increments satisfy whenever and (ii) , whenever . Then for any time ,
[TABLE]
By the two properties of Lemma 2.2, and noting that the interval has size , and the positive increment is bounded by , Theorem 2.1 follows directly by applying Theorem 2.3 with and , .
Proving Lemma 2.2.
In the rest of the section, we prove Lemma 2.2. We begin by computing the relevant quantities. At time step , for , let denote the sign chosen for . For , let denote the -th coordinate of , and the discrepancy for the -th coordinate at the end of step . We initialize for all . Then,
[TABLE]
and note that .
Throughout we will condition on the event that . This will give us a useful separation, that the discrepancy , for any , is not too close to . Indeed, if , then for each . By (2.4), this implies . By (2.3),
[TABLE]
which implies that , using that .
We now upper bound the increase in potential, . Let us consider the function with domain . Then , and
[TABLE]
For any smooth function , recall that
[TABLE]
If satisfies , it is easily checked that whenever . Using the expression for and the bound on in (2.9), we have that for and satisfying ,
[TABLE]
Setting and gives and
[TABLE]
where
[TABLE]
As we will only be interested in time , henceforth we drop for notational convenience. In particular, we denote , , and . Let and .
Summarizing, if , then we have that , where
[TABLE]
We now focus on proving bound on and in Lemma 2.2.
Notation.
Let . For we say that coordinate lies in class if
[TABLE]
or equivalently .
Let denote the number of coordinates in class . As for in class , we have , and hence by (2.13) can be upper bounded as,
[TABLE]
We also have the following useful bounds.
Lemma 2.4.
If , then
1. For each class , .
2. .
Proof.
As and for each in class , we have that
[TABLE]
As , each class contributes at most , which gives .
We now bound . Let be the maximum class index for which . As , we have .
Plugging in the bound for in (2.14) gives
[TABLE]
where we use that and . ∎
We now focus on lower bounding , when . Recall that , and hence is a weighted sum of random variables . We will call , the weight of . We will use the following fact from [6].
Lemma 2.5.
Let all have absolute value at least . Consider the signed sums for . The number of sums that lie in any interval of length is maximized when all the and the interval is . In particular, taking for a small constant , the sums lie in only a small fraction of the time.
We use this as follows to show that the probability that , for , is small. Consider the indices where the weights lie in a (suitably chosen) weight class, and fix the signs outside that class. Then for any values of the signs outside that class, the fraction of sign choices within the class that put the total sum in is bounded by the probability in the lemma above.
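The Littlewood-Offord type bound of Erdős [6] behind this step can be checked exhaustively for small cases. In its standard form it says that if all weights have absolute value at least $1$, then at most $\binom{n}{\lfloor n/2 \rfloor}$ of the $2^n$ signed sums fall in any half-open interval of length $2$. The sketch below is an illustration only; the function name and the test weights are assumptions. It counts the fullest such window by brute force.

```python
from itertools import product
from math import comb

def max_window_count(weights, length=2.0):
    """Largest number of signed sums falling in a half-open window [s, s + length)."""
    sums = sorted(
        sum(e * a for e, a in zip(signs, weights))
        for signs in product((-1, 1), repeat=len(weights))
    )
    best = 0
    for lo in sums:                    # a fullest window can start at one of the sums
        best = max(best, sum(1 for s in sums if lo <= s < lo + length))
    return best

all_ones = [1] * 8
mixed = [1, 1.5, -2, 3, 1.25, 4, -1, 2.5]    # hypothetical weights with |a_i| >= 1
bound = comb(8, 4)
```

With all weights equal to one the bound is attained, and spreading the weights out can only thin out the fullest window. Dividing the count by $2^n$ gives the probability bound used in the argument above.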
We now do the computations.
Claim 2.6.
For a coordinate of class , the weight is at least .
Proof.
This follows as , and for any class , , which is at least as , and . ∎
By Lemma 2.5 and Claim 2.6, to show that with a constant probability, it would suffice to show that there is some class such that
[TABLE]
Note that only classes are considered in Claim 2.6, while also has terms from class $0$, so we need a final technical lemma to show that this contribution from class $0$ can be ignored.
Lemma 2.7.
If , the contribution of class $0$ coordinates to is at most .
Proof.
As for a class $0$ coordinate, and there are at most such coordinates, the contribution of class $0$ to is at most . So to prove the claim, it suffices to show that .
As for a coordinate of class , we have
[TABLE]
which gives . Using this together with for in class and in the expression for in (2.13), we get
[TABLE]
where the last equality uses our choice of . ∎
By (2.14) and the lemma above, to prove (2.17) it suffices to show that
Lemma 2.8.
There is some class such that
[TABLE]
Proof.
Let , and note that by Lemma 2.4, for all . Writing in terms of , we need to show that there is some satisfying
[TABLE]
Let , and let . Then for all , and hence . So the term on the right hand side of (2.21) is at most
[TABLE]
Next, as , the left hand side of (2.21) is at least , where the inequality follows as for all . So by (2.21), choosing finishes the proof. ∎
3 Arbitrary time horizon
We now consider the $T$ round setting, where $T$ can be arbitrarily large compared to $n$. In particular, a uniformly chosen vector $v_t \in \{-1,+1\}^n$ arrives at time $t$, and Carole then selects a sign $x_t \in \{-1,+1\}$. As previously, $P = x_1 v_1 + \cdots + x_T v_T$, and the value after $T$ rounds is $V = |P|_\infty$.
We will assume that is fixed in advance by Paul (and is not known to Carole). In particular, if can be chosen adaptively by Paul depending on Carole’s play, then the problem is not very interesting and the exponential in lower bound [4] for adversarial input vectors still holds. This is because even if the input vectors are random, after sufficiently long time (about ), some worst case adversarial sequence against any online strategy will eventually arrive, leading to worst case discrepancy .
Our main result is a strategy for Carole, described in Section 3.3, that achieves $V = O(\sqrt{n})$ with high probability. Before proving this result, we describe two strategies that achieve a weaker (but still independent of $T$) bound of $O(\sqrt{n} \log n)$. These are very natural and interesting on their own with simple analysis and are discussed in Sections 3.1 and 3.2.
3.1 Strategy 1
The first strategy is based on a potential function approach as before, but with an exponential penalty function. This has the drawback of losing an extra factor, but has the advantage that the potential has a negative drift whenever it exceeds a certain threshold (without requiring an upper bound on that we needed in Lemma 2.2). This allows us to bound the discrepancy for an arbitrary time horizon, as whenever the potential exceeds the thresholds the negative drift will bring it back quickly.
Strategy.
Consider a time step . As before, let be the discrepancy of the -th coordinate at the end of time . Consider the potential
[TABLE]
where and is a large constant greater than . As before, when presented with the vector , Carole chooses that minimizes the increase in potential, .
Analysis.
Let denote the -th coordinate of . As we will only consider the time , let us denote , , and .
By the Taylor expansion and as and , the increase in potential can be written as
[TABLE]
where the second step follows as for all and , , and so the higher order terms are negligible compared to the second order term.
Let be the linear term, and be the second term in (3.22) (note that ). Conveniently, is exactly .
As the algorithm chooses to have , it suffices to show the following key lemma.
Lemma 3.1.
If , then with probability at least .
Before proving the lemma we need the following anti-concentration estimate, see e.g., [10].
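Lemma 3.2.
If $Y = \sum_i a_i X_i$, with $X_i$ independent and uniform in $\{-1,+1\}$, and $a_i \in \mathbb{R}$, then for any $\lambda \in [0,1]$,
$$\Pr\Big[|Y| \ge \lambda \big(\textstyle\sum_i a_i^2\big)^{1/2}\Big] \ge \frac{(1-\lambda^2)^2}{3}.$$
In particular, setting $\lambda = 1/2$ gives $\Pr\big[|Y| \ge (\sum_i a_i^2)^{1/2}/2\big] \ge 3/16 \ge 1/10$.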
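The special case $\Pr\big[|Y| \ge (\sum_i a_i^2)^{1/2}/2\big] \ge 3/16$ can be verified exactly for small weight vectors by enumerating all sign patterns. This is a sanity check under illustrative assumptions (the function name and the sample weights are not from the paper), not a proof.

```python
from itertools import product
import math

def prob_large(weights):
    """Exact Pr[|Y| >= sqrt(sum a_i^2) / 2] for Y = sum a_i X_i, X_i uniform in {-1, +1}."""
    threshold = math.sqrt(sum(a * a for a in weights)) / 2
    hits = sum(
        1
        for signs in product((-1, 1), repeat=len(weights))
        if abs(sum(s * a for s, a in zip(signs, weights))) >= threshold
    )
    return hits / 2 ** len(weights)

samples = [[1] * 10, [1, 2, 3, 4, 5], [5, 1, 1, 1]]   # hypothetical weight vectors
probabilities = [prob_large(w) for w in samples]
```

For ten unit weights the event fails only when $Y = 0$, so the probability is $1 - \binom{10}{5}/2^{10}$, comfortably above $3/16$.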
Proof (Lemma 3.1).
By Lemma 3.2, and using for all , with probability at least ,
[TABLE]
As for all , . So for , we get
[TABLE]
Together (3.24) and (3.23) give that
[TABLE]
Using and plugging , gives that . ∎
As , we have that the change in potential satisfies the following two properties: (i) and, (ii) setting large enough, by Lemma 3.1 gives that if , then with probability at least .
Setting , then this gives that as . Moreover, whenever , with probability at least ,
Applying Theorem 2.3 to with and , we get that for any time ,
[TABLE]
As for each , and , setting gives that with probability .
3.2 Strategy 2
Our second strategy is even simpler, and we call it the majority rule. For convenience, it is useful to think of the folded chip view of the game, as described in Section 1.2. In particular, there are $n$ chips, originally all at $0$, the position of the $i$-th chip being the absolute value of the $i$-th coordinate of $P$. From $0$, a chip must go to $1$. Each chip not at $0$ picks a random direction, and Carole then either moves all of the chips in their selected direction or all in their opposite directions. So from a position $j \ge 1$, a chip can go to $j-1$ or $j+1$.
Majority rule strategy.
Consider the directions of the chips not at position zero. If there is a direction with strict majority, Carole chooses the sign that makes the majority of the chips not at zero move towards zero. Otherwise, in case of a tie, Carole picks randomly.
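The majority rule is short enough to state as code. The sketch below works with the signed (unfolded) positions, whose absolute values are the folded chip positions. The function names, the parameters `n = 50` and `rounds = 1000`, and the seed are assumptions made for illustration.

```python
import math
import random

def majority_sign(positions, directions, rng):
    """Sign that moves the majority of the chips not at zero towards zero."""
    toward = 0   # chips that would move towards zero if Carole plays +1
    away = 0     # chips that would move away from zero if Carole plays +1
    for p, d in zip(positions, directions):
        if p == 0:
            continue                   # chips at zero do not vote
        if (p > 0) == (d < 0):
            toward += 1
        else:
            away += 1
    if toward > away:
        return 1
    if away > toward:
        return -1
    return rng.choice((-1, 1))         # tie: Carole picks randomly

def simulate(n, rounds, seed=0):
    """Largest distance from zero seen by any chip over the given number of rounds."""
    rng = random.Random(seed)
    positions = [0] * n
    worst = 0
    for _ in range(rounds):
        directions = [rng.choice((-1, 1)) for _ in range(n)]
        s = majority_sign(positions, directions, rng)
        positions = [p + s * d for p, d in zip(positions, directions)]
        worst = max(worst, max(abs(p) for p in positions))
    return worst

n, rounds = 50, 1000
worst = simulate(n, rounds)
```

Running the simulation for many more rounds than chips is the interesting regime here, since the point of the rule is that the positions do not keep growing with the number of rounds.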
Analysis.
We will show the following.
Theorem 3.3.
The majority rule strategy achieves . More precisely, the probability that any chip has position at time is .
Proof.
Consider some time $t$, and a chip $i$ that is at a non-zero position at the end of $t-1$. We claim that chip $i$ basically does a random walk with drift towards zero.
Look at the other non-zero coordinates (other than $i$), and suppose there are $m$ of them. We consider two cases depending on whether $m$ is even or odd.
1. $m$ is even. Consider the random directions of the chips other than $i$, as given by $v_t$. If these directions are evenly split, which occurs with probability $\binom{m}{m/2} 2^{-m}$, then the majority direction is determined by the direction of chip $i$ and so chip $i$ goes towards the origin.
Else if the directions are not split evenly, then at least $m/2+1$ of these chips have one direction (and at most $m/2-1$ the other). So the direction of chip $i$ has no effect on the outcome of the majority rule, and as it is random and independent of the other directions, chip $i$ moves randomly.
2. $m$ is odd. If strictly more than $(m+1)/2$ of the chips have one direction, then the direction of chip $i$ does not affect the majority outcome. So as above, chip $i$ moves randomly.
Else, exactly $(m+1)/2$ chips have one direction (say $+1$) and $(m-1)/2$ have $-1$. As the directions are random this happens with probability $2\binom{m}{(m+1)/2} 2^{-m}$. Conditioned on this event, with probability $1/2$, the direction of chip $i$ is also $+1$, in which case there is a strict majority for $+1$, and chip $i$ goes towards the origin. Else chip $i$ picks the direction $-1$ with probability $1/2$, resulting in an overall tie, in which case Carole (and hence chip $i$) moves randomly.
So in either case, each chip does a random walk on non-negative integers with a reflection at $0$ and with drift at least towards the origin. That is, from $0$ it goes to $1$, and from $j \ge 1$ it goes to $j-1$ with probability at least , and else to $j+1$. So the stationary distribution at positions for this chip, is dominated by the stationary distribution for an (imaginary) chip that goes to $j-1$ with probability and to $j+1$ otherwise. This stationary distribution satisfies
[TABLE]
This has the solution
[TABLE]
and in particular,
[TABLE]
Taking , the probability of any particular chip being at or higher is so with probability all the chips are . So the value with high probability. ∎
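The geometric tail used in this proof can be reproduced numerically for a reflecting walk with a fixed bias. In the sketch below the bias `eps`, the truncation `size` and the function name are assumptions for illustration; the paper's actual drift depends on $n$. The walk goes from $0$ to $1$ always, and from $j \ge 1$ to $j-1$ with probability $1/2 + \varepsilon$ and to $j+1$ otherwise. Detailed balance gives $\pi_0 = (1/2+\varepsilon)\,\pi_1$ and $\pi_{j+1} = \rho\,\pi_j$ for $j \ge 1$, with $\rho = (1/2-\varepsilon)/(1/2+\varepsilon)$.

```python
def stationary(eps, size):
    """Truncated stationary distribution of the reflecting walk on 0, 1, ..., size - 1:
    0 -> 1 always; j >= 1 -> j - 1 with probability 1/2 + eps, else j + 1."""
    down, up = 0.5 + eps, 0.5 - eps
    ratio = up / down
    # Unnormalized weights from detailed balance, with pi_1 set to 1.
    weights = [down] + [ratio ** (j - 1) for j in range(1, size)]
    total = sum(weights)
    return [w / total for w in weights]

def tail(pi, k):
    """Stationary probability of being at position k or higher."""
    return sum(pi[k:])

eps, size = 0.05, 400                  # hypothetical bias and truncation level
pi = stationary(eps, size)
rho = (0.5 - eps) / (0.5 + eps)
```

The tail decays like $\rho^{k}$, so with a bias of order $1/\sqrt{n}$ a polynomially small tail probability needs $k$ of order $\sqrt{n}\log n$, which is the scale of the bound in this section.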
3.3 A strategy with $O(\sqrt{n})$ bound
We now describe a strategy that achieves $V = O(\sqrt{n})$ with high probability. It will be based on combining the ideas from the strategy for $T = n$ from Section 2 (call this Rule 1) and the majority rule from Section 3.2 (call this Rule 2).
The strategy.
It is convenient to view the process as the chip game defined in Section 1.2. Now, chips will also be colored either green or red. Initially, all the chips begin at $0$ and are colored green. Starting at , we do the following.
1. At (odd) time steps , choose the sign by applying Rule 1 on the green chips.
2. At (even) time steps , choose by applying Rule 2 on all the chips (one could do even better by applying Rule 2 on the red chips, but it is not necessary).
The color of the chips evolves as follows. When the potential (given by (2.5)) for Rule 1 exceeds , all the chips become red. When a red chip reaches $0$, it becomes green.
Analysis.
We will show the following.
Theorem 3.4.
For any time $T$, the strategy above achieves $V = O(\sqrt{n})$ with probability exponentially close to $1$.
Proof.
The result will follow from the following three simple observations, combined together with the properties of Rule 1 and Rule 2 that we proved earlier.
First, when Rule 1 is applied on the green chips, the red chips move randomly. This follows as for any red chip , the coordinate of is independent of the chosen sign (which only depends on for coordinates with green chips, and the positions of these green chips).
Second, if we apply a good strategy on a chip at alternate time steps, and choose the sign randomly at the other time steps then we still get a good strategy. In particular, for Rule 2 this halves the negative drift which makes no qualitative difference. For Rule 1, this halves the negative drift due to the term (while does not change), but this can be increased by any constant factor by modifying the parameters.
Third, when we calculate the potential to apply Rule 1 on the green chips, we will assume (for the purposes of calculation of only) that the red chips are at position $0$, and they do not move (that is for them) until they become green. Lemma 2.2 and hence Theorem 2.1 remain true in this setting, as can only decrease if some , and the bound for is not affected as we did not consider the contribution of class $0$ in Lemma 2.8.
We now use these observations to finish the analysis. Let us divide the time into phases, where a new phase begins whenever the potential for Rule 1 on green chips reaches . Recall at this point, all the chips become red, and each chip stays red until it reaches $0$. Note that a chip can only turn red when a phase begins and it must be at position when this happens (green chips are always at positions as ).
The key point is that as the red chips have an expected drift toward zero under Rule 2 (and move randomly otherwise), the probability that a particular chip stays red for steps is . So, say, within time steps since a phase starts, all the chips will reach zero with probability exponentially close to $1$. By the third observation above and Theorem 2.3, for any time , the probability that the next phase begins in exactly steps from the start of the current phase is . Together, this gives that for any fixed , the probability that there is any red chip present at will be exponentially close to $0$. ∎
References
- [1] Benjamin Aubin, Will Perkins, and Lenka Zdeborová. Storage capacity in symmetric binary perceptrons. arXiv 1901.00314, Jan 2019.
- [2] Nikhil Bansal. Constructive algorithms for discrepancy minimization. In Foundations of Computer Science (FOCS), pages 3–10, 2010.
- [3] Nikhil Bansal and Joel Spencer. Deterministic discrepancy minimization. Algorithmica, 67(4):451–471, 2013.
- [4] Imre Bárány. On a class of balancing games. J. Comb. Theory, Ser. A, 26(2):115–126, 1979.
- [5] Joshua Batson, Daniel Spielman, and Nikhil Srivastava. Twice-Ramanujan sparsifiers. SIAM J. Comput., 41(6):1704–1721, 2012.
- [6] Paul Erdős. On a theorem of Littlewood and Offord. Bull. Amer. Math. Soc. (2nd ser.), 51:898–902, 1945.
- [7] Bruce Hajek. Hitting-time and occupation-time bounds implied by drift analysis with applications. Advances in Applied Probability, 14(3):502–525, 1982.
- [8] Joel Spencer. Six standard deviations suffice. Transactions of the American Mathematical Society, 289(2):679–706, 1985.
