Online Revenue Maximization for Server Pricing
Shant Boodaghians, Federico Fusco, Stefano Leonardi, Yishay Mansour, Ruta Mehta

TL;DR
This paper develops an online posted-price mechanism for server resource pricing that maximizes revenue in a stochastic setting with unknown distributions, ensuring truthfulness and efficiency.
Contribution
It introduces a computationally efficient, revenue-optimal posted-price mechanism for online server pricing under uncertainty, with provable near-optimality from limited samples.
Findings
The mechanism achieves revenue optimality in expectation and retrospectively.
A polynomial number of samples suffices for near-optimal pricing.
Prices are deterministic and depend only on interval length and server availability.
Shant Boodaghians¹
University of Illinois at Urbana-Champaign, Urbana IL 61801, USA
Federico Fusco²
Department of Computer, Control and Management Engineering, Sapienza University, Rome, Italy
Stefano Leonardi²
Department of Computer, Control and Management Engineering, Sapienza University, Rome, Italy
Yishay Mansour
Tel Aviv University, P.O. Box 39040, Tel Aviv 6997801, Israel
Ruta Mehta¹
University of Illinois at Urbana-Champaign, Urbana IL 61801, USA
¹ Partially supported by NSF grant CCF-1750436.
² Supported by ERC Advanced Grant 788893 AMDROMA “Algorithmic and Mechanism Design Research in Online Market” and MIUR PRIN project ALGADIMAR “Algorithms, Games, and Digital Markets”.
Abstract
Efficient and truthful mechanisms to price resources on remote servers/machines have been the subject of much work in recent years due to the importance of the cloud market. This paper considers revenue maximization in the online stochastic setting with non-preemptive jobs and a unit capacity server. One agent/job arrives at every time step, with parameters drawn from an underlying distribution. We design a posted-price mechanism which can be efficiently computed, and is revenue-optimal in expectation and in retrospect, up to additive error. The prices are posted prior to learning the agent’s type, and the computed pricing scheme is deterministic, depending only on the length of the allotted time interval and on the earliest time the server is available. We also prove that the proposed pricing strategy is robust to imprecise knowledge of the job distribution and that a distribution learned from polynomially many samples is sufficient to obtain a near-optimal truthful pricing strategy.
1 Introduction
Designing mechanisms for a desired outcome with strategic and selfish agents is an extensively studied problem in economics, with classical work on truthful mechanisms by Myerson [1] and by Vickrey, Clarke, and Groves [2]. The advent of online interaction and e-commerce has added an efficiency constraint on mechanisms, going so far as to prioritize computational efficiency over classical objectives, e.g., choosing simple approximate mechanisms when optimal mechanisms are computationally hard, or even impossible, to compute. Beginning with Nisan and Ronen [3], the theoretical computer science community has contributed greatly to the field, in both fundamental problems and specific applications. This work includes designing truthful mechanisms for welfare and revenue maximization, and has also focused on learning distributions of agent types, menu complexity, and dynamic mechanisms (e.g., [4, 5]).
We consider this question in the setting of selling computational resources on remote servers or machines (cf. [6, 7]). This is arguably one of the fastest-growing markets on the Internet. The goods (resources) are assigned non-preemptively and thus have strong complementarities. Furthermore, since the supply (server capacity) is limited, any mechanism trades immediate revenue for future supply. Finally, mechanisms must be incentive-compatible, as strategic, non-truthful behaviour from the agents can push the performance of a mechanism away from its theoretical guarantees. This leads us to the following question:
Question.
Can we design an efficient, truthful, and revenue-maximizing mechanism to sell time-slots non-preemptively on a single server?
We design a posted-price mechanism which maximizes expected revenue up to additive error, for agents/buyers arriving online, with parameters of value, length and maximum delay, drawn from an underlying distribution.
Three key aspects distinguish our problem from standard online scheduling: (i) as time progresses, the server clears up, allowing longer jobs to be scheduled in the future if no shorter jobs are scheduled until then; (ii) scheduling a job is not exclusively at the discretion of the mechanism designer, since it must also be desired by the job itself, while also producing sufficient revenue; (iii) as the mechanism designer, we do not have access to job parameters in an incentive-compatible way before deciding on a posted price menu. These three features lie at the core of the difficulty of our problem. Our focus will be on devising online mechanisms in the Bayesian setting.
In our online model, time on the server is discrete. At every time step $t$, an agent arrives on the server, with a value $V_t$, length requirement $L_t$, and maximum delay $D_t$. These parameters are drawn from a common distribution $\mathcal{F}$, i.i.d. across jobs. The job wishes to be scheduled for at least $L_t$ consecutive time slots, starting no more than $D_t$ time units after its arrival, and wishes to pay no more than $V_t$. Jobs are assumed to have quasi-linear utility in money, and so prefer the least-price interval within their constraints. The mechanism designer never learns the parameters of the job. Instead, she posts a price menu of (length, price) pairs $\{(\ell,\pi_t(\ell))\}_{\ell}$, and the minimum available delay $s_t$. The job accepts to be scheduled so long as $D_t\geq s_t$ and there is some (length, price) pair in the menu of length at least $L_t$ and price at most $V_t$. We note that the pricing scheme can be dynamic, changing through time. If, at time epoch $t$, an agent chooses option $(\ell,\pi_t(\ell))$, then she pays $\pi_t(\ell)$ and her job will be allocated to the interval $[t+s_t,\,t+s_t+\ell-1]$. She will choose the option $\ell\geq L_t$ which minimizes $\pi_t(\ell)$. Throughout this paper we assume that the random variables are discrete and have finite support, unless specified otherwise.
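The acceptance rule above can be sketched in Python. This is a minimal illustration under our own assumptions, not the authors' code: the name `job_choice`, the representation of the menu as a dictionary from length to price, and breaking price ties toward the shorter interval are all hypothetical choices.

```python
from typing import Dict, Optional, Tuple


def job_choice(length: int, value: float, delay: int,
               menu: Dict[int, float], s: int) -> Optional[Tuple[int, float]]:
    """Option picked by a job facing `menu` when the server state is `s`.

    `menu` maps an interval length to its posted price. Returns the chosen
    (length, price) pair, or None if the job declines.
    """
    if delay < s:
        return None  # the earliest free slot starts too late for this job
    # options long enough for the job and within its budget
    feasible = [(l, p) for l, p in menu.items() if l >= length and p <= value]
    if not feasible:
        return None
    # quasi-linear utility: the cheapest option wins; ties go to the shorter
    # interval (a tie-breaking rule assumed here, not taken from the paper)
    return min(feasible, key=lambda option: (option[1], option[0]))


# A non-monotone menu makes a length-2 job over-buy the length-3 slot.
print(job_choice(2, 6.0, 1, {1: 2.0, 2: 5.0, 3: 4.0}, s=0))  # (3, 4.0)
print(job_choice(2, 6.0, 1, {1: 2.0, 2: 4.0, 3: 5.0}, s=0))  # (2, 4.0)
```

The first call shows why non-monotone menus are problematic: the job books more server time than it needs, which is exactly the behaviour the monotonicity requirement below rules out.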
1.1 Summary of Our Results
1. We model the problem of finding a revenue-maximizing pricing strategy as a Markov Decision Process (MDP). Given a price menu (length, price) and a state (minimum available delay) at time $t$, the probability of transition to any other state at time $t+1$ is obtained from the distribution of the job’s parameters. The revenue-maximizing pricing strategy can be efficiently computed via backwards induction. We also present, in Appendix C.2, an approximation scheme for the case where $V$ is a continuous random variable.
2. We prove that the optimal pricing strategy is monotone in length under a distributional assumption, which we show is satisfied when the jobs’ valuation follows a log-concave distribution, parametrized by length. Recall that log-concave distributions have a monotone hazard rate. This implies the existence of an optimal pricing mechanism which ensures truthfulness in the finite horizon setting when the distributions are known. In Appendix C.1, this is extended to the infinite discounted horizon setting, incurring a small additive error. We also demonstrate good concentration bounds on the revenue obtained by the optimal truthful posted-price strategy.
3. We finally investigate the robustness of the pricing strategy. We first show that a near-optimal solution is still obtained when the distribution is known with a certain degree of uncertainty. We complement this result by analyzing the performance of the proposed pricing strategy when the distribution is only known from samples collected through the observations of the agents’ decisions. We provide a truthful posted-price $\epsilon$-approximate mechanism if the number of samples is polynomial in $1/\epsilon$ and the size of the support of the distribution.
1.2 Related Work
Much recent work has focused on designing efficient mechanisms for pricing cloud resources. Chawla et al. [8] recently studied “time-of-use” pricing mechanisms, to match demand to supply with deadlines and online arrivals. Their result assumes large-capacity servers, and seeks to maximize welfare in a setting in which the jobs arriving over time are not i.i.d. [9] provides a mechanism for preemptive scheduling with deadlines, maximizing the total value of completed jobs. The total value of completed jobs with release times and deadlines is also a natural objective for the design of incentive-compatible scheduling mechanisms: [10] solves this problem in an online setting, [11] in the offline setting for parallel machines, and [12] in the online competitive setting with uncertain supply. [13] focuses on social welfare maximization for non-preemptive scheduling on multiple servers, and obtains a constant competitive ratio as the number of servers increases. Our work differs from these by considering revenue maximization and stochastic job types which are i.i.d. over time. [14] addresses computing a price menu for revenue maximization with different machines. Finally, [7] proposes a system architecture for scheduling and pricing in cloud computing.
Posted price mechanisms (PPMs) were introduced by [15] and have gained attention due to their simplicity, robustness to collusion, and ease of implementation in practice. One of the first theoretical results concerning PPMs is an asymptotic comparison to classical single-parameter mechanisms [16]. They were later studied by [17] for the objective of revenue maximization, and further strengthened by [18] and [19]. [20] shows that sequential PPMs can approximate social welfare within a constant factor for XOS valuation functions, if the price for an item is equal to the expected contribution of the item to the social welfare.
Sample complexity for revenue maximization has recently been studied in [5], showing that polynomially many samples suffice to obtain near-optimal Bayesian auction mechanisms. An approach based on statistical learning, which makes it possible to learn mechanisms with expected revenue arbitrarily close to optimal from a polynomial number of samples, has been proposed in [21]. The problem of learning simple auctions from samples has been studied in [22].
1.3 Structure of the Paper
In Section 2 we describe the model of the problem as a Markov Decision Process. In Section 3 we present an efficient algorithm for computing optimal policies for the finite time horizon given full knowledge of the distribution of the jobs’ parameters. This is extended to other settings in Appendix C. In Section 3.3, we demonstrate that the optimal policy is monotone, and in Section 3.4 we describe the concentration bounds on the revenue of a pricing policy. Section 4.2 gives the learning algorithm and error bounds for computing the pricing policies with only (partial) sample access to the job distribution. Finally, Section 4.3 and Section 5 are devoted to describing and summarizing the final result and future directions of research.
Proof details are provided in Appendix B.
2 Model
Notation.
In what follows, the variables $t$, $\ell$ or $L$, $v$ or $V$, and $d$ or $D$ are reserved for describing the parameters of a job that wishes to be scheduled. Respectively, they represent the arrival time $t$, required length $\ell$, value $v$, and maximum allowed delay $d$. The lowercase variables represent fixed values, whereas the uppercase represent random variables. Script-uppercase letters $\mathcal{L}$, $\mathcal{V}$, $\mathcal{D}$ represent the supports of the distributions on $L$, $V$, and $D$, respectively; and the bold-uppercase letters $\mathbf{L}$, $\mathbf{V}$, $\mathbf{D}$ represent the maximum values in these respective sets. Finally, $\pi$ is reserved for pricing policies, whereas $p$ is reserved for probabilities.
Single-Machine, Non-Preemptive, Job Scheduling.
A sequence of random jobs wishes to be scheduled on a server, non-preemptively, for a sufficiently low price, within a time constraint. Formally, at every time step $t$, a single job with parameters $J_t=(L_t,V_t,D_t)$ is drawn from an underlying distribution $\mathcal{F}$ over the space $\mathcal{L}\times\mathcal{V}\times\mathcal{D}$. It wishes to be scheduled, for a price of at most $V_t$, in an interval $[a,b]$ such that $b-a+1\geq L_t$ and $a\leq t+D_t$.
Price Menus.
Our goal is to design a take-it-or-leave-it, posted-price mechanism which maximizes expected revenue. At each time period $t$, the mechanism posts a “price menu” and an earliest available time, encoded by the delay $s$, indicating that times $t$ through $t+s-1$ have already been scheduled. ($s$ will henceforth be referred to as the state of the server.) We let $\mathcal{S}$ be the set of all possible states. The state of the server at a given time is naturally a random variable which depends on the earlier jobs and on the adopted policy $\pi$. As before, we will denote with $s$ or $s_t$ the fixed value, and with $S$ or $S_t$ the corresponding random variable. The price menu will be given by the function $\pi:[T]\times\mathcal{S}\times\mathcal{L}\to\mathbb{R}_{\geq 0}$, i.e., if we are at time $t$ and the server is in state $s$, then the prices are set according to $\pi_t(s,\cdot)$. The reported pair $(\pi_t(s,\cdot),s)$ is computed by the scheduler’s strategy, which we determine in this paper. Once this is posted, a job $J_t$ is sampled i.i.d. from the underlying distribution $\mathcal{F}$.
If $V_t\geq\pi_t(s,\ell)$ for some $\ell\geq L_t$, and $D_t\geq s$, then the job accepts the schedule and reports the length $\ell\geq L_t$ which minimizes price. Otherwise, the job declines and is not scheduled. To guarantee truthfulness, it suffices to have $\pi_t(s,\ell)$ be monotonically non-decreasing in $\ell$ for every state $s$: the agent would not want a longer interval since it costs more, and would not want one of the shorter intervals since they cannot run the job. The mechanism should always post monotone non-decreasing prices, as a decrease in the price menu only causes more utilization of the server, without accruing more revenue. The main technical challenge in this paper, then, is to show that under some assumptions, the optimal strategy is monotone non-decreasing and efficiently computable.
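To see why restricting to non-decreasing menus loses nothing, note that a job of length $\ell$ effectively pays the cheapest price among all lengths $\geq\ell$. The sketch below, with the hypothetical helper names `effective_prices` and `is_monotone`, computes these suffix minima. The resulting menu is non-decreasing, accepts the same jobs, and charges every job the same amount as the original one; provided ties are broken toward shorter intervals (an assumption), each job then books exactly its own length.

```python
def effective_prices(menu):
    """Price each length actually pays: the minimum over all lengths >= it."""
    eff, best = {}, float("inf")
    for length in sorted(menu, reverse=True):
        best = min(best, menu[length])
        eff[length] = best
    return dict(sorted(eff.items()))


def is_monotone(menu):
    """True if prices are non-decreasing in length."""
    lengths = sorted(menu)
    return all(menu[a] <= menu[b] for a, b in zip(lengths, lengths[1:]))


menu = {1: 2.0, 2: 5.0, 3: 4.0}
print(is_monotone(menu))                    # False
print(effective_prices(menu))               # {1: 2.0, 2: 4.0, 3: 4.0}
print(is_monotone(effective_prices(menu)))  # True
```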
Revenue Objective.
Revenue can be measured in either a finite or an infinite discounted horizon. In the former (finite) case, only $T$ time periods will occur, and we seek to maximize the expected sum of revenue over these periods. In the infinite-horizon setting, future revenue is discounted at an exponentially decaying rate. Formally, revenue at time $t$ is worth a fraction $\gamma^t$ of revenue at time 0, for some fixed $\gamma\in(0,1)$. See Appendix C.1. Recall that the job parameters are drawn independently at random from the underlying distribution, so the scheduler can only base their “price menu” on the state of the system and the current time. Thus, the only realistic strategy is to fix a state-and-time-dependent pricing policy $\pi$, “$\pi_t(s,\ell)$”, where $\pi:[T]\times\mathcal{S}\times\mathcal{L}\to\mathbb{R}_{\geq 0}$.
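Let $J=(J_1,\dots,J_T)$ be the random sequence of jobs arriving, sampled i.i.d. from the underlying distribution. Let $\pi$ be the pricing policy. We denote by $\mathrm{Rev}_t(\pi,J)$ the revenue earned at time $t$ with policy $\pi$ and sequence $J$. If $J_t$ does not buy, then $\mathrm{Rev}_t(\pi,J)=0$; otherwise, it is equal to $\pi_t(S_t,L_t)$. We denote by $\mathrm{Rev}(\pi,J)$ the total (cumulative) revenue earned over the $T$ periods. Thus,
$$\mathrm{Rev}(\pi,J)=\sum_{t=1}^{T}\mathrm{Rev}_t(\pi,J).\qquad(1)$$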
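We will also need the expected future revenue, given a current time and server state, which we denote as follows:
$$R_t^{\pi}(s)=\mathbb{E}_{J_{\geq t}}\Big[\sum_{\tau=t}^{T}\mathrm{Rev}_\tau(\pi,J)\,\Big|\,S_t=s\Big].\qquad(2)$$
The subscript $J_{\geq t}$ of the expectation denotes that we consider only jobs arriving from time $t$ onward. Our objective is to find the pricing policy $\pi$ which maximizes $R_1^{\pi}(0)=\mathbb{E}[\mathrm{Rev}(\pi,J)]$. Call this $\pi^*$, and denote the expected revenue under $\pi^*$ as $R_t^*(s)=R_t^{\pi^*}(s)$.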
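The state dynamics and the cumulative revenue $\mathrm{Rev}(\pi,J)$ of a fixed policy can be simulated directly. The sketch below assumes, for illustration only, that jobs are given as (length, value, delay) tuples, that time is indexed from 0 (so the discount factor at time $t$ is `discount ** t`), and that the menu is non-decreasing in length so that each accepting job books exactly its own length; `simulate_revenue` is a hypothetical name. Averaging its output over many sampled sequences gives a Monte Carlo estimate of $R_1^{\pi}(0)$.

```python
def simulate_revenue(policy, jobs, discount=1.0):
    """Cumulative (optionally discounted) revenue of `policy` on one job sequence.

    policy(t, s, l) returns the posted price for length l at time t in state s.
    jobs is a list of (length, value, delay) tuples, indexed from t = 0.
    Assumes the menu is non-decreasing in length, so each job books its own length.
    """
    s, total = 0, 0.0
    for t, (l, v, d) in enumerate(jobs):
        price = policy(t, s, l)
        if d >= s and v >= price:
            total += discount ** t * price
            s = s + l - 1        # l slots booked while one time step elapses
        else:
            s = max(s - 1, 0)    # state -1 is read as 0
    return total


flat = lambda t, s, l: 3.0      # hypothetical flat price of 3 for every length
jobs = [(2, 5.0, 0), (1, 4.0, 0), (1, 4.0, 1), (1, 2.0, 0)]
print(simulate_revenue(flat, jobs))                # 6.0
print(simulate_revenue(flat, jobs, discount=0.5))  # 3.75
```

In the example, the second job is turned away because the first one occupies the next slot and the second job cannot wait, and the last job cannot afford the price.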
3 Bayes-optimal Strategies for Server Pricing
In this section we seek to compute an optimal monotone pricing policy which maximizes revenue in expectation over jobs sampled i.i.d. from an underlying known distribution $\mathcal{F}$. This is extended to the infinite-horizon, discounted setting in Appendix C.1.
We first model the problem of maximizing the revenue in online server pricing as a Markov Decision Process that admits an efficiently computable, optimal pricing strategy. The main contribution of this section is to show that, under a mild assumption on the distribution $\mathcal{F}$, the optimal policy is monotone. We recall that this allows us to derive truthful Bayes-optimal mechanisms.
3.1 Markov Decision Processes.
We show that the theory of Markov Decision Processes is well suited to model our problem. A Markov Decision Process is, in essence, a Markov chain whose transition probabilities depend on the action chosen at each state, and in which each transition is assigned a reward. A policy is then a function mapping states to actions. In our setting, the states are the states of the system outlined in Section 2 (i.e., the possible delays before the earliest available time on the server), and the actions are the “price menus.” At every state $s$, a job of random length $L$ arrives and, with some probability, chooses to be scheduled, given the choice of prices. The next state is either $s-1$, if the job does not choose to be scheduled (since we have moved forward in time), or $s+\ell-1$, if a job of length $\ell$ is scheduled (since $\ell$ more units are occupied while time moves forward by one). The transition probabilities depend on the distribution of job lengths, and on the probability that a job accepts to be scheduled given the pricing policy (action). Formally,
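$$p\big(s+\ell-1\mid s,\pi_t(s,\cdot)\big)=\mathbb{P}\big[L=\ell,\;V\geq\pi_t(s,\ell),\;D\geq s\big]\quad\text{for each }\ell\in\mathcal{L},\qquad p\big(s-1\mid s,\pi_t(s,\cdot)\big)=1-\sum_{\ell\in\mathcal{L}}\mathbb{P}\big[L=\ell,\;V\geq\pi_t(s,\ell),\;D\geq s\big].$$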
(Transitions to state “$-1$” should be read as transitions to state “$0$”.) Note that a job of length $\ell$ may choose to purchase an interval of length greater than $\ell$, which would render these transition probabilities incorrect. However, this may only happen if the longer interval is cheaper. It is therefore in the scheduler’s interest to guarantee that $\pi_t(s,\ell)$ is monotone non-decreasing in $\ell$, which incentivizes truthfulness: this increases the amount of server time available without affecting revenue. Thus we restrict ourselves to this case.
It remains to define the transition rewards, which are simply the revenue earned. Formally, a transition from state $s$ to $s+\ell-1$ incurs a reward of $\pi_t(s,\ell)$, whereas a transition from state $s$ to $s-1$ incurs 0 reward. We wish to compute a policy that maximizes the expected cumulative revenue, given as the (possibly discounted) sum of all transition rewards in expectation.
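For a concrete menu, the transition probabilities and the expected one-step reward described above can be tabulated from a finite joint pmf. The sketch below represents the distribution as a dictionary from $(\ell, v, d)$ to its probability, an assumption made for illustration; `transitions` is a hypothetical name, and the code folds state $-1$ into $0$ as stated above.

```python
from collections import defaultdict


def transitions(dist, menu, s):
    """One-step transition law and expected reward of the original MDP at state s.

    dist maps (length, value, delay) to its probability; menu maps length to a
    (non-decreasing) price. Returns ({next_state: prob}, expected_reward).
    """
    nxt, reward = defaultdict(float), 0.0
    for (l, v, d), p in dist.items():
        if d >= s and v >= menu[l]:     # job accepts and books its own length
            nxt[s + l - 1] += p
            reward += p * menu[l]
        else:                           # job declines; time moves forward
            nxt[max(s - 1, 0)] += p
    return dict(nxt), reward


dist = {(1, 4.0, 2): 0.5, (2, 6.0, 0): 0.25, (2, 2.0, 3): 0.25}
print(transitions(dist, {1: 3.0, 2: 5.0}, s=1))  # ({1: 0.5, 0: 0.5}, 1.5)
```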
3.2 Solving for the Optimal Policy with Distributional Knowledge
In this section, we present a modified MDP whose optimal policies can be efficiently computed, and show that these policies are optimal for the original MDP. Here we assume that the mechanism designer is given access to the underlying distribution $\mathcal{F}$. In the following sections, we will show that if the distribution is estimated from samples, then solving the MDP for this estimated distribution is sufficient to ensure good revenue guarantees.
Since the problem has been modelled as a Markov Decision Process (MDP), we may rely on the wealth of literature available on MDP solutions, in particular we will leverage the backwards induction algorithm (BIA) of [23] Section 4.5, included in Appendix B as Algorithm 1. We will however need to ensure that this standard algorithm (i) runs efficiently, and (ii) returns a monotone pricing policy.
Note that past prices do not contribute to future revenue insofar as the current state remains unchanged. Thus, to compute optimal current prices, we need only know the current state and expected future revenue. This allows us to use the BIA. The idea is to compute the optimal time-dependent policy, and the incurred expected reward, for shorter horizons, then use this to recursively compute the optimal policies for longer horizons.
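The total runtime of the BIA is $O(T\,|\mathcal{S}|^2|\mathcal{A}|)$, where $\mathcal{A}$ and $\mathcal{S}$ denote the action and state spaces, respectively. Note that the dependence on $T$ is unavoidable, since any optimal policy must be time-dependent. Recall that $\mathbf{L}$ and $\mathbf{D}$ denote the maximum values that $L$ and $D$ can take, respectively, and $\mathcal{V}$ is the set of possible values that $V$ can take. Denote $\mathbf{S}=\mathbf{D}+\mathbf{L}$, an upper bound on the number of server states. If we define the action space naïvely, we have $|\mathcal{A}|=|\mathcal{V}|^{\mathbf{L}}$ and $|\mathcal{S}|=\mathbf{S}$. Thus, a naïve definition of the MDP bounds the runtime at $O(T\,\mathbf{S}^2|\mathcal{V}|^{\mathbf{L}})$, which is exponential in $\mathbf{L}$ and far from efficient. Requiring monotonicity only affects lower-order terms.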
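A back-of-the-envelope comparison makes the gap concrete. The sketch assumes the standard $O(T|\mathcal{S}|^2|\mathcal{A}|)$ cost of backward induction, takes $\mathbf{D}+\mathbf{L}$ server states, and counts one price per length in the naïve action space; these sizes are our reading of the discussion above, and the function names are hypothetical. The second count corresponds to the (state, length) formulation that follows.

```python
def bia_work(T, n_states, n_actions):
    """Operation count of finite-horizon backward induction: T * |S|^2 * |A|."""
    return T * n_states ** 2 * n_actions


def naive_vs_modified(T, L_max, D_max, n_values):
    S = D_max + L_max                            # server states 0 .. D_max + L_max - 1
    naive = bia_work(T, S, n_values ** L_max)    # one price per length in each action
    modified = bia_work(T, S * L_max, n_values)  # (state, length) pairs, one price each
    return naive, modified


print(naive_vs_modified(T=100, L_max=10, D_max=20, n_values=10))
# (900000000000000, 90000000)
```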
Modified MDP.
To avoid this exponential dependence, we can be a little more clever about the definition of the state space: instead of states being the possible server states, we define our state space as the possible (state, length) pairs. Thus, when the MDP is in state $(s,\ell)$, the server is in state $s$, and a job of length $\ell$ has been sampled from the distribution. Our action space then is simply the set $\mathcal{V}$ of possible values of $\pi_t(s,\ell)$, and the transition probabilities and rewards become:
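$$p\big((s+\ell-1,\ell')\mid(s,\ell),x\big)=\mathbb{P}\big[V\geq x,\,D\geq s\mid L=\ell\big]\;\mathbb{P}[L=\ell']\quad\text{with reward }x,$$
$$p\big((s-1,\ell')\mid(s,\ell),x\big)=\Big(1-\mathbb{P}\big[V\geq x,\,D\geq s\mid L=\ell\big]\Big)\;\mathbb{P}[L=\ell']\quad\text{with reward }0,$$
where $x=\pi_t(s,\ell)$ is the posted price and $\ell'\in\mathcal{L}$ is the length of the next job.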
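Therefore, we get $|\mathcal{A}|=|\mathcal{V}|$ and $|\mathcal{S}|=\mathbf{S}\mathbf{L}$. Thus, the runtime of the algorithm becomes $O(T\,\mathbf{S}^2\mathbf{L}^2|\mathcal{V}|)$, polynomial in all parameters. A full description of the procedure is given in Appendix B as Algorithm 2. It remains to prove that it is correct. We begin by claiming that these two MDPs are equivalent in the following sense: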
Lemma 1**.**
For any fixed pricing policy ,
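$$R_t^{\pi}(s)=\sum_{\ell\in\mathcal{L}}\mathbb{P}[L=\ell]\;\tilde R_t^{\pi}(s,\ell)\qquad\text{for all }t,\,s,$$
where the $R$’s are as in (2), and the $\tilde R$’s are the expected future revenues in the modified MDP.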
(See Appendix B for a proof.) This lemma, however, does not suffice on its own, as agents may behave strategically by over-reporting their length, if the prices are not increasing. This would alter the transition probabilities, breaking the analysis. We will see that under a mild assumption, this can not happen, as the optimal policy for non-strategic agents will be monotone, and therefore truthful.
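The recursion behind the modified MDP can be sketched as finite-horizon backward induction over (state, length) pairs. This is our own minimal reading, not a transcription of Algorithm 2 (which is in Appendix B). It assumes a finite pmf over $(\ell, v, d)$, uses the support values of $V$ as candidate prices (for a finite support, raising a price up to the next support value never changes which jobs accept), and encodes "refuse the job" as `math.inf`. `W[t][s]` plays the role of the expected future revenue from time $t$ in state $s$, averaged over the incoming job's length as in Lemma 1. The raw table may contain `math.inf` at lengths that no job can use (for instance, when none can start in time), so it need not be monotone there; the tie-breaking that yields a monotone optimum is not reproduced here.

```python
import math


def backward_induction(dist, T):
    """Finite-horizon backward induction over (server state, job length) pairs.

    dist maps (length, value, delay) to its probability (finite support,
    length >= 1, delay >= 0). Returns (W, price): W[t][s] is the optimal expected
    revenue from time t (0-indexed) in state s, and price[t][(s, l)] is an
    optimal price for a length-l job (math.inf means "refuse the job").
    """
    lengths = sorted({l for l, _, _ in dist})
    values = sorted({v for _, v, _ in dist})              # candidate prices
    n_states = max(d for _, _, d in dist) + max(lengths)  # states 0 .. D+L-1
    p_len = {l: sum(p for (m, _, _), p in dist.items() if m == l) for l in lengths}

    def accept_prob(s, l, x):
        # P[V >= x, D >= s | L = l]
        hit = sum(p for (m, v, d), p in dist.items() if m == l and v >= x and d >= s)
        return hit / p_len[l]

    W = [[0.0] * n_states for _ in range(T + 1)]          # W[T][s] = 0
    price = [{} for _ in range(T)]
    for t in range(T - 1, -1, -1):
        for s in range(n_states):
            stay = W[t + 1][max(s - 1, 0)]                # value if the job is not scheduled
            total = 0.0
            for l in lengths:
                best_val, best_x = stay, math.inf         # default: refuse
                for x in values:
                    q = accept_prob(s, l, x)
                    if q == 0.0:
                        continue                          # nobody accepts: same as refusing
                    val = q * (x + W[t + 1][s + l - 1]) + (1.0 - q) * stay
                    if val > best_val:
                        best_val, best_x = val, x
                price[t][(s, l)] = best_x
                total += p_len[l] * best_val
            W[t][s] = total
    return W, price


dist = {(1, 2.0, 0): 0.5, (2, 3.0, 1): 0.5}   # hypothetical two-type distribution
W, price = backward_induction(dist, T=2)
print(W[0])                                    # [4.5, 2.75, 1.5]
print(price[0][(0, 1)], price[0][(0, 2)])      # 2.0 3.0
```

In this toy instance the computed values are non-increasing in the server state, in line with Lemma 3 below.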
3.3 Monotonicity of the Optimal Pricing Policies
Recall that the solution of the more efficient MDP formulation is only correct if we can show that the optimal policy, computed without considering the strategic behaviour of agents, is always monotone; this ensures the incentive compatibility of the optimum.
An optimal monotone strategy cannot be obtained for every distribution on $L$ and $V$. As an example, for any distribution where a job’s value is a deterministic function of its length, the optimal policy is to price-discriminate by length. If this function is not monotone, the optimum won’t be either. To this end, we introduce the following assumption, which we discuss below and which implies monotonicity of the pricing policy.
Assumption 1.
The quantity is monotone non-decreasing as grows, for any state and fixed.
This is not a natural, or immediately intuitive, assumption. However, we will show that it is satisfied if the valuation of jobs follows a log-concave distribution parametrized by the job’s length, where the valuation is (informally) positively correlated with this length. Log-concave distributions have a monotone hazard rate, and it is common practice in economic settings to require this property of agent valuations.
Lemma 2**.**
Let, denote the marginal r.v. conditioned on and . Let be a continuously-supported random variable, and . If is distributed like , , , or , then Assumption 1 is satisfied if is log-concave, or if the ’s are independent of .
A discussion of log-concave random variables and a proof of this fact is given in Appendix A. Many standard (discrete) distributions are (discrete) log-concave random variables, including the uniform, Gaussian, logistic, exponential, Poisson, binomial, etc. These can be proved to be log-concave from the discussion in Appendix A. In the above, the terms represent a notion of spread or shifting, parametrized by the length, indicating some amount of positive correlation.
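Log-concavity of a discrete distribution can be checked numerically with the standard criterion: the support is a set of consecutive integers and $p_k^2\geq p_{k-1}p_{k+1}$ throughout. The helper below is a sanity check added for illustration (its name and tolerance are our own choices); it is not part of the argument in Appendix A.

```python
import math


def is_discrete_log_concave(pmf, tol=1e-12):
    """Check p_k^2 >= p_{k-1} p_{k+1} on a contiguous support.

    pmf lists the probabilities of consecutive integers 0, 1, 2, ...
    """
    support = [k for k, p in enumerate(pmf) if p > 0]
    if not support:
        return False
    lo, hi = support[0], support[-1]
    if any(pmf[k] <= 0 for k in range(lo, hi + 1)):
        return False  # an internal zero breaks discrete log-concavity
    return all(pmf[k] ** 2 + tol >= pmf[k - 1] * pmf[k + 1] for k in range(lo + 1, hi))


binomial = [math.comb(4, k) * 0.3 ** k * 0.7 ** (4 - k) for k in range(5)]
print(is_discrete_log_concave(binomial))         # True
print(is_discrete_log_concave([0.25] * 4))       # True (uniform)
print(is_discrete_log_concave([0.4, 0.2, 0.4]))  # False (bimodal)
```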
It remains to show price monotonicity under the above assumption. First, we begin with the following, which holds for arbitrary distributions.
Lemma 3**.**
Let $R_t^*(s)$ be the expected future revenue earned starting at time $t$ in state $s$, for the optimal policy computed by Algorithm 2. Then the function $s\mapsto R_t^*(s)$ is monotone non-increasing in $s$ for any fixed $t$.
See Appendix B for the proof. This lemma ensures that over-selling time on the server can only hurt the mechanism. This allows us to conclude
Lemma 4**.**
If the distribution on job parameters satisfies Assumption 1, then for all $t$, $s$, and $\ell$, we have $\pi^*_t(s,\ell)\leq\pi^*_t(s,\ell+1)$.
Sketch..
A full proof may be found in Appendix B. The idea is to show that, for any price $x$ less than the optimum $\pi^*_t(s,\ell)$, the difference in revenue between charging $\pi^*_t(s,\ell)$ and $x$ to jobs of length $\ell$ is at most the difference in revenue between the same prices for jobs of length $\ell+1$. This is achieved by applying the assumption to the recursive definition of future revenue, along with Lemma 3. Thus, we can conclude that the optimal price for length $\ell+1$ must be at least $\pi^*_t(s,\ell)$. ∎
With Lemma 4 and the results of Appendix C, we finally have:
Theorem 5**.**
The online server pricing problem admits an optimal monotone pricing strategy when the variables $L$, $V$, and $D$ satisfy Assumption 1. Also,
1. In the finite horizon setting, when $V$ is finitely supported, an exact optimum can be computed in time $O(T\,\mathbf{S}^2\mathbf{L}^2|\mathcal{V}|)$, where $\mathbf{S}=\mathbf{D}+\mathbf{L}$.
2. In the infinite horizon setting, when $V$ is finitely supported, for all $\epsilon>0$, an $\epsilon$-additive-approximate policy can be computed in time
[TABLE]
3. In the finite horizon setting, when $V$ is continuously supported, for all $\epsilon>0$, an $\epsilon$-additive-approximate policy can be computed in time polynomial in $T$, $\mathbf{L}$, $\mathbf{D}$, and $\mathbf{V}/\epsilon$.
3.4 Concentration Bounds on Revenue for Online Scheduling
In this section, we show that the revenue of arbitrary policies concentrates around its mean. In particular, this holds for the optimal or approximately optimal strategies described above. This will also allow us to argue later that, if we have an estimate $\hat{\mathcal{F}}$ of $\mathcal{F}$ and execute Algorithm 2 given the distribution $\hat{\mathcal{F}}$, then the output policy will perform well with respect to $\mathcal{F}$, both in expectation and with high probability. To show this concentration, we will consider the Doob (or exposure) martingale of the cumulative revenue function introduced in Section 2. Define
$$X_k=\mathbb{E}\big[\mathrm{Rev}(\pi,J)\,\big|\,J_1,\dots,J_k\big],\qquad k=0,1,\dots,T,$$
where the $J_i$’s are jobs in the sequence $J$ and the expected value is taken with respect to $\mathcal{F}$. Thus, $X_0$ is the expected cumulative revenue, and $X_T$ is the random cumulative revenue. To formally describe this martingale sequence, we introduce some notation and formalize some previous notation. Recall that $J=(J_1,\dots,J_T)$ is a sequence of jobs sampled i.i.d. from an underlying distribution $\mathcal{F}$. Fix a pricing policy $\pi$. Note that the state at time $t$ is a random variable depending on both the (deterministic) pricing policy $\pi$ and the (random) sequence $J$. We denote it $S_t^{\pi}(J)$, or $S_t$ for short. Formally, suppose $S_t=s$; then $S_{t+1}=\max(s-1,0)$ if either $D_t<s$ or $V_t<\pi_t(s,L_t)$, and otherwise $S_{t+1}=s+L_t-1$. Furthermore, let $\mathrm{Rev}_t(\pi,J)$ be equal to 0 in the first case above (the $t$-th job is not scheduled), and $\pi_t(s,L_t)$ otherwise. Thus, $S_{t+1}$ and $\mathrm{Rev}_t(\pi,J)$ are functions of the random values $J_1,\dots,J_t$ for $\pi$ fixed. Note that $\mathrm{Rev}_t$ implicitly depends on $\pi$. Since future jobs are independent of $J_1,\dots,J_k$, recalling the definition (2) of $R_t^{\pi}(s)$ (with $R_{T+1}^{\pi}\equiv 0$), we have
$$X_k=\sum_{t=1}^{k}\mathrm{Rev}_t(\pi,J)+R_{k+1}^{\pi}\big(S_{k+1}\big),\qquad k=0,\dots,T.$$
We wish to show that $\mathrm{Rev}(\pi,J)$ concentrates around its mean. Since $X_0$ is the expected revenue due to $\pi$, and $X_T$ is the (random) revenue observed, it suffices to show that $|X_T-X_0|$ is small, which we do by applying Azuma’s inequality after showing the bounded-differences property. This gives (see Appendix B.3 for details):
Theorem 6**.**
Let $J=(J_1,\dots,J_T)$ be a finite sequence of jobs sampled from $\mathcal{F}$, and let $\pi$ be any monotone policy. Then, with probability $1-\delta$,
[TABLE]
in the finite horizon, and in the infinite-horizon discounted setting,
[TABLE]
In particular these results hold true for the (approximately) optimal pricing strategies of Theorem 5.
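The bounded-differences step behind Theorem 6 is Azuma’s inequality: if a martingale satisfies $|X_k-X_{k-1}|\leq c_k$, then with probability at least $1-\delta$, $|X_T-X_0|\leq\sqrt{2\sum_k c_k^2\ln(2/\delta)}$. The sketch below only evaluates this radius. The difference bound `c` is a hypothetical input here (its actual value is derived in Appendix B.3), and the discounted variant assumes differences of the form $c\gamma^k$, truncated after a finite number of terms.

```python
import math


def azuma_radius(diff_bounds, delta):
    """Radius r with P(|X_T - X_0| >= r) <= delta when |X_k - X_{k-1}| <= c_k."""
    return math.sqrt(2.0 * sum(c * c for c in diff_bounds) * math.log(2.0 / delta))


def finite_horizon_radius(c, T, delta):
    # every one of the T exposure steps moves the martingale by at most c
    return azuma_radius([c] * T, delta)


def discounted_radius(c, gamma, delta, n_terms=10_000):
    # assumed differences c * gamma**k, truncated after n_terms steps
    return azuma_radius([c * gamma ** k for k in range(n_terms)], delta)


print(round(finite_horizon_radius(1.0, T=10_000, delta=0.05), 1))  # 271.6
```

The finite-horizon radius grows like $\sqrt{T}$, so the relative deviation of the revenue vanishes as the horizon grows; in the discounted case the radius stays bounded in terms of $1/(1-\gamma^2)$.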
4 Robustness of Pricing with Approximate Distributional Knowledge
In this section, we show that results analogous to Theorems 5 and 6 may be obtained even in the case in which we do not have full knowledge of the distribution $\mathcal{F}$, but only an estimate $\hat{\mathcal{F}}$. We then show how to obtain a valid $\hat{\mathcal{F}}$ from samples.
4.1 Robustness of the pricing strategy
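Let’s suppose that, instead of knowing the exact distribution $\mathcal{F}$ of the jobs, we only have access to some estimate $\hat{\mathcal{F}}$ with the following property, for some $\epsilon>0$:
$$\Big|\mathbb{P}_{\hat{\mathcal{F}}}\big[L=\ell,\,V\geq v,\,D\geq s\big]-\mathbb{P}_{\mathcal{F}}\big[L=\ell,\,V\geq v,\,D\geq s\big]\Big|\leq\epsilon\qquad\text{for all }\ell,\,v,\,s.$$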
For the sake of brevity, we abuse notation and denote this condition as $\|\hat{\mathcal{F}}-\mathcal{F}\|\leq\epsilon$. Later, we will need to estimate the value $\mathbb{P}[L=\ell,\ (V<v\text{ or }D<s)]$ given $\hat{\mathcal{F}}$, that is, the probability that the job has length $\ell$ but either cannot afford price $v$ or cannot be scheduled $s$ slots in the future. This is equal to $\mathbb{P}[L=\ell]-\mathbb{P}[L=\ell,V\geq v,D\geq s]$.
The left-hand term is equal to $\mathbb{P}[L=\ell,V\geq 0,D\geq 0]$, and so we have access to both terms. The estimation error is additive, so the deviation is at most $2\epsilon$.
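Recall
$$R_t^{\pi}(s)=\mathbb{E}_{J_{\geq t}}\Big[\sum_{\tau=t}^{T}\mathrm{Rev}_\tau(\pi,J)\,\Big|\,S_t=s\Big],$$
the expected revenue from time $t$ onwards, conditioning on $S_t=s$. Let $\hat R_t^{\pi}(s)$ be the same as $R_t^{\pi}(s)$, but where the jobs are distributed as $\hat{\mathcal{F}}$. As before, let $\pi^*$ be the Bayes-optimal policy for $\mathcal{F}$ returned by Algorithm 2, and let $\hat\pi$ be defined similarly but with respect to $\hat{\mathcal{F}}$. We will show that $\hat R_t^{\pi}(s)$ is a good estimate for $R_t^{\pi}(s)$.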
Lemma 7**.**
Let $\epsilon>0$, and let $\hat{\mathcal{F}}$ be such that $\|\hat{\mathcal{F}}-\mathcal{F}\|\leq\epsilon$.
In the finite horizon, for all ; 2. 2.
In the infinite horizon, for all , where is the optimal time independent strategy.
The proof of 1 is in Appendix B.4, and the proof of 2 in Appendix C.1.
4.2 Learning the Underlying Distribution from Samples
As discussed above, we show here how to compute an estimate $\hat{\mathcal{F}}$ from samples of $\mathcal{F}$ such that $\|\hat{\mathcal{F}}-\mathcal{F}\|$ is small with high probability. In particular, we present a sampling procedure which respects the rules of the pricing server mechanism. When a job arrives, we only learn its length, and only if it agrees to be scheduled. Thus, we are not given full samples of $J_t=(L_t,V_t,D_t)$, which complicates the learning procedure. Thanks to the previous section, we know that a policy which is optimal with respect to $\hat{\mathcal{F}}$ will be close to optimal with respect to $\mathcal{F}$.
We remark, however, that the power of the results of the previous section is not exhausted by this application: one may directly apply the robustness results to specific problems in which $\mathcal{F}$ is subject to (small) noise or an approximate distribution is already known from other sources.
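Let $J_1,\dots,J_n$ be an i.i.d. sample of $n$ jobs from the underlying distribution $\mathcal{F}$. Note that the expectation of an indicator is the probability of the indicated event. Fix a length $\ell$, a state $s$, and a value $v$. As a consequence of Hoeffding’s inequality, with probability $1-\delta$,
$$\Big|\frac{1}{n}\sum_{i=1}^{n}\mathbb{1}\big[L_i=\ell,\,V_i\geq v,\,D_i\geq s\big]-\mathbb{P}\big[L=\ell,\,V\geq v,\,D\geq s\big]\Big|\leq\sqrt{\frac{\ln(2/\delta)}{2n}}.\qquad(10)$$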
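Solving $\sqrt{\ln(2/\delta)/(2n)}\leq\epsilon$ for $n$ gives $n\geq\ln(2/\delta)/(2\epsilon^2)$, and replacing $\delta$ by $\delta/N$ covers $N$ events simultaneously via the union bound. A small helper with a hypothetical name:

```python
import math


def hoeffding_samples(eps, delta, n_events=1):
    """Samples n so that n_events empirical frequencies are all within eps of
    their means with probability >= 1 - delta (Hoeffding + union bound)."""
    return math.ceil(math.log(2.0 * n_events / delta) / (2.0 * eps ** 2))


print(hoeffding_samples(0.1, 0.05))                # 185
print(hoeffding_samples(0.1, 0.05, n_events=100))  # 415
```

The dependence on the number of events is only logarithmic, which is why a union bound over all (length, value, state) triples keeps the sample size polynomial.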
Sampling Procedure.
We wish to estimate the value $\mathbb{P}[L=\ell,V\geq v,D\geq s]$ for all choices of $\ell$, $v$, and $s$. Fixing $v$ and $s$, we may repeatedly post the price $v$ for every length and declare that the earliest available time is $s$, then record (i) which jobs accept to be scheduled, and (ii) the length of each scheduled job. Taking $n\geq\ln(2/\delta)/(2\epsilon^2)$, by (10) the sample average of each indicator has error at most $\epsilon$ with probability $1-\delta$, for any one choice of $\ell$.
Repeating this process for all choices of $v$ and $s$ gives us estimates for each. Now, if we want the estimates to hold simultaneously over all choices of $\ell$, $v$, and $s$, it suffices to take the union bound over all $|\mathcal{L}|\,|\mathcal{V}|\,|\mathcal{S}|$ triples and scale $\delta$ accordingly. If we take $n\geq\ln\big(2|\mathcal{L}||\mathcal{V}||\mathcal{S}|/\delta\big)/(2\epsilon^2)$ samples for each choice of $v$ and $s$, then, with probability $1-\delta$, simultaneously for all $\ell$, $v$, and $s$, the quantity in (10) is at most $\epsilon$. So we have obtained the “$\|\hat{\mathcal{F}}-\mathcal{F}\|\leq\epsilon$” condition. It should be noted that, in this sampling procedure, if a job of length $\ell$ is scheduled, we may have to wait up to $\ell-1$ time units before taking the next sample, so that the server is again free from delay $s$ onward. This blows up the sampling time by a factor of at most $\mathbf{L}$. The following result follows immediately from Lemma 7 and Hoeffding’s inequality for an appropriate choice of the accuracy parameter.
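The sampling procedure can be simulated to check that it only uses what the mechanism observes. In the sketch below, `sample_job` is a hypothetical generator of hidden job types, the server is assumed to start no busier than the declared state, and every offer posts the flat (hence monotone) price $v$ with declared delay $s$. After an accepted job of length $\ell$, the simulation waits $\ell-1$ further steps, during which arriving jobs are offered nothing, until state $s$ can be declared again. The function name and return format are our own choices.

```python
import random
from collections import Counter


def estimate_acceptance(sample_job, v, s, n, rng):
    """Estimate P[L = l, V >= v, D >= s] for every length l from n offers.

    sample_job(rng) -> (length, value, delay) is a hypothetical job generator.
    Each offer posts the flat price v for every length (a monotone menu) with
    declared earliest start s; only acceptance and the booked length are used.
    Returns (estimates by length, number of elapsed time steps).
    """
    accepted, steps = Counter(), 0
    for _ in range(n):
        length, value, delay = sample_job(rng)  # hidden type; only the response is observed
        steps += 1
        if delay >= s and value >= v:
            accepted[length] += 1
            steps += length - 1  # idle until state s can be declared again
    return {l: c / n for l, c in accepted.items()}, steps


types = [(1, 4.0, 0), (2, 6.0, 2)]               # hypothetical equally likely job types
est, steps = estimate_acceptance(lambda r: r.choice(types), v=5.0, s=1,
                                 n=20_000, rng=random.Random(0))
print(est)  # close to {2: 0.5}
```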
Lemma 8**.**
Let , , and , be as above. In the finite horizon, for all , if , we have that with probability , for all . In the infinite horizon, if , we have that with probability , for all .
4.3 Performance of the Computed Policy
We now use the results of the previous sections to analyze the performance of the policy output by Algorithm 2 after the learning procedure. Because revenue is estimated accurately, the best policy in estimated expectation is near-optimal in expectation. Since the revenues of arbitrary policies concentrate, we also obtain near-optimal revenue in hindsight.
Formally, for , Lemma 8 gives that if the sample distribution is computed from samples, then with probability over the samples, . Note that is exactly the expected cumulative revenue of the optimal policy. For clarity of notation, denote
[TABLE]
We have shown that, for sufficiently many samples, , with probability . This observation allows us to conclude the following.
Theorem 9 (Finite Horizon).
Let be the underlying distribution over jobs. Let , and . Then in time , we may compute a policy which is monotone in length, and therefore incentive compatible, such that for any policy , with probability ,
[TABLE]
Furthermore, if the distribution over values is continuous rather than discrete, we may compute in time a monotone policy such that for any policy , with probability ,
[TABLE]
Proof.
We have chosen . Let be the optimal policy for the true distribution . By Theorem 6, we have with probability for both and . Furthermore, by Lemma 8, with probability , for both and . This is because from the point of view of , is the true distribution, and is the estimate. Taking the union bound over all four events above, and recalling that maximizes , and maximizes , we get the following with probability :
[TABLE]
as desired.
When is continuously distributed, choose prices that are multiples of between 0 and , as outlined in Appendix C.2. ∎
For the -discounted infinite-horizon case, we have the following result.
Theorem 10 (Infinite Horizon, Discounted).
Let be the underlying distribution over jobs. Let , and . Then we may compute a policy in time , which is monotone, and thus incentive compatible, such that for any policy , with probability ,
[TABLE]
Furthermore, if the distribution over values is continuous rather than discrete, we may compute in time a monotone policy such that for any , with probability ,
[TABLE]
As above, this policy is computed by learning from samples as in Section 4.2, then running the modified Algorithm 2 for the estimated distribution as in Appendix C.1. In case is continuously distributed, we restrict ourselves to prices which are multiples of between 0 and . The details of the proof are in Appendix C.
Recall that all of these results require the distributional assumption from Section 3.3.
5 Conclusions and Future Work
In summary, we propose to price time on a server by first learning the distribution over jobs from samples, then computing the Bayes-optimal policy for the estimated distribution. Our learning algorithm is simple: we sample the distribution by observing jobs at artificially fixed prices and server states, and learn the job parameters from whether the jobs accept to be scheduled. From these observations we build an observed distribution . We then run Algorithm 2 on and compute an optimal policy for . The policy is guaranteed to price monotonically (by Lemmas 3 and 4), and is therefore incentive compatible, which in turn ensures that the estimated revenue is correct.
Future Work.
There are many natural extensions to this work. For example, one could consider a multi-server setting, settings where jobs can request to be scheduled later than the earliest available time, or settings where jobs need various quantities of differing resources, such as memory and computation time.
Appendix A Log-Concave Distributions
In Section 3.3, we sought to show that if the value of a random job has a log-concave distribution, then the optimal policy will be monotone. We present here a discussion of log-concavity, both for continuous and discrete random variables, and give the proof of the monotonicity of the prices.
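Formally, a function $f$ is log-concave if for any $x$ and $y$, and for any $\lambda \in [0,1]$, $f(\lambda x + (1-\lambda) y) \ge f(x)^{\lambda} f(y)^{1-\lambda}$. Equivalently, $\log f$ is concave. For a discretely supported $f$, we replace this condition with $f(k)^2 \ge f(k-1)\, f(k+1)$, emulating the continuous definition with $\lambda = 1/2$. We further require that the support of $f$ be “connected”, i.e., a set of consecutive integers.
Definition 11.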
A continuous random variable with density function is said to be log-concave if is log-concave. A discrete random variable with probability mass function is said to be log-concave if is discretely log-concave.
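The discrete condition is easy to check mechanically. The following Python sketch tests $f(k)^2 \ge f(k-1)f(k+1)$ on a probability vector indexed by consecutive integers and rejects supports with gaps. The function name and the relative tolerance `tol`, which absorbs floating-point rounding, are illustrative choices. On a Binomial(10, 0.3) mass function the check passes, and it also passes on the cumulative and upper-tail sums, in line with Claim 12 below.

```python
import math


def is_discretely_log_concave(f, tol=1e-12):
    """True if f (values on consecutive integers) has connected support and
    satisfies f(k)^2 >= f(k-1) * f(k+1) everywhere (up to relative tolerance)."""
    support = [k for k, x in enumerate(f) if x > 0]
    if not support:
        return False
    lo, hi = support[0], support[-1]
    if any(f[k] <= 0 for k in range(lo, hi + 1)):
        return False  # a zero inside the support: not "connected"
    # At and outside the support's endpoints the inequality is trivial (RHS is 0).
    return all(f[k] ** 2 >= f[k - 1] * f[k + 1] * (1 - tol) for k in range(lo + 1, hi))


n, p = 10, 0.3
pmf = [math.comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(n + 1)]
cdf = [sum(pmf[: k + 1]) for k in range(n + 1)]  # P(X <= k)
sf = [sum(pmf[k:]) for k in range(n + 1)]        # P(X >= k)

print(is_discretely_log_concave(pmf), is_discretely_log_concave(cdf), is_discretely_log_concave(sf))  # True True True
print(is_discretely_log_concave([0.4, 0.1, 0.4, 0.1]))  # bimodal: False
```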
It is well known that log-concave random variables also have log-concave cumulative distribution functions, in both the continuous and the discrete case. We include a short proof for completeness.
Claim 12.
If is a log-concave continuous r.v., then , and are log-concave functions of . If is a log-concave discrete r.v. supported on , then and are discretely log-concave functions of .
Proof.
The continuous case is well-documented in the literature. See for example [24]. For the discrete case, observe first that since a mass function is non-negative, and we have assumed contiguous support, the function must be single-peaked, i.e. quasi-concave, as any local minimum would contradict the definition. Furthermore, the definition of log-concavity is equivalent to . Repeatedly applying this, and rearranging, we get
[TABLE]
It remains to show that is log-concave. We have
[TABLE]
as desired. The same technique applies for the upper-sum. ∎
This allows us to conclude:
Lemma 2 (restated).
Let denote the marginal r.v. conditioned on and . Let be a continuously supported random variable, and . If is distributed like , , , or , then Assumption 1 is satisfied if is log-concave, or if the ’s are independent of .
Proof.
First, observe that
[TABLE]
and since we take ratios with fixed, we can replace the joint cumulatives on and in the assumption with the marginals on alone.
Now, if the ’s are independent of , then the ratio remains unchanged as changes, satisfying Assumption 1. Otherwise, we begin by analyzing the distributions given by and . Let , noting that and , for the two cases, respectively. Note that we wish to show is increasing, which is equivalent to increasing.
For , observe that for and , we have
[TABLE]
since is a non-increasing and concave function, by assumption. Also
[TABLE]
where the first inequality is the same as in the previous equation and the second follows by monotonicity. This completes the continuous case.
For , we note that if . So the probability is . Similarly, for , is . Thus, if we assume that and are integers, the calculations above go through, as desired. ∎
We present a final fact that justifies the use of -type random variables:
Lemma 13.
If is a discrete log-concave random variable, then there exists a continuous log-concave such that .
Proof.
Let be the right-hand cumulative mass function for . Then, it suffices to have for all integers . Let be the piecewise-linear function such that , , and for all . Since is a discretely concave and non-increasing function, must be concave and non-increasing. We can then set to be the random variable whose density is given by . ∎
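The interpolation step in this proof can be sketched numerically. We assume, as the proof suggests, that the target is a continuous variable whose tail probabilities agree with those of the discrete variable at the integers. The Python sketch below interpolates the logarithm of the right-hand cumulative $P(X \ge k)$ linearly between consecutive integers. The interpolant is concave and non-increasing exactly when its successive slopes are non-positive and non-increasing, which is the discrete log-concavity of the tail. The helper names are hypothetical, and the sketch only covers the range $[0, K]$ of the given tail values.

```python
import math


def interpolated_survival(tail):
    """Continuous survival G on [0, K] with G(k) = tail[k] at integers and
    log G piecewise linear. Requires tail[k] = P(X >= k) > 0 for k = 0..K, K >= 1."""
    logs = [math.log(v) for v in tail]

    def G(x):
        if not 0 <= x <= len(tail) - 1:
            raise ValueError("x outside [0, K]")
        k = min(int(math.floor(x)), len(tail) - 2)  # index of the linear segment
        frac = x - k
        return math.exp((1 - frac) * logs[k] + frac * logs[k + 1])

    return G


def log_slopes(tail):
    """Slopes of the piecewise-linear log G on each unit segment."""
    return [math.log(tail[k + 1]) - math.log(tail[k]) for k in range(len(tail) - 1)]


n, p = 10, 0.3
pmf = [math.comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(n + 1)]
tail = [sum(pmf[k:]) for k in range(n + 1)]  # P(X >= k), strictly positive on 0..n
G = interpolated_survival(tail)
slopes = log_slopes(tail)  # non-positive and non-increasing: log G is concave
```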
Appendix B Detailed Proofs
This section presents the detailed proofs of the lemmas and theorems from the main text. B.1 gives the pseudocode for the dynamic programs, outlined in Section 3, that compute the optimal pricing policies; B.2 gives the proofs of monotonicity of the pricing policies, alongside the discussion of log-concave random variables in Appendix A; B.3 gives the concentration bounds from the last part of Section 3; and B.4 contains the proofs for Section 4.
B.1 MDP Algorithms and Correctness
Lemma 1 (restated).
For any fixed pricing policy ,
[TABLE]
where the ’s are as in (2), and the ’s are from the modified MDP.
Proof.
The statement is true for since in that case everything is zero. Suppose for all . For the fixed policy , we define . Then,
[TABLE]
∎
B.2 Monotonicity of Prices
These proofs are given in parallel with the discussion in Appendix A.
Lemma 3 (restated).
Let be the expected future revenue earned starting at time in state , for the optimal policy computed by Algorithm 2. Then the function is monotone non-increasing in for any fixed .
Proof.
The proof is by backward induction on time. At time , there is no future revenue and , so the inductive claim follows trivially. Suppose now that the inductive claim holds at time . It suffices to show that it holds for each , since is simply their expectation. Let be the optimal pricing policy computed for time by Algorithm 2. Since the function , for any event , is left-continuous in the variable , we may define, for every and
[TABLE]
We must have , as is in the set. Now, letting , we have
[TABLE]
where . The first inequality holds by the induction hypothesis, the second by the definition of , the third by the definition of , and the last follows from the fact that is a (possibly) suboptimal pricing policy for the state at time . Note that this last inequality requires the 0 value to be feasible in the max, which it is, by setting arbitrarily large. ∎
Lemma 4 (restated).
If the distribution on job parameters satisfies Assumption 1, then for all , we have .
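To make the backward induction concrete, here is a Python sketch of a finite-horizon dynamic program in the spirit of Algorithm 2. It runs on a hypothetical toy model, not the paper's exact setting. A job is `(prob, length, value, patience)` and accepts price `p` at state `s` iff `value >= p` and `s <= patience`. An accepted job moves the state to `s + length - 1`, mirroring the $U^*_{t+1}(s+\ell-1)$ term in the proof of Lemma 7, and a rejection moves it to `max(s - 1, 0)`; this last transition is our assumption. Prices come from a finite grid, and, as in the proof above, a "no sale" option is always available. The final line checks the conclusion of Lemma 3 numerically on this instance.

```python
def accept_prob(jobs, length, s, price):
    """P(accept | length) at state s and price, under the toy acceptance rule."""
    total = sum(pr for pr, l, _, _ in jobs if l == length)
    ok = sum(pr for pr, l, v, w in jobs if l == length and v >= price and s <= w)
    return ok / total


def backward_induction(jobs, horizon, prices):
    """Return U[t][s] (optimal expected revenue from time t in state s) and the
    price posted for each (t, s, length); float('inf') means "no sale"."""
    lengths = sorted({l for _, l, _, _ in jobs})
    p_len = {l: sum(pr for pr, ll, _, _ in jobs if ll == l) for l in lengths}
    # States above the max patience accept nothing, so accepted jobs lead to
    # states <= max_patience + max_length - 1 < s_max: the state space is closed.
    s_max = max(job[3] for job in jobs) + max(lengths)
    U = [[0.0] * (s_max + 1) for _ in range(horizon + 1)]  # U[horizon][*] = 0
    policy = {}
    for t in range(horizon - 1, -1, -1):
        for s in range(s_max + 1):
            on_reject = U[t + 1][max(s - 1, 0)]
            expected = 0.0
            for l in lengths:
                best_val, best_p = on_reject, float("inf")  # the "no sale" option
                for p in prices:
                    q = accept_prob(jobs, l, s, p)
                    if q == 0.0:
                        continue
                    val = q * (p + U[t + 1][s + l - 1]) + (1 - q) * on_reject
                    if val > best_val:
                        best_val, best_p = val, p
                policy[(t, s, l)] = best_p
                expected += p_len[l] * best_val
            U[t][s] = expected
    return U, policy


# Hypothetical instance: (probability, length, value, patience).
jobs = [(0.3, 1, 2.0, 2), (0.2, 1, 5.0, 0), (0.3, 2, 4.0, 1), (0.2, 3, 9.0, 3)]
U, policy = backward_induction(jobs, horizon=6, prices=[float(p) for p in range(1, 10)])
# Lemma 3 on this instance: U[t][s] is non-increasing in s for every t.
lemma3_holds = all(U[t][s] >= U[t][s + 1] - 1e-12
                   for t in range(len(U)) for s in range(len(U[t]) - 1))
```

Keeping the "no sale" option explicit matters: it is exactly the feasibility of the 0 value that the last inequality in the proof of Lemma 3 relies on.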
Proof.
Let , fix , , and , and let be equal to the optimal price . Observe that maximizes the expression
[TABLE]
For simplicity, let , and so for any ,
[TABLE]
Note that, as discussed in the proof of the previous lemma, , as otherwise it would be beneficial to set . The above inequality is then equivalent to
[TABLE]
We wish to show that, if , then as increases, the above inequality still holds. This would imply that the price gives better return than for jobs of length , implying that the optimal price must be at least , which is our desired goal.
Now, by Assumption 1, the left-hand side is non-decreasing in , so it remains to show that the right-hand side is non-increasing in . The only changing term is , which by Lemma 3 is non-increasing in . Since it appears in the denominator of a subtracted, non-negative term, we have our desired result. ∎
B.3 Concentration Bounds on Revenue for Online Scheduling
Theorem 6 (restated).
Let be a finite sequence of jobs sampled from an underlying distribution , and let be any monotone policy. Then, with probability ,
[TABLE]
in the finite horizon, and, in the infinite-horizon-discounted setting,
[TABLE]
In particular these results hold true for the (approximately) optimal pricing strategy computed in the previous part of the section.
Proof.
For the finite horizon, we apply Azuma’s inequality to the martingale . We begin by showing the bounded-differences property. Note that we do not require truthful behaviour from the jobs, since taking strategic behaviour into account for a non-monotone policy is equivalent to modifying the distribution over the jobs, and making the distribution state-dependent, by increasing the length of those jobs that would rather buy a longer interval. Thus,
[TABLE]
where the last inequality follows from properties of conditional expectation. With this property, we can apply Azuma’s inequality and obtain
[TABLE]
For the infinite-horizon-discounted, we observe that equation (7) becomes
[TABLE]
and thus we get that . Therefore with probability ,
[TABLE]
Thus, taking the limit as , we get that with probability ,
[TABLE]
∎
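For intuition on the size of these deviations, recall Azuma's inequality: for a martingale with $|X_i - X_{i-1}| \le c_i$, $\Pr[|X_n - X_0| \ge \lambda] \le 2\exp(-\lambda^2 / (2\sum_i c_i^2))$. The Python sketch below solves for the deviation $\lambda$ at confidence $1-\delta$. The per-step bounds are inputs. Here we take a hypothetical constant bound `H` in the finite horizon and `gamma**t * H` in the discounted case, whose squares sum to at most $H^2/(1-\gamma^2)$ and hence give a horizon-independent deviation. The bounded-differences constant established in the proof above may be different.

```python
import math


def azuma_deviation(c, delta):
    """lam with 2 exp(-lam^2 / (2 sum c_i^2)) = delta: a martingale whose i-th
    increment is bounded in absolute value by c[i] deviates from its start by
    at least lam with probability at most delta."""
    return math.sqrt(2 * sum(ci * ci for ci in c) * math.log(2 / delta))


H, delta, gamma = 1.0, 0.05, 0.9  # hypothetical per-step bound, failure prob., discount
finite = azuma_deviation([H] * 100, delta)  # grows like sqrt(T) in the horizon T
discounted = azuma_deviation([gamma**t * H for t in range(2000)], delta)
limit = math.sqrt(2 * H**2 / (1 - gamma**2) * math.log(2 / delta))  # T -> infinity
```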
B.4 Robustness of the pricing strategy
Lemma 7 (restated).
Let , and such that . In the finite horizon, for all .
Proof.
Let be the policy computed by Algorithm 2 with access to . As in Section 3, we denote , and . In an abuse of notation, denote and the estimated values of and , respectively. We cannot estimate directly with good error bounds, but we will only need the values and . Now, substituting these estimates into (9), we get:
[TABLE]
To simplify this expression, we begin by showing a simple claim: Let , , , , and let , such that , , and . Then
[TABLE]
Now, replacing and with $\big(\pi^{*}_{t}(s,\ell)+U^{*}_{t+1}(s+\ell-1)\big)$ and , respectively, and replacing with , we have
[TABLE]
However, the argument of the supremum in the left-hand term of the summand must be at most , since if , it is best to , which makes , putting all the weight on . Furthermore, we have shown in Lemma 3 that . Thus, we get
[TABLE]
Inductively applying this gives as desired. ∎
Appendix C Extensions
In this section, we extend the finite-horizon results to compute the optimal policies in the infinite-horizon-discounted setting, and also to argue that the optimal policy may be computed within some error when the distribution over values is continuous, rather than discrete.
These results are needed to show the full statements of Theorems 5–10.
C.1 Infinite Discounted Horizon
Recall, in this infinite horizon discounted setting, we seek to maximize the -discounted future revenue,
[TABLE]
over the choice of . Algorithm 2 does not directly compute a solution for the infinite discounted horizon case. However, we can exploit the discount factor on the revenues to approximate the infinite-horizon optimum: it suffices to consider the problem truncated at a sufficiently large and solve it optimally using the algorithm presented above. Indeed, we have the following lemma.
Lemma 14.
For any and , let be the pricing policy computed by the finite-horizon algorithm up to time . Let be the time-independent pricing policy such that . Then the expected performance of the optimal policy in the infinite horizon is within an additive of the expected performance of .
Proof.
Note that to compute policy it is necessary to add the discount factor to Algorithm 2 and to all of the proofs of the previous sections; one can verify that all proofs go through. Let be the Bayes-optimal infinite-horizon strategy (which is known to be time-independent), and let be as in the statement (where we set for all ). Then, in expectation over times through , pricing as yields greater revenue than following . Conversely, in expectation over all time, pricing as yields greater revenue than . However, after time , the maximum possible revenue due to any policy is
[TABLE]
Hence, the difference in revenue between following and is at most , since is sufficiently large.
It remains to show that performs better than overall. Let be the policy that agrees with for all , and then equals for . Observe that is optimal in expectation over the interval , and is equivalent to for the first step. Therefore, performs better than . Similarly, we can argue that performs better than over the interval and identically before it; hence performs better overall.
Thus, we have a sequence of policies converging to , and whose expected revenue is monotone non-decreasing along the sequence. Therefore, the expected revenue due to is greater than that of , which is an additive-approximation to the optimal policy. ∎
The approach above is analogous to the classical value iteration technique [23].
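The truncation horizon in Lemma 14 can be chosen explicitly once per-step revenue is bounded. Assume, hypothetically, that each step earns at most `H` and that revenue at step $t$ is discounted by $\gamma^t$. Then the revenue any policy can collect from step $T$ onward is at most $\sum_{t \ge T} \gamma^t H = \gamma^T H/(1-\gamma)$. The Python sketch below returns the smallest $T$ for which this tail is at most $\epsilon$. The tail expression used in the proof may index time differently.

```python
def truncation_horizon(gamma, H, eps):
    """Smallest T >= 0 with gamma**T * H / (1 - gamma) <= eps, i.e. steps T, T+1, ...
    contribute at most eps of discounted revenue when per-step revenue is <= H."""
    if not (0 < gamma < 1 and H >= 0 and eps > 0):
        raise ValueError("need 0 < gamma < 1, H >= 0, eps > 0")
    T = 0
    while gamma**T * H / (1 - gamma) > eps:
        T += 1
    return T


print(truncation_horizon(0.5, 1.0, 0.01))  # 2 * 0.5**8 <= 0.01 < 2 * 0.5**7, so 8
```

The required $T$ grows like $\log(H/(\epsilon(1-\gamma)))/\log(1/\gamma)$, so the finite-horizon algorithm only needs to run for a number of steps logarithmic in $1/\epsilon$.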
Lemma 7 (restated).
Let , and such that . In the infinite horizon, for all .
Proof.
As in the proof of Lemma 14, if is sufficiently large, we may analyze the first time steps as a finite-horizon problem, and the remaining revenue will be negligibly small. Now, the calculation above can be reproduced with discount terms, to show
[TABLE]
Then, inductively applying this and taking , we have . ∎
These results are used to prove the infinite-horizon versions of the various results throughout the paper, specifically Theorems 5, 6, and 10.
C.2 Approximation Algorithm for Continuously Supported Values
Note that the algorithms above assume that the value of the jobs () is discretely supported, and the running time depends on . In this section, we analyze the error incurred by discretizing the space of possible values, and then computing the optimal policy.
Let be some desired small grid size, and suppose we only allow ourselves to set prices which are multiples of . We claim that this incurs a small loss on the total revenue.
Define, as in the previous subsections, . Further, define as previously , and
[TABLE]
Define and similarly, restricting the maximum to choosing from multiples of .
Lemma 15.
Let and be defined as above, then .
Proof.
We proceed by backward induction on . Assume that for all , , and set . We wish to inductively bound the value of . Now,
[TABLE]
Now, let be the optimizer of this right-hand side over (where the value would attain ), and let be rounded down to the nearest multiple of . Then, since is non-increasing,
[TABLE]
Thus combining both equations, we get
[TABLE]
We conclude, by averaging over , that , as desired. ∎
Corollary 16.
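The key rounding step of this proof can be checked on a single pricing decision. Suppose $p^*$ maximizes $p \cdot \Pr[v \ge p]$ and $p'$ is $p^*$ rounded down to a multiple of $\epsilon$. Then $p' \ge p^* - \epsilon$ and, since the survival function is non-increasing, $\Pr[v \ge p'] \ge \Pr[v \ge p^*]$; hence $p'\Pr[v\ge p'] \ge p^*\Pr[v \ge p^*] - \epsilon$. The Python sketch uses a hypothetical Exp(1) value distribution, for which $p^* = 1$ and the optimal single-step revenue is $1/e$. It ignores the continuation values that the full induction carries along.

```python
import math


def survival(p):
    """Hypothetical value distribution: v ~ Exp(1), so P(v >= p) = exp(-p)."""
    return math.exp(-p)


def revenue(p):
    """Expected one-shot revenue of posting price p: p * P(v >= p)."""
    return p * survival(p)


def best_grid_price(eps, H):
    """Best price among the multiples of eps in [0, H]."""
    grid = [k * eps for k in range(int(H / eps) + 1)]
    return max(grid, key=revenue)


p_star, opt, H = 1.0, math.exp(-1.0), 5.0  # analytic optimum of p * exp(-p)
results = {}
for eps in (0.05, 0.3, 0.7):
    p_round = math.floor(p_star / eps) * eps  # round p* down to the grid
    results[eps] = (revenue(p_round), revenue(best_grid_price(eps, H)))
```

Each entry of `results` pairs the revenue of the rounded-down optimum with the best revenue on the grid. Both lie within `eps` of `opt`, and the grid optimum is never worse than the rounded price, since the latter is itself a grid point.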
Let and be defined as above, but for the infinite horizon discounted, then .
Proof.
As shown in the previous subsection, it suffices to perform the analysis in the finite horizon, taking the discount factor into account, and then take the limit as . The same calculations as above give
[TABLE]
Summing the ’s and taking , we get as desired. ∎
[1] Roger B. Myerson. Optimal auction design. Mathematics of Operations Research, 6(1):58–73, 1981.
[2] William Vickrey. Counterspeculation, auctions, and competitive sealed tenders. The Journal of Finance, 16(1):8–37, 1961.
[3] Noam Nisan and Amir Ronen. Algorithmic mechanism design (extended abstract). In Proceedings of the 31st STOC, pages 129–140. ACM, 1999.
[4] Arnoud V. den Boer. Dynamic pricing and learning: historical origins, current research, and new directions. Surveys in Operations Research and Management Science, 20(1):1–18, 2015.
[5] Richard Cole and Tim Roughgarden. The sample complexity of revenue maximization. In Proceedings of the 46th STOC, pages 243–252. ACM, 2014.
[6] Xiaoyong Tang, Xiaochun Li, and Zhuojun Fu. Budget-constraint stochastic task scheduling on heterogeneous cloud systems. Concurrency and Computation: Practice and Experience, 29(19), 2017.
[7] Moshe Babaioff, Yishay Mansour, Noam Nisan, Gali Noti, Carlo Curino, Nar Ganapathy, Ishai Menache, Omer Reingold, Moshe Tennenholtz, and Erez Timnat. Era: A framework for economic resource allocation for the cloud. In Proceedings of the 26th WWW Companion, pages 635–642, 2017.
[8] Shuchi Chawla, Nikhil R. Devanur, Alexander E. Holroyd, Anna R. Karlin, James B. Martin, and Balasubramanian Sivan. Stability of service under time-of-use pricing. In Proceedings of the 49th STOC, pages 184–197. ACM, 2017.
