New Guarantees for Learning Revenue Maximizing Menus of Lotteries and Two-Part Tariffs
Maria-Florina Balcan, Hedyeh Beyhaghi

TL;DR
This paper develops new online learning algorithms with strong regret guarantees for revenue-maximizing menus of lotteries and two-part tariffs, addressing challenges posed by infinite parameter spaces and discontinuous revenue functions.
Contribution
It introduces the first online learning algorithms with regret bounds for these mechanisms, including reductions to finite experts and online linear optimization, and improves running times in distributional settings.
Findings
First online algorithms with regret guarantees for these mechanisms.
Reductions to finite experts and online linear optimization.
Improved running times over prior work.
Learning Revenue Maximizing Menus of Lotteries and Two-Part Tariffs
Maria-Florina Balcan
Carnegie Mellon University
Hedyeh Beyhaghi
Carnegie Mellon University
Abstract
We advance a recently flourishing line of work at the intersection of learning theory and computational economics by studying the learnability of two classes of mechanisms prominent in economics, namely menus of lotteries and two-part tariffs. The former is a family of randomized mechanisms designed for selling multiple items, known to achieve revenue beyond deterministic mechanisms, while the latter is designed for selling multiple units (copies) of a single item with applications in real-world scenarios such as car or bike-sharing services. We focus on learning high-revenue mechanisms of this form from buyer valuation data in both distributional settings, where we have access to buyers’ valuation samples up-front, and the more challenging and less-studied online settings, where buyers arrive one-at-a-time and no distributional assumption is made about their values.
Our main contribution is proposing the first online learning algorithms for menus of lotteries and two-part tariffs with strong regret-bound guarantees. In the general case, we provide a reduction to a finite number of experts, and in the limited buyer type case, we show a reduction to online linear optimization, which allows us to obtain no-regret guarantees by presenting buyers with menus that correspond to a barycentric spanner. In addition, we provide algorithms with improved running times over prior work for the distributional settings. The key difficulty when deriving learning algorithms for these settings is that the relevant revenue functions have sharp transition boundaries. In stark contrast with the recent literature on learning such unstructured functions, we show that simple discretization-based techniques are sufficient for learning in these settings.
1 Introduction
In recent years, a growing body of work has emerged in the field of machine learning for pricing and mechanism design problems. These problems involve selling items to buyers with the objective of maximizing revenue. The majority of the existing research has primarily concentrated on distributional settings, i.e., when the buyers’ values for the items are drawn from an unknown distribution. Less attention has been paid to the more challenging online setting, where buyers arrive one-by-one and no distributional assumption is made about buyers’ values. In this case, the previous literature has mostly focused on simple mechanisms such as posted pricing or, more generally, mechanisms that sell the items separately (Blum et al., 2004; Kleinberg and Leighton, 2003; Blum and Hartline, 2005; Balcan and Blum, 2006; Bubeck et al., 2017; Cesa-Bianchi et al., 2014; Balcan et al., 2018b, 2020a). We advance this line of work by studying the learnability of two prominent classes of mechanisms, both represented as menus providing the buyers a list of allocation and payment options to choose from, namely menus of two-part tariffs and lotteries. These mechanisms go beyond selling the items separately, resulting in potentially higher revenue guarantees with applications to modern real-world scenarios. We provide the first online learning guarantees for these scenarios and improved guarantees for distributional learning. In the process, we discover the power of data-independent discretization for data-driven mechanism design and algorithm design more generally.
The first class we study is menus of two-part tariffs (Lewis, 1941), used for selling multiple units (i.e., copies) of a single item. In this family of mechanisms, the buyer is presented with a list (menu) of two-part tariffs, where tariff $i$ is a pair consisting of an up-front fee, $p_1^{(i)}$, and a per-unit fee, $p_2^{(i)}$. If the buyer wishes to buy $q$ units under tariff $i$, she pays $p_1^{(i)} + q\,p_2^{(i)}$ in total, and if she does not want to buy anything, she does not pay anything. The buyer is free to select any of the tariffs. In particular, the cost of purchasing $q$ units is the minimum cost among all the tariffs, i.e., $\min_i \{p_1^{(i)} + q\,p_2^{(i)}\}$. Various products in the real world are sold via menus of two-part tariffs; for example, car or bike-sharing services and delivery service subscriptions.
The second class we study is menus of lotteries for selling multiple items. In this context, the buyer is presented with a list (menu) of lotteries, where lottery $i$ is defined as a pair consisting of a vector $\phi^{(i)}$ of probabilities for allocating each item and a price $p^{(i)}$. If the buyer chooses lottery $i$, she receives each item $j$ with probability $\phi^{(i)}_j$ and pays $p^{(i)}$. Menus of lotteries are a crucial family of mechanisms because (1) this family captures all possible mechanisms, including the optimal one (Dasgupta et al., 1979; Guesnerie and Oddou, 1981), and (2) menus of lotteries achieve revenue beyond other well-studied families of mechanisms such as posted pricing and, more generally, any deterministic mechanism (Briest et al., 2010; Hart and Nisan, 2019).
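To make the buyer's choice concrete, here is a minimal Python sketch for an additive buyer, whose expected value for lottery $i$ is $\sum_j \phi^{(i)}_j v_j$, where $\phi^{(i)}_j$ is the probability that lottery $i$ allocates item $j$ and $v_j$ is the buyer's value for item $j$. The additive valuation, the tie-breaking rule (earlier entry first, and not buying on a zero-utility tie), the example numbers, and the function names are assumptions of this sketch rather than details taken from the paper.

```python
from typing import List, Optional, Sequence, Tuple

# A lottery is (phi, price): phi[j] is the probability of receiving item j.
Lottery = Tuple[Sequence[float], float]


def chosen_lottery(menu: List[Lottery], values: Sequence[float]) -> Optional[int]:
    """Index of the lottery chosen by an additive, utility-maximizing buyer,
    or None if no lottery gives strictly positive utility (she buys nothing).
    Ties are broken toward the earlier entry (an assumption of this sketch)."""
    best_idx, best_util = None, 0.0  # the empty option has utility 0
    for idx, (phi, price) in enumerate(menu):
        # Expected value of the randomized allocation minus the price.
        util = sum(prob * val for prob, val in zip(phi, values)) - price
        if util > best_util:
            best_idx, best_util = idx, util
    return best_idx


def lottery_revenue(menu: List[Lottery], values: Sequence[float]) -> float:
    """Revenue is the price of the chosen lottery (0 if the buyer opts out)."""
    idx = chosen_lottery(menu, values)
    return 0.0 if idx is None else menu[idx][1]


menu = [([1.0, 1.0], 10.0),   # both items for sure, price 10
        ([0.5, 0.5], 4.0)]    # each item with probability 1/2, price 4
print(chosen_lottery(menu, [6.0, 5.0]), lottery_revenue(menu, [6.0, 5.0]))  # prints: 1 4.0
```

With values $(6, 5)$ the deterministic bundle gives utility $1$ while the half-half lottery gives $1.5$, so the buyer takes the cheaper lottery and the revenue is $4$.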
We study menus of two-part tariffs and lotteries in the context of parameter optimization, where the objective function (revenue) depends on parameter vectors. In menus of two-part tariffs, the parameters determining the mechanisms are the up-front fees and per-unit fees for each tariff, while for menus of lotteries, the allocation probability vectors and the prices for the lotteries determine the mechanism. In the parameter space, each point corresponds to a mechanism. A common approach in learning algorithms involves considering the objective function for a fixed buyer’s valuation (Balcan et al., 2017, 2018c, 2018b). In our context, the mechanism designer faces a utility-maximizing buyer, who, given the parameters determining the menu, chooses the entry, i.e., a lottery or a two-part tariff, in the menu that maximizes her utility. Therefore, the revenue function at any parameter vector is equal to the payment corresponding to the entry selected by the buyer.
1.1 Our Contributions
We study the learnability of menus of two-part tariffs and lotteries in both online and distributional settings. We advance the state-of-the-art in several aspects.
Technical challenges.
Discretization is a natural technique in data-driven algorithm design. In this approach, a finite set of parameter vectors, each representing a menu in the parameter space, is selected, and the algorithms optimize over that set. The smaller the set, the better the generalization guarantees in the distributional setting and the better the regret guarantees in the online setting, with respect to the best menu in the set. In our setting, a proper data-independent discretization scheme would guarantee that, independent of the buyer’s valuation, this set always contains a nearly optimal menu. More specifically, for any parameter vector representing a menu, some menu in the set should generate almost as much revenue, independent of the buyer’s valuation. However, due to sharp discontinuities of revenue in the parameter space, devising such a discretization can be challenging. For instance, consider a menu with two high-utility entries for a buyer such that these entries have similar utility for the buyer but very different prices (e.g., one with high allocation and high price, the other with low allocation and low price). Minor changes in the parameters of these entries (e.g., rounding the parameters down to multiples of $\epsilon$) may alter their utility order, causing the buyer to switch between them and resulting in an arbitrary loss in revenue.
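The failure mode described in this paragraph can be reproduced with a small example. The Python sketch below uses hypothetical numbers (a single item, a single buyer with value 10, and exact rational arithmetic to avoid floating-point noise) and rounds every parameter down to a multiple of $\epsilon = 1/10$. The buyer switches from the high-price entry to the low-price one and the revenue drops from 8.9 to 0, which is why naive uniform rounding cannot serve as a revenue-preserving discretization. The helper names are illustrative only.

```python
import math
from fractions import Fraction as F


def buyer_choice(menu, value):
    """Single item, single buyer: index of the entry with the largest
    strictly positive utility alloc * value - price, or None."""
    best, best_util = None, F(0)
    for idx, (alloc, price) in enumerate(menu):
        util = alloc * value - price
        if util > best_util:
            best, best_util = idx, util
    return best


def revenue(menu, value):
    idx = buyer_choice(menu, value)
    return F(0) if idx is None else menu[idx][1]


def round_down(menu, eps):
    """Naively round every allocation and price down to a multiple of eps."""
    return [(math.floor(a / eps) * eps, math.floor(p / eps) * eps) for a, p in menu]


value = F(10)
menu = [(F(99, 100), F(89, 10)),   # high allocation, high price: utility 1
        (F(1, 10), F(1, 20))]      # low allocation, low price: utility 19/20
rounded = round_down(menu, F(1, 10))
print(revenue(menu, value), revenue(rounded, value))  # prints: 89/10 0
```

After rounding, the first entry's utility falls to $1/10$ while the second entry's rises to $1$, so a tiny parameter change costs the entire revenue.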
Structural Properties and a Revenue Preserving Cover.
By extracting structural properties of menus of two-part tariffs, we develop a novel discretization method that identifies a finite set of menus that approximates the revenue of any arbitrary limited-length menu (Theorem 1). For menus of lotteries, we extend the discretization developed by Dughmi et al. (2014) (Theorem 16). Our extension is three-fold: we remove the lower-bound assumption on the value distribution, support additive valuations, and provide improved regret bounds and running times when the size of the menu is limited. In both settings (two-part tariffs and lotteries), our discretization is data-independent; e.g., the set of discretized menus consists of all menus whose parameters lie on a fixed grid (all multiples of a small step, or all powers of a fixed base). The novelty of the result, however, lies in the analysis, which shows that, despite the challenges discussed above, for each arbitrary menu and valuation this set contains a corresponding approximately revenue-preserving menu. Finding these corresponding menus requires nontrivial rounding: entries with higher prices need to experience a larger decrease in price and a smaller decrease in allocation so that no buyer switches from a high-price to a low-price entry.
Online Learning (adversarial inputs and smooth distributional assumptions).
For menus of two-part tariffs, we provide the first no-regret online learning algorithms under adversarial inputs and also under smooth distributional assumptions. For the full information setting, both settings lead to similar regret terms; however, the comparison of their running times depends on the support of the distribution and the maximum number of units available (Theorems 2 and 5). In the bandit setting, again, the regret of both settings is similar; however, the comparison between the efficiencies of the algorithms depends on the smoothness factor of the distributions (Theorems 3 and 6). Furthermore, we provide the first no-regret algorithm for a semibandit setting (Theorem 7) with a polynomial running time in the number of discontinuities in the parameter space. This setting lies between the full-information and bandit settings: the learner observes the revenue function for a set of menus containing the menu used. For menus of lotteries, we provide the first no-regret online learning algorithms under adversarial inputs (Theorems 17, 18 and 19). In addition, we provide evidence that menus of lotteries may not satisfy dispersion (a sufficient condition for a no-regret algorithm under smooth distributional assumptions) without assuming extra structure about the optimal solution (Theorem 58). Menus of lotteries are the first family of mechanisms for which there is evidence of a potential failure of the dispersion property.
Distributional Learning.
We also provide novel distributional learning algorithms for menus of two-part tariffs and lotteries. Our algorithms choose several menus in a data-independent way (via data-independent discretization) and then select the best of them based on the data. In the context of two-part tariffs, our algorithm is much simpler than prior ones for the same problem, yet it enjoys improved worst-case runtime guarantees compared to them (Balcan et al., 2018c, 2020b) when the length of the menu is more than one (Theorem 15). We note that for other data-driven algorithm design problems, such as data-driven clustering and data-driven learning to branch, it was proven that algorithms that use data-independent discretization could perform very poorly (Balcan et al., 2017, 2018a). Thus, by contrast, our work shows the power of data-independent discretization for data-driven mechanism design and algorithm design more generally. In the context of lotteries, compared to the previous distributional learning results for fixed-length menus (Balcan et al., 2018c), our algorithm requires similar sample complexity; however, it has an efficient implementation (Theorems 22 and 56).
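A minimal Python sketch of this two-step recipe follows. It assumes a plain uniform grid of two-part tariff menus (a simplification chosen for illustration; the paper's discretized sets are the ones described in Theorems 1 and 16), ties broken toward not buying, and hypothetical function names. The candidate set is built without looking at any data, and the samples are used only to pick the candidate with the highest average revenue.

```python
import itertools
from typing import Callable, List, Sequence, Tuple

Tariff = Tuple[float, float]  # (up-front fee, per-unit fee)
Menu = Tuple[Tariff, ...]


def tariff_revenue(menu: Menu, values: Sequence[float]) -> float:
    """Payment of a utility-maximizing buyer; values[q - 1] is her value for q units.
    Ties are broken toward the first option found, and toward not buying."""
    best_util, best_pay = 0.0, 0.0  # not buying: utility 0, payment 0
    for up_front, per_unit in menu:
        for q, value in enumerate(values, start=1):
            pay = up_front + q * per_unit
            if value - pay > best_util:
                best_util, best_pay = value - pay, pay
    return best_pay


def grid_menus(H: float, eps: float, length: int) -> List[Menu]:
    """Every length-`length` menu whose fees are multiples of eps in [0, H].
    Built without looking at any data (assumes H / eps is an integer)."""
    steps = [k * eps for k in range(round(H / eps) + 1)]
    tariffs = list(itertools.product(steps, steps))
    return list(itertools.combinations_with_replacement(tariffs, length))


def erm_menu(candidates: Sequence[Menu], samples: Sequence[Sequence[float]],
             revenue: Callable[[Menu, Sequence[float]], float]) -> Menu:
    """Return the candidate with the highest average revenue on the samples."""
    return max(candidates, key=lambda m: sum(revenue(m, v) for v in samples) / len(samples))


samples = [[3.0, 5.0], [2.0, 5.0]]   # toy valuations for 1 and 2 units
best = erm_menu(grid_menus(H=5.0, eps=1.0, length=1), samples, tariff_revenue)
print(best, sum(tariff_revenue(best, v) for v in samples) / len(samples))
```

Brute-force enumeration is used here only for readability; its cost grows with the number of grid menus, which is exactly the quantity the discretization bounds control.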
Limited Buyer Types.
For limited buyer types, we provide improved regret bounds for both the full-information and partial-information (bandit) settings for both menus of two-part tariffs and lotteries (Theorems 13, 14, 20 and 21). The high-level idea is as follows. Consider the revenue function in the parameter space for a fixed buyer. The parameter space is partitioned into regions where, within each region, the buyer selects the same option in the menu, e.g., the same lottery, resulting in a continuous revenue function. Discontinuity occurs across regions. For limited-type buyers, by superimposing the revenue functions for all types, the parameter space divides into more (albeit still a limited number of) regions. Regardless of the buyer type at hand, the revenue function is continuous within each region, and in our case, linear. Therefore, it is sufficient to only consider the corner points as potential parameter vectors that maximize the revenue. We show that in the full information case, running the weighted majority algorithm on the set of menus corresponding to the regions’ corner points results in sublinear regret.
In the partial information setting, in each round, we only observe the revenue of the current menu. To estimate the revenue of all the menus efficiently, or in other words, to find an unbiased estimator with a bounded range, we employ the notion of barycentric spanners in online learning introduced by Awerbuch and Kleinberg (2008). Using this concept, we provide algorithms with a regret bound that is sublinear in the number of timesteps and polynomial in the other parameters. This is the first time that the barycentric spanner notion has been applied to an auction design setting. Similar contributions have been made in security games by Balcan et al. (2015).
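The notion of a barycentric spanner can be made concrete with a short Python sketch of the determinant-swapping construction of Awerbuch and Kleinberg (2008) for a finite set of vectors. It assumes the vectors span $\mathbb{R}^d$ and uses NumPy; how the paper maps menus to vectors is not reproduced here, and the function name and example vectors are hypothetical. By Cramer's rule, once no swap increases $|\det|$ by more than a factor $C$, every vector is a combination of the spanner with coefficients in $[-C, C]$, which is what keeps the resulting estimators bounded when only spanner elements are explored.

```python
import numpy as np


def barycentric_spanner(S: np.ndarray, C: float = 2.0) -> np.ndarray:
    """Indices of d rows of S (shape n x d, assumed to span R^d) forming a
    C-approximate barycentric spanner: every row of S equals a combination of
    the returned rows with coefficients in [-C, C]. Requires C > 1."""
    n, d = S.shape
    basis = np.eye(d)          # rows of the current basis; starts as the unit vectors
    idx = np.full(d, -1)       # row of S occupying each slot (-1 = still a unit vector)

    def abs_det_with(slot: int, row: int) -> float:
        trial = basis.copy()
        trial[slot] = S[row]
        return abs(np.linalg.det(trial))

    # Phase 1: fill each slot with the row of S that maximizes |det|.
    for slot in range(d):
        best = max(range(n), key=lambda r: abs_det_with(slot, r))
        basis[slot], idx[slot] = S[best], best

    # Phase 2: keep swapping while some swap grows |det| by more than a factor C.
    swapped = True
    while swapped:
        swapped = False
        current = abs(np.linalg.det(basis))
        for slot in range(d):
            for r in range(n):
                if abs_det_with(slot, r) > C * current:
                    basis[slot], idx[slot] = S[r], r
                    swapped = True
                    break
            if swapped:
                break
    return idx


S = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [3.0, 1.0], [-2.0, 5.0]])
idx = barycentric_spanner(S)
coeffs = np.linalg.solve(S[idx].T, S.T)   # column j expresses row j of S in the spanner
print(idx, np.abs(coeffs).max() <= 2.0)
```

Phase 2 terminates because each swap multiplies $|\det|$ by more than $C > 1$ and only finitely many bases exist.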
1.2 Related Work
Studying learnability of classes of mechanisms for the revenue maximization objective has been of great interest in recent years. These mechanisms have been studied mostly in a distributional setting, where buyers’ values are drawn from an unknown distribution, and the online setting, where there is no distributional assumption on the buyers’ values, has been explored less. (Some online learning algorithms, including those proved via the dispersion method explained later, still make distributional assumptions; however, unlike in the distributional learning setting, the draws are not necessarily from identical distributions.) In the distributional setting, various mechanism classes, including posted-price mechanisms, second-price auctions with reserves, menus of two-part tariffs, and menus of lotteries, are known to be learnable (Morgenstern and Roughgarden, 2015, 2016; Balcan et al., 2016, 2018c, 2021a; Dughmi et al., 2014; Gonczarowski and Weinberg, 2021; Mohri and Medina, 2016; Syrgkanis, 2017; Dütting et al., 2019). In the online setting, mostly simple mechanisms such as posted pricing and second-price auctions, both of which sell the items separately, have been considered, under adversarial input (Blum et al., 2004; Kleinberg and Leighton, 2003; Blum and Hartline, 2005; Balcan and Blum, 2006; Roughgarden and Wang, 2016; Bubeck et al., 2017) and also under stochastic input (Cesa-Bianchi et al., 2014; Balcan et al., 2018b, 2020a). An exception is Roughgarden and Wang (2016), who study the Vickrey-Clarke-Groves (VCG) mechanism with multiple reserves; however, the algorithms provided are not no-regret in the classic sense but have bounded regret compared to a constant approximation of the optimal solution.
Two of the prominent approaches used for developing distributional results are pseudo-dimension-based and discretization-based. In the first approach, despite the discontinuity present in the utility of buyers as a function of the parameters used in the mechanism, it is shown that the pseudo-dimension of the family is bounded by using smoothness assumptions on the distribution. This approach applies to all the mechanisms mentioned above. In the discretization approach, a finite set of parameters is identified such that limiting the search space to this set is approximately optimal. This approach has been used for a limited number of mechanisms, such as item-pricing for combinatorial auctions with unrestricted supply (Balcan et al., 2008) and menus of lotteries in a limited setting (Dughmi et al., 2014). In the online setting, Balcan et al. (2018b) and Balcan et al. (2020a) introduce dispersion as a sufficient condition for online learnability of families of mechanisms. They show that several classes of mechanisms, such as posted-price mechanisms and second-price auctions with reserves, satisfy dispersion and, therefore, establish strong regret bounds for online learning. Discretization-based techniques in online learning scenarios have been used for the simple cases of item-pricing (Blum et al., 2004) and second-price auctions (Cesa-Bianchi et al., 2014).
Two-Part Tariffs.
Two-part tariff pricing schemes were first introduced by Lewis (1941) and later analyzed by Oi (1971). Menus of two-part tariffs have been studied recently in the context of distributional learning (Balcan et al., 2018c, 2020b, 2022a). A recent work (Balcan et al., 2022a) provides improved running time bounds over Balcan et al. (2020b) for distributional learning of two-part tariffs in the case where the number of pieces with continuous sum of utility functions across all problem instances is small (here, as defined in Section 2, the utility function measures the performance of our two-part tariff mechanisms on a fixed problem instance as a function of its parameters). However, for the case where the menu length is strictly greater than 1, the approach of Balcan et al. (2022a) does not lead to improved running time over Balcan et al. (2020b) for worst-case instances. So for worst-case instances and menu length $\ell > 1$, our approach for distributional learning improves over the previously best-known results.
Menus of Lotteries.
Menus of lotteries capture all possible mechanisms, including the optimal one, for selling items to buyers. The Taxation Principle (Dasgupta et al., 1979; Guesnerie and Oddou, 1981) asserts that any mechanism for a single buyer can be represented as a menu of lotteries, where the buyer selects their favorite lottery (that is, the one that maximizes the buyer’s expected value for the randomized allocation minus the price paid). Furthermore, menus of lotteries achieve revenue beyond other well-studied families of mechanisms such as posted pricing and, more generally, any deterministic mechanism. For a correlated buyer (a buyer whose values for items are correlated), even in the simple cases where the buyer is additive (their value for a bundle of items is the sum of their values for the individual items) or unit-demand (their value for a bundle of items is the maximum value for an item in the bundle), the gap between the optimal randomized mechanism (lotteries) and item-pricing is infinite (Briest et al., 2010; Hart and Nisan, 2019). Daskalakis et al. (2014) show that even for an independent additive buyer (the values for the items are independent), lotteries (randomized mechanisms) are necessary and provide strictly more revenue than any deterministic mechanism, including pricing mechanisms.
Failure of data-independent discretization-based learning.
Discretization is a natural approach for designing algorithms to tune parameters (e.g., prices for menus of two-part tariffs, and allocation probabilities and prices for menus of lotteries) and is commonly used in applied fields such as applied machine learning. However, recent work has shown that, when tuning parameters of algorithms for solving discrete combinatorial problems, data-independent discretization does not always work in the context of data-driven algorithm design. For tuning parameters of linkage-based algorithms, Balcan et al. (2017) showed that for several natural parameterized families of clustering procedures, for any data-independent discretization, there exists an infinite family of clustering instances such that every discrete parameter outputs a clustering that is worse than the one produced by the optimal parameter by a factor that grows with the input size $n$. Here, the quality of clustering can be defined according to several well-known objectives, including $k$-median, $k$-means, and $k$-center. Balcan et al. (2018a) show that for the data-driven problem of learning to branch for solving mixed integer linear programs (MILPs), data-independent discretization does not work either. More specifically, for any discretization of the parameter space, there exists an infinite family of distributions over MILP problem instances such that for any parameter in the discretization, the expected tree size is exponential in the input size. Yet, there exist infinitely many parameters for which the tree size is just a constant (with probability 1). Remarkably, we show that in our context, even data-independent discretization works.
Dispersion and Online Data-Driven Algorithm Design.
Dispersion is a recently developed notion for families of algorithmic and mechanism design problems and serves as a sufficient condition for the existence of bounded-regret online learning algorithms (Balcan et al., 2018b, 2020a) and differentially private distributional learning algorithms (Balcan et al., 2018b). Generally speaking, this condition bounds the concentration of discontinuities of the objective function in any small region of the parameter space. Dispersion-based techniques have been established successfully for a variety of algorithms (Balcan and Sharma, 2021; Balcan et al., 2021b, 2022b), among which is tuning parameters in combinatorial problems, such as the clustering problems discussed above (Balcan et al., 2018b). For menus of two-part tariffs, we show that the dispersion condition is satisfied, immediately implying no-regret online learning algorithms and differentially private algorithms for distributional learning. Surprisingly, we present evidence that dispersion might not apply to menus of lotteries. In particular, we show that in menus of lotteries the objective function might have sharp discontinuities concentrated in a small region. This structural property is in stark contrast with menus of two-part tariffs and other mechanism and algorithm families satisfying dispersion. Despite this evidence, we show that a simple discretization-based approach leads to no-regret online learning algorithms for menus of lotteries.
Sample Complexity for Menus of Lotteries.
The sample complexity for menus of lotteries has been studied under two different assumptions: independence of valuations across items, as studied by Gonczarowski and Weinberg (2021), and correlated valuations across items, as studied by Dughmi et al. (2014). By assuming independence simultaneously among the buyers and the items, a significant improvement in the sample complexity is possible (Gonczarowski and Weinberg, 2021). However, when the values for the items are possibly correlated, Dughmi et al. show a lower bound on the sample complexity establishing an exponential gap in the dependence on the number of items compared to Gonczarowski and Weinberg. Similar to Dughmi et al. and in contrast with Gonczarowski and Weinberg, we do not assume independence across items and only assume independence among the buyers.
2 Model and Preliminaries
We consider selling items to a single buyer for the revenue objective through parameterized families of mechanisms. In this paper, the family of mechanisms is either the set of menus of two-part tariffs or the set of menus of lotteries. To put our notation in context, in this section we focus on menus of two-part tariffs as our running example.
Menus of two-part tariffs are used for selling multiple units (i.e., copies) of a single item through a list of up-front and per-unit fee pairs that the buyer can choose from. The menu $M=\{(p_1^{(1)},p_2^{(1)}),\dots,(p_1^{(\ell)},p_2^{(\ell)})\}\subseteq\mathbb{R}^{2\ell}$ is a length-$\ell$ menu of two-part tariffs. Each menu is parameterized by $\rho$, which in this case is $2\ell$-dimensional and contains all $p_1^{(i)}$ and $p_2^{(i)}$ for $i\in[\ell]$. $p_1^{(i)}$ and $p_2^{(i)}$ are called the up-front fee (price) and per-unit fee (price) of tariff $i$, respectively. We denote a buyer’s valuations for all units by $v=(v(1),\dots,v(K))$, where the values are nonnegative, monotonically increasing, belong to $[0,H]$, and $v(0)=0$. Under tariff $i$ and the number of units $q$ that the buyer selects, she receives $q$ units of the item and pays $p_1^{(i)}+q\,p_2^{(i)}$. The buyer’s utility is her value for the number of units bought less the payment. Each buyer has the option of buying with her utility-maximizing tariff and number of units. In other words, the buyer buys $q$ units using the tariff $i$ that maximizes $v(q)-(p_1^{(i)}+q\,p_2^{(i)})$, or does not buy and does not pay anything.
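As an illustration of the buyer model just described, here is a minimal Python sketch of the revenue of a two-part tariff menu on one buyer. It assumes the parameter vector is laid out as $(p_1^{(1)}, p_2^{(1)}, \dots, p_1^{(\ell)}, p_2^{(\ell)})$ and that ties are broken toward the first option found and toward not buying; both are choices made for this sketch, not conventions taken from the paper, and the names `best_response` and `u` are illustrative.

```python
from typing import Optional, Sequence, Tuple


def best_response(rho: Sequence[float], v: Sequence[float]) -> Optional[Tuple[int, int, float]]:
    """rho = (p1^(1), p2^(1), ..., p1^(l), p2^(l)) (assumed layout); v[q - 1] is the
    value for q units. Returns (tariff i, units q, payment) maximizing
    v(q) - (p1^(i) + q * p2^(i)), or None if no option has strictly positive utility."""
    assert len(rho) % 2 == 0, "rho must hold (up-front, per-unit) pairs"
    best, best_util = None, 0.0  # not buying: utility 0
    for i in range(len(rho) // 2):
        up_front, per_unit = rho[2 * i], rho[2 * i + 1]
        for q in range(1, len(v) + 1):
            pay = up_front + q * per_unit
            util = v[q - 1] - pay
            if util > best_util:
                best, best_util = (i, q, pay), util
    return best


def u(rho: Sequence[float], v: Sequence[float]) -> float:
    """Revenue of the menu with parameters rho on buyer valuation v."""
    choice = best_response(rho, v)
    return 0.0 if choice is None else choice[2]


rho = (2.0, 3.0, 6.0, 1.0)   # tariff 0: (2, 3); tariff 1: (6, 1)
v = (5.0, 9.0, 12.0)         # values for 1, 2, 3 units
print(best_response(rho, v), u(rho, v))  # prints: (1, 3, 9.0) 9.0
```

Here the buyer takes three units under the high up-front, low per-unit tariff, paying $6 + 3 \cdot 1 = 9$ for utility $3$.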
Let $\{\mathcal{M}_\rho : \rho\in\mathcal{P}\}$ be an infinite set of mechanisms parameterized by a set $\mathcal{P}$. In this paper, the mechanisms are either two-part tariff menus or lottery menus. Consider the case where $\mathcal{M}_\rho$ is the two-part tariff menu, corresponding to parameter $\rho$, for selling multiple units of a single item to a buyer with value $v$. Next, let $\Pi$ be a set of problem instances, such as a set of buyer valuations $v$, and let $u:\mathcal{P}\times\Pi\to[0,H]$ be a utility function where $u(\rho,x)$ measures the performance of the mechanism with parameters $\rho$ on problem instance $x\in\Pi$. In our case, $u(\rho,x)$ is the revenue of the mechanism (a menu of two-part tariffs or lotteries) with parameters $\rho$ on input $x$. For example, for menus of two-part tariffs, $\mathcal{P}$ is the set of possible menus, and since each menu is $2\ell$-dimensional with each dimension in $[0,H]$, $\mathcal{P}=[0,H]^{2\ell}$. $\Pi$ is the set of buyer valuations, and $u(\rho,v)$ measures the revenue of the menu with parameters $\rho$ on buyer valuation $v$.
Online Setting
In this setting, a sequence of functions $u_1,\dots,u_T$ arrives one by one. Unlike $u$, $u_t$ only takes the parameter $\rho$ as input and is defined as $u_t(\rho)=u(\rho,x_t)$, where $x_t$ is the problem instance at timestep $t$. At time $t$, the no-regret learning algorithm chooses a parameter vector $\rho_t$ and then observes either the function $u_t(\cdot)$ in the full information setting, the scalar $u_t(\rho_t)$ in the bandit setting, or $u_t(\rho)$ for a set of $\rho$ containing $\rho_t$ in the semibandit setting. The goal is to minimize the expected regret, $\mathbb{E}\bigl[\sup_{\rho\in\mathcal{P}}\sum_{t=1}^{T}u_t(\rho)-\sum_{t=1}^{T}u_t(\rho_t)\bigr]$. We study the online setting both under adversarial input, where $x_1,\dots,x_T$ are selected adversarially, and under smoothed distributional inputs, which assume more structure. The expectation in the regret formula is taken over the randomness of the algorithm in the adversarial setting, and over the randomness of the algorithm and the distribution of buyers in the smoothed distributional setting.
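For one realized run and a finite set of candidate parameters, the regret defined in this paragraph reduces to simple arithmetic. The following minimal Python sketch computes it under those assumptions; the posted-price utilities are a toy stand-in (hypothetical, not from the paper) for the revenue functions $u_t$, and the result is a single realization rather than an expectation.

```python
from typing import Callable, Sequence


def empirical_regret(utilities: Sequence[Callable[[float], float]],
                     played: Sequence[float],
                     comparators: Sequence[float]) -> float:
    """sum_t u_t(rho) for the best fixed rho in `comparators`, minus sum_t u_t(rho_t).
    Restricting the max to a finite comparator set can only underestimate the
    regret against the whole parameter space."""
    best_fixed = max(sum(u_t(rho) for u_t in utilities) for rho in comparators)
    earned = sum(u_t(rho_t) for u_t, rho_t in zip(utilities, played))
    return best_fixed - earned


# Toy instance: rho is a posted price and x_t is the buyer's value at round t,
# so u_t(rho) = rho if the buyer accepts (x_t >= rho) and 0 otherwise.
values = [3.0, 5.0, 4.0]
utilities = [lambda rho, x=x: rho if x >= rho else 0.0 for x in values]
played = [5.0, 5.0, 5.0]
comparators = [3.0, 4.0, 5.0]
print(empirical_regret(utilities, played, comparators))  # prints: 4.0
```

Always posting price 3 would have earned 9, while the played sequence earned only 5.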
Distributional Setting
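In the distributional setting, the algorithm receives samples $x_1,\dots,x_N$ from an unknown distribution $\mathcal{D}$ over problem instances $\Pi$. The goal is to find a parameter vector $\hat\rho$ that nearly maximizes the expected utility, i.e., $\mathbb{E}_{x\sim\mathcal{D}}[u(\hat\rho,x)]\ge\sup_{\rho\in\mathcal{P}}\mathbb{E}_{x\sim\mathcal{D}}[u(\rho,x)]-\epsilon$, similar to statistical learning theory (Vapnik, 1998) or PAC learning (Valiant, 1984).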
3 Menus of Two-Part Tariffs
In this section, we consider $M=\{(p_1^{(1)},p_2^{(1)}),\dots,(p_1^{(\ell)},p_2^{(\ell)})\}\subseteq\mathbb{R}^{2\ell}$ as a length-$\ell$ menu of two-part tariffs. See Section 2 for a detailed description.
3.1 Discretization Procedure
This section presents a discretization procedure for menus of two-part tariffs. Given any menu $M$ and value $\epsilon>0$, we provide an alternate menu $M'$ such that all the price elements, $p_1'^{(i)}$ and $p_2'^{(i)}$ for all $i$, are multiples of $\epsilon$, and the alternate menu provides nearly as much revenue as the given menu, up to a term that depends on $\epsilon$. The main result of this section is summarized in the following statement.
Theorem 1**.**
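Given a menu of two-part tariffs $M$ and parameter $\epsilon>0$, Algorithm 1 outputs a menu $M'$ whose revenue is at least the revenue of $M$ less an additive term depending on $\epsilon$, for any buyer’s valuation. Furthermore, for all $i$, all $p_1'^{(i)}$ and $p_2'^{(i)}$ are multiples of $\epsilon$. The set of potential outcomes constitutes a space with at most $\min\{(H/\epsilon)^{2\ell},\,2^{(H/\epsilon)^2}\}$ menus, where $H$ is the maximum value for any number of units.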
Proof idea of Theorem 1 and intuition behind Algorithm 1.
The main structural ideas behind the algorithm and the proof of the revenue guarantee are as follows: (i) for a fixed number of units $q$ to be purchased, the utility-maximizing tariff is the same across all of the buyer’s valuations, namely the tariff with the smallest overall price (up-front price plus $q$ times per-unit price); and (ii) as the number of units to be purchased increases, the per-unit price of the utility-maximizing tariff does not increase. The main idea of the rounding algorithm is to decrease the prices of tariffs with lower per-unit fees by a larger amount (Line 4). By doing so, for each buyer, the total price of buying more units decreases more than the total price of buying fewer units. This step ensures that the buyer does not switch from purchasing more units to fewer units after the rounding, and this property is sufficient for the revenue guarantees. The other steps of the algorithm delete redundant tariffs (Lines 2 and 6) and ensure that the final prices are multiples of $\epsilon$ (Line 5). The theorem provides two upper bounds on the size of the discretized space. By Line 5, all the prices are multiples of $\epsilon$. Therefore, the $2\ell$ price components in a length-$\ell$ menu each have $H/\epsilon$ options, which gives the first bound. On the other hand, if we consider a single tariff, each of the up-front fee and the per-unit fee has $H/\epsilon$ possibilities; therefore, the total number of possible unique tariffs is $(H/\epsilon)^2$. Each of these possible tariffs may or may not be on the menu, giving the second bound. The full proof is provided in Appendix A. ∎
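Structural properties (i) and (ii) can be checked numerically on a small example. The Python sketch below uses a hypothetical helper (not the paper's Algorithm 1) that computes, for each number of units $q$, the tariff with the smallest total price $p_1 + q\,p_2$. The buyer's valuation never enters the computation, which is property (i), and the final assertion checks property (ii) on an illustrative menu.

```python
from typing import List, Sequence, Tuple


def cheapest_tariff_per_q(menu: Sequence[Tuple[float, float]], K: int) -> List[int]:
    """For q = 1..K, the index of the tariff minimizing the total price p1 + q * p2.
    The valuation does not appear: whenever a buyer buys q units, she uses this
    tariff (ties broken toward the lower index in this sketch)."""
    return [min(range(len(menu)), key=lambda i: menu[i][0] + q * menu[i][1])
            for q in range(1, K + 1)]


menu = [(0, 5), (4, 3), (10, 1)]   # (up-front fee, per-unit fee)
choice = cheapest_tariff_per_q(menu, K=6)
per_unit = [menu[i][1] for i in choice]
print(choice, per_unit)  # prints: [0, 0, 1, 2, 2, 2] [5, 5, 3, 1, 1, 1]
# Property (ii): the per-unit fee of the cheapest tariff never increases with q.
assert all(a >= b for a, b in zip(per_unit, per_unit[1:]))
```

Property (ii) holds for any tie-breaking: if tariff $i$ is cheapest at $q$ and tariff $j$ has a strictly larger per-unit fee, then $j$ is strictly more expensive than $i$ at every $q' > q$.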
3.2 Online Learning
We provide bounded-regret online learning algorithms in full and partial information settings. Sections 3.2.1, 3.2.2, and 3.2.3 provide online algorithms under adversarial input, under smooth distributions, and for limited-type buyers, respectively.
3.2.1 Online Learning Under Adversarial Inputs
The main statements are Theorems 2 and 3, which provide regret guarantees for the full-information case and the partial-information case, respectively. Using the discretization in Section 3.1, we show a reduction to a finite number of experts and run standard learning algorithms (weighted majority and Exp3) over the menus in the discretized set. Similar ideas were used in previous papers, for example (Blum et al., 2004; Balcan et al., 2018b).
Full Information
In the full information setting, the seller sees the revenue generated by all the possible menus. To design an online algorithm in this case, we use a variant of the weighted majority algorithm of Auer et al. (1995). The experts in our case are the discretized menus from the previous section, which form the expert set of the algorithm. Furthermore, the algorithm tracks the valuation of the buyer at each time step and the cumulative revenue of each menu from the buyers up to that time step.
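For intuition, the following is a minimal Python sketch of full-information exponential weights (Hedge) over a finite expert set. It is a standard textbook version and may differ in details from the variant of Auer et al. (1995) used in Algorithm 2; the toy experts are posted prices rather than discretized menus, and the function name `hedge` is hypothetical. Revenues are rescaled by the bound `H`, and `eta` uses the usual $\sqrt{8 \ln N / T}$ tuning for $N$ experts and horizon $T$.

```python
import math
import random
from typing import Callable, List, Sequence


def hedge(experts: Sequence, rounds: Sequence[Callable[[object], float]],
          H: float, eta: float, seed: int = 0) -> List:
    """Exponential weights over a finite expert set (e.g. discretized menus).
    rounds[t](e) returns the revenue in [0, H] of expert e on buyer t; in the
    full-information setting it can be evaluated for every expert after play."""
    rng = random.Random(seed)
    cum = [0.0] * len(experts)                  # cumulative revenue of each expert
    played = []
    for gain in rounds:
        top = max(cum)                          # shift for numerical stability
        weights = [math.exp(eta * (c - top) / H) for c in cum]
        played.append(rng.choices(experts, weights=weights)[0])
        for k, e in enumerate(experts):         # full information: update all experts
            cum[k] += gain(e)
    return played


T, experts = 200, [1.0, 2.0, 3.0]              # toy experts: posted prices
rounds = [lambda price: price if 3.0 >= price else 0.0] * T   # every buyer values the item at 3
played = hedge(experts, rounds, H=3.0, eta=math.sqrt(8 * math.log(len(experts)) / T))
```

Because every buyer accepts any price up to 3, the weight quickly concentrates on the highest price, and late rounds almost always play it.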
Theorem 2**.**
In the full information case for length- menus of two-part tariffs, running Algorithm 2 over the discretized set of menus specified in Theorem 1 for has regret bounded by and running time .
The proof follows by combining the guarantees of the discretization procedure (Theorem 1) and previously known results (specifically (Auer et al., 1995), Theorem 3.2) and is deferred to Appendix A.
Partial Information (Bandit Setting)
In the partial information setting, the seller does not see the outcome for all possible menus and observes only the outcome of the menu used (the tariff and number of units chosen by the buyer). To design an online algorithm in this case, we use a version of the Exp3 algorithm of Auer et al. (1995). This variant of the Exp3 algorithm contains the weighted majority algorithm (Algorithm 2) as a subroutine. At each step, we mix the probability distribution , used by the weighted majority algorithm, with the uniform distribution to obtain a modified probability distribution , which is then used to select a menu from our discretized set. Following the tariff and the number of units chosen by buyer , we use the price paid (the gain from the chosen menu) to formulate a simulated gain vector, which is then used to update the weights maintained by the weighted majority algorithm.
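The Exp3 step can be sketched in the same style. This minimal Python sketch assumes revenues in `[0, H]` and a mixing parameter `gamma`; the simulated gain vector is zero except at the chosen menu, where it equals the observed gain divided by that menu's selection probability, as in the standard Exp3 update. The class and parameter names are hypothetical rather than taken from Algorithm 3.

```python
import math
import random
from typing import List, Optional


class Exp3:
    """Exp3 sketch over n experts (menus) with gains in [0, H]."""

    def __init__(self, n: int, gamma: float, H: float, seed: int = 0):
        self.n, self.gamma, self.H = n, gamma, H
        self.log_w = [0.0] * n
        self.rng = random.Random(seed)
        self.last_p: Optional[List[float]] = None

    def probabilities(self) -> List[float]:
        m = max(self.log_w)
        w = [math.exp(x - m) for x in self.log_w]
        s = sum(w)
        # Mix the exponential-weights distribution with the uniform distribution.
        return [(1 - self.gamma) * x / s + self.gamma / self.n for x in w]

    def choose(self) -> int:
        self.last_p = self.probabilities()
        return self.rng.choices(range(self.n), weights=self.last_p, k=1)[0]

    def update(self, chosen: int, gain: float) -> None:
        """Only the chosen menu's revenue is observed; build the importance-weighted
        simulated gain (nonzero only at `chosen`) and update that weight."""
        x_hat = (gain / self.H) / self.last_p[chosen]
        self.log_w[chosen] += self.gamma * x_hat / self.n
```

The uniform mixing keeps every selection probability at least `gamma / n`, which bounds the range of the simulated gains.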
Theorem 3**.**
In the partial information case for length- menus of two-part tariffs, running Algorithm 3 over the discretized set of menus in Theorem 1 for has regret bound and running time .
The proof follows by combining the guarantees of the discretization procedure (Theorem 1) and previously known results (specifically (Auer et al., 1995), Theorem 4.1) and is deferred to Appendix A.
3.2.2 Online Learning Under Smooth Distributions
Recent papers on online learning of mechanisms study the problem in a restricted setting where, at each point in time, the buyers' valuations come from -bounded distributions, i.e., distributions whose density function is bounded at all points by . This assumption has proved sufficient to establish dispersion for a few classes of mechanisms, including posted-pricing and second-price mechanisms. At a high level, dispersion ensures that, with high probability, the number of discontinuities in a small ball in the parameter space is limited, and it is a sufficient condition for bounded-regret online algorithms. We prove that menus of two-part tariffs satisfy dispersion and use this to derive bounded-regret algorithms for the full-information, bandit, and semi-bandit settings. The main difference between the algorithms used in this section and those for the adversarial input setting in Section 3.2.1 is that the latter needed a careful, data-independent discretization step (Section 3.1) to reduce the problem to a finite number of experts. Under smooth distributions, in contrast, the assumed properties of the distribution influence the set of experts chosen.
We provide the main results in this setting, followed by a discussion of the key ideas behind the algorithms and proofs. After establishing dispersion for menus of two-part tariffs, it suffices to employ previously known algorithms designed for dispersed settings to achieve no-regret guarantees. The primary purpose of this section is to compare the regret guarantees of the recently developed dispersion technique with those of the discretization approach discussed in the previous section. The formal definition of dispersion and technical descriptions of the algorithms and proofs are deferred to the appendix. Note that the regret term of the semi-bandit algorithm (Theorem 7) is better than that of the full-information algorithm (Theorem 5) because different notions of dispersion are used. Also, the stated running times of both algorithms are the same; however, this is in the worst case, and the semi-bandit algorithm potentially performs fewer computations. The main results are as follows:
Definition 4**.**
[-bounded] A density function corresponds to a -bounded distribution if .
Theorem 5**.**
Let be the revenue functions of two-part tariff menus such that denotes the revenue of a mechanism associated with menu parameters for the buyer arriving at time . Let the samples of buyers’ values be drawn from . Suppose for any number of units . Also, suppose that for each distribution , and every pair of number of units and , and have a -bounded joint distribution. An efficient implementation of the exponentially weighted forecaster with (Algorithm 5) has expected regret bounded by and runs in time .
Theorem 6**.**
Let be the revenue functions of two-part tariff menus such that denotes the revenue of a mechanism associated with menu parameters for the buyer arriving at time . Let the samples of buyers’ values be drawn from . Suppose for any number of units . Also, suppose that for each distribution , and every pair of number of units and , and have a -bounded joint distribution. There is a bandit-feedback online optimization algorithm with expected regret . The per-round running time is .
Theorem 7**.**
Suppose the buyers’ values are drawn from , where each is -bounded for . Then, running the continuous Exp3-SET algorithm (Algorithm 7) for menus of two-part tariffs under semi-bandit feedback has expected regret bounded by . An efficient implementation has the same regret bound and running time .
Partitioning of the parameter space into convex regions with linear utilities (Balcan et al., 2018c)
Consider the sequence of buyer valuations . At each time step, a buyer is presented a menu, and based on the menu and their valuation, they select the tariff index and number of units that maximize their utility. Formally, given menu , buyer with valuation selects option , where is the tariff index and is the number of units, if this option produces more utility for the buyer than any other option. Concretely,
[TABLE]
where and are the up-front fee and per-unit fee of tariff in menu . The above inequalities identify a convex polytope of parameter vectors (menus ) with hyperplane boundaries. Since the tariff index and the number of units that selects are fixed in the region, the revenue, , is continuous and more specifically linear in the region (formally proved in Lemma 39). Following the same argument for the buyers in the sequence, the parameter space for each buyer is partitioned into convex polytopes where the revenue for the buyer’s valuation is linear inside the polytopes. By superimposing these partitionings, since the intersections of convex regions are also convex, and the sum of linear functions (here revenues) is linear, the parameter space, is partitioned into convex regions such that the cumulative revenue for the sequence is linear in each region. Inside each region, the utility-maximizing choice of each buyer is fixed; therefore, each region is associated with a mapping from buyer valuations to their corresponding utility-maximizing tariff index and number of units. We may use the mapping, formally defined in Section 3.2.3, to denote the region, e.g., region corresponding to mapping , or simply use cardinal indices for the regions .
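To see why revenue is linear inside a region, note that once the mapping from buyers to (tariff index, number of units) is fixed, the cumulative revenue is a sum of up-front fees and per-unit fees with integer coefficients. The sketch below assumes the same hypothetical `(up_front_fee, per_unit_fee)` encoding, a flattened parameter vector `(p0_1, p1_1, ..., p0_l, p1_l)`, and a walk-away option with utility 0; it computes that coefficient vector and checks it against a direct revenue computation for two nearby menus that induce the same mapping.

```python
from typing import List, Optional, Sequence, Tuple

Tariff = Tuple[float, float]   # hypothetical: (up-front fee, per-unit fee)
Option = Tuple[int, int]       # (tariff index, number of units)


def buyer_choice(menu: Sequence[Tariff], values: Sequence[float]) -> Optional[Option]:
    """Utility-maximizing option; None means the buyer walks away (utility 0)."""
    best, best_u = None, 0.0
    for j, (p0, p1) in enumerate(menu):
        for q in range(1, len(values) + 1):
            u = values[q - 1] - (p0 + q * p1)
            if u > best_u:
                best, best_u = (j, q), u
    return best


def revenue(menu: Sequence[Tariff], values: Sequence[float]) -> float:
    opt = buyer_choice(menu, values)
    if opt is None:
        return 0.0
    j, q = opt
    p0, p1 = menu[j]
    return p0 + q * p1


def induced_mapping(menu, buyers) -> Tuple[Optional[Option], ...]:
    """The region label: each buyer's selected option under this menu."""
    return tuple(buyer_choice(menu, v) for v in buyers)


def revenue_coefficients(mapping, n_tariffs: int) -> List[float]:
    """c such that cumulative revenue = c . theta for every menu inducing `mapping`."""
    c = [0.0] * (2 * n_tariffs)
    for opt in mapping:
        if opt is None:
            continue
        j, q = opt
        c[2 * j] += 1      # one up-front fee of tariff j
        c[2 * j + 1] += q  # q per-unit fees of tariff j
    return c


def flatten(menu: Sequence[Tariff]) -> List[float]:
    return [x for tariff in menu for x in tariff]
```

Any two menus with the same `induced_mapping` share one coefficient vector, which is why the cumulative revenue is linear on each region.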
Dispersion for menus of two-part tariffs
We provide intuition for why menus of two-part tariffs under bounded-density distributions satisfy dispersion; that is, with high probability, the discontinuities in the revenue function do not concentrate. To see this, we focus on Equation 1 for fixed values of , i.e., pairs of tariffs and units, and for all . The equality cases of all these equations are parallel hyperplanes because, for each and fixed pairs of tariffs and units, the other parameters, i.e., , are fixed, and the equations differ only in . Assuming independence of the distributions across buyers and -bounded joint distributions over and , with high probability the hyperplanes in these multisets of parallel hyperplanes defined by Equation 1 do not concentrate, implying dispersion. A concrete definition of dispersion and a formal dispersion proof are presented in the appendix.
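The non-concentration argument can be illustrated with a simplified, one-dimensional proxy: for a fixed pair of options, each buyer contributes one hyperplane whose offset is a difference of that buyer's values, so non-concentration means that few offsets land in any short interval. The sketch below assumes a hypothetical smooth buyer model (`v(1)` and the increment `v(2) - v(1)` uniform on `[0, 1]`) and is not the formal dispersion definition from the appendix.

```python
import bisect
import random
from typing import List, Sequence


def max_points_in_window(offsets: Sequence[float], w: float) -> int:
    """Largest number of offsets in any closed interval of width w.
    An optimal window can always be shifted to start at one of the points."""
    xs = sorted(offsets)
    best = 0
    for i, x in enumerate(xs):
        best = max(best, bisect.bisect_right(xs, x + w) - i)
    return best


def simulated_offsets(T: int, rng: random.Random) -> List[float]:
    """Hypothetical smooth model: v(1) ~ U[0,1], v(2) = v(1) + U[0,1].
    The hyperplane comparing a 2-unit option with a 1-unit option
    sits at offset v(2) - v(1) for each buyer."""
    offsets = []
    for _ in range(T):
        v1 = rng.random()
        v2 = v1 + rng.random()
        offsets.append(v2 - v1)
    return offsets


rng = random.Random(0)
print(max_points_in_window(simulated_offsets(10_000, rng), 0.01))
```

Under the smooth model, a window of width `w` holds roughly `T * w` offsets, whereas identical buyers (a point mass, i.e., unbounded density) put all `T` offsets at a single location.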
Overview of Algorithms
We provide high-level ideas for the full-information, bandit-setting, and semibandit-setting algorithms used for Theorems 5, 6 and 7, respectively. Generic forms of these algorithms were devised by Balcan et al. (2018b, 2020a) for dispersed families of algorithms. The full information algorithm considers the cumulative revenue function up until the time over the parameter space and samples the menu to present at time proportional to an exponential function of its cumulative revenue. In order to have an efficient implementation, they use techniques from high-dimensional geometry and approximately sample menu . Let be the partition of until time . The algorithm picks with probability approximately proportional to the region’s cumulative weight and outputs a sample from the conditional distribution of menus in . The bandit-setting algorithm considers a grid over the parameter space, whose granularity depends on the dispersion parameters, and runs the Exp3 algorithm over menus corresponding to the grid. The semi-bandit-setting algorithm is a continuous version of the Exp3-SET algorithm of Alon et al. (2017). At each time step, the algorithm learns the revenue function (only) inside the region that the presented menu belongs to and updates the menu weights for the next round accordingly.
Comparison to the results in Section 3.2.1
Although the discretization-based algorithms work under adversarial inputs and are therefore more general, they provide similar regret bounds and, in some cases, even improved running times. In the full information case, the dependence of the regret bound on parameter is similar for both algorithms. In running time, the discretization-based algorithm suffers a worse dependence on , but enjoys a better dependence on and (the maximum number of units) compared to the dispersion-based algorithm. In the bandit setting, similarly, the regret bounds have similar dependence on , while the running-time comparison depends on the value of (the maximum density under the smoothness assumption), such that lower-density distributions may result in better running times.
3.2.3 Limited Buyer Types
In this section, we assume that there is a known, finite number of buyer types. This information provides extra structure compared to the general setting considered previously. In particular, the mechanism designer now knows where the potential discontinuities occur as a function of the parameter space. We provide algorithms with bounded regret for both the full information and partial information settings specific to limited types. These algorithms improve the regret bounds significantly when the number of buyer types is small. This section is inspired by Balcan et al. (2015) and uses similar algorithms and notation.
Balcan et al. (2015) study a security games setting in which, at each time step, the defender has a mixed strategy (a probability distribution) for protecting the attack targets. Knowing this mixed strategy, the attacker selects a target to attack that maximizes the attacker's utility (depending on the attacker's type). Considering the target selected by each attacker type as a function of the defender's mixed strategy, the mixed strategy space is partitioned into regions where the action of each attacker type is fixed throughout each region. This is very similar to our setting, where the parameter space is partitioned into regions such that, inside each region, each buyer type selects a fixed tariff index and number of units (see the discussion on partitioning the parameter space in Section 3.2.2). Balcan et al. use the linear structure of the utility inside each region to develop a no-regret full-information algorithm. In the partial information setting, in addition to the linearity of the utility functions, they use the dependence of an agent's actions (in their case, the attacker; in our case, the buyer) across different regions and identify a limited number of mixed strategies (corresponding to menus in our case) such that observing the agent's responses to them suffices to estimate the utility of other strategies. We use similar machinery in both the full and partial information settings. However, the source of linearity of the utility differs across the two settings. In the security games context, the attackers' actions are fixed inside regions, and the cumulative utility is a weighted sum of the utilities of those actions, where the weights are the parameter space coordinates. In our setting, we utilize the specific structure of menus and show that the cumulative utility is a linear function of the coordinates. For completeness, and to make the paper self-contained, we include a full description of the algorithms and techniques adapted to our setting and in our terminology.
In this setting, we utilize knowledge of the potential buyer types to design a limited number of menus and optimize over this set. In contrast to the previous section, where the valuations were revealed only upon the buyers' arrival, here we have access to all potential buyer types up front. Similarly to Section 3.2.2, however, the piecewise-linear structure of the buyers' utilities partitions the parameter space such that each part has linear cumulative utility (Balcan et al., 2018c). This partitioning is equivalent to dividing the parameter space into convex regions such that, in each region, there is a fixed mapping from the buyer types to the menu options that each buyer selects. We show that in each region, we need to consider only a limited number of menus, namely the extreme points.
Consider as the set of all potential buyer valuations. denotes the number of buyer types. In order to define the behavior of buyers in each region, we need to define a concept called menu options, which determines the choices of the buyers.
Definition 8** (menu option for menus of two-part tariffs, ).**
A pair , where is the tariff index , and is the number of units is a menu option. We denote the set of all menu options as . This set identifies all potential actions of a buyer when presented with a menu.
Definition 9** (mapping , feasible mappings, ).**
A mapping is a function from buyer types, to menu options , where and are the tariff index and the number of units assigned to the buyer type respectively. Mapping is feasible if there is a menu corresponding to the mapping, i.e., a menu that if presented to the buyers, each buyer selects their corresponding option in the mapping as their utility maximizing option. denotes the region of the parameter space corresponding to , i.e., the set of menus inducing mapping .
Using the discussion in Section 3.2.2, and as formally shown in Lemmas 38 and 39, the parameter space is partitioned into convex polytopes, each with a linear utility function for any sequence of buyer types. Therefore, for optimization purposes, it seems enough to consider only the menus corresponding to the extreme points. This intuition is accurate up to a small tweak: depending on the tie-breaking rule of buyers among menu options producing the same utility, the polytopes may not be closed. Therefore, depending on the tie-breaking rule, we consider a menu close to the extreme point but inside the polytope.
Definition 10** (, extended set of extreme points (Balcan et al., 2015)).**
For a given , set is the set of menus as follows: for any and any that is an extreme point of the closure of , if , then , otherwise, there exists such that and . From now on, we may refer to as the extreme points.
Lemma 11**.**
The number of extreme points, is at most .
Proof.
Length- menus of two-part tariffs occupy a -dimensional parameter space. In a -dimensional space, an extreme point is the intersection of linearly independent hyperplanes. The total number of hyperplanes defining the regions is , where each hyperplane, for some buyer type , compares the utility of a pair of options, i.e., numbers of units and tariff indices . Out of these hyperplanes, we need of them to intersect to form an extreme point. Therefore, the number of extreme points is at most , implying the statement. ∎
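The counting argument above can be evaluated numerically. The helper below is a sketch under two stated assumptions: a buyer's options are the `ell * K` (tariff, units) pairs plus, optionally, a no-purchase option, and the boundary hyperplanes of the parameter domain are ignored. It therefore illustrates the binomial-coefficient argument (choose `2 * ell` of the pairwise-comparison hyperplanes) rather than reproducing the exact expression of Lemma 11; `extreme_point_bound` is a hypothetical name.

```python
from math import comb


def extreme_point_bound(k: int, ell: int, K: int, include_no_purchase: bool = True) -> int:
    """Upper bound on vertices formed by pairwise-comparison hyperplanes.

    k: number of buyer types; ell: menu length; K: maximum number of units.
    Assumption: options are (tariff, units) pairs, plus optionally 'buy nothing'."""
    n_options = ell * K + (1 if include_no_purchase else 0)
    n_hyperplanes = k * comb(n_options, 2)   # one hyperplane per type and option pair
    return comb(n_hyperplanes, 2 * ell)      # a vertex in 2*ell dimensions needs 2*ell of them


print(extreme_point_bound(k=3, ell=1, K=2))
```

For fixed menu length and unit cap, the bound grows polynomially in the number of buyer types `k`, roughly as `k ** (2 * ell)`.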
The following lemma bounds the loss in utility when the set of menus is limited to the extreme points . The proof is similar to that of Balcan et al. (2015); however, the loss depends on the problem-specific utility functions.
Lemma 12**.**
Let be as defined in Definition 10. Then, for any sequence of buyer valuations , with as the optimal menu in hindsight:
[TABLE]
Proof.
The proof consists of a few simple steps: (i) since the mappings partition the space into regions with a fixed mapping, there exists a mapping such that ; (ii) the revenue of the buyer valuation sequence is linear in , as shown in Lemma 39; (iii) the closure of is a convex polytope whose extreme points contain the maximizers of the linear function ; (iv) one of these maximizers has cumulative utility at least that of the optimal menu; (v) the parameter vectors inside that are close to the extreme point approximately preserve the revenue of the extreme point; and (vi) since, by the definition of , the distance of each member to an extreme point is at most , there is at most distance in the up-front fee and per-unit fee of any tariff, resulting in the bound in the statement. ∎
Full Information
We first provide an algorithm for the full information case specific to a finite number of buyer types. The main result of this section is provided below. The algorithm achieving this regret guarantee is a weighted majority algorithm (Algorithm 2) on the set of menus corresponding to the extreme points .
Theorem 13**.**
In the full information case for length- menus of two-part tariffs, when there are types of buyers, running Algorithm 2 over the set of menus corresponding to set for has regret bounded by .
The proof follows from Lemma 12 and the guarantee of weighted majority algorithm and is deferred to the appendix.
Partial Information (bandit)
In the partial information setting, at each time step , we present the arriving buyer a menu and observe only the option selected by the buyer (i.e., the tariff and the number of units) in the presented menu. A natural approach in this setting is to run the EXP3 algorithm with the weighted majority algorithm for the full information case as a subroutine. However, this approach leads to a regret bound that is exponential in the size of the menu (this result is presented formally in Appendix A). An alternative is to estimate the revenue of the other menus, more precisely, to find an unbiased estimator with bounded range for the revenue of every menu, and then to run the full information algorithm on these estimates, as introduced by Awerbuch and Mansour (2003). We take the latter approach and obtain the estimates using the notion of barycentric spanners (Awerbuch and Kleinberg, 2008). A barycentric spanner of a set of vectors is a basis such that any vector in the set can be represented as a linear combination of the basis vectors with bounded coefficients. By utilizing this concept, we provide algorithms with a regret bound that is sublinear in the number of time steps and polynomial in the other parameters. Similar ideas were employed by Balcan et al. (2015).
There are two main ideas underlying our bounded-regret algorithm. The first is a reduction from the partial information case to the full information case, assuming oracle access to proper estimates of the utilities of all menus; the second is deriving these estimates. The first idea was introduced by Awerbuch and Mansour (2003), and we directly use a version of it due to Balcan et al. (2015) that better suits our setting. For the second, we also use machinery similar to that of Balcan et al. (2015).
We first show how to estimate the utility of any menu by using only the responses of the buyers to a limited number of menus. In doing so, we take advantage of the dependence between the responses of the buyers to different menus to obtain estimates for unused menus. To estimate the expected revenue of each menu over a time interval, it is sufficient to estimate the probability that each option in the menu (tariff index and number of units) is selected by the buyers. Since the price of each option is determined by the menu, we can infer the expected revenue from these probabilities. Note that the option that each buyer type selects is fixed throughout each region. Balcan et al. (2015) use the dependence between these probabilities across regions to find a limited set of menus from which the estimates can be inferred. An analogous argument in our setting is as follows. Let be the set of length- indicator vectors that, for each region and each option , indicate the (maximal set of) buyer types that select the option given menus in . The algorithm presents the menus corresponding to the barycentric spanner of to buyers at random times and records whether the buyer selects the corresponding option. We show that the utility of each menu can be represented as a linear function of its corresponding vectors in and, therefore, as a linear function of the barycentric spanner vectors of . This is enough to derive the estimates.
Now, we describe the overall structure of the algorithm. The algorithm operates in time blocks, each consisting of exploitation and exploration time steps. The exploration time steps are selected uniformly at random within the block and are limited in number. In an exploitation step, the menu used is the output of the full information algorithm, employing the unbiased estimators from the previous time block; these menus are always among the extreme points . During exploration time steps, the menus corresponding to the barycentric spanner are used. At the end of each time block, the algorithm refines the unbiased estimators for all extreme points using the information gathered in the exploration steps. A detailed description and proof of the theorem are provided in the appendix.
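A barycentric spanner of a finite set can be computed by the determinant-swapping procedure of Awerbuch and Kleinberg (2008). The following pure-Python sketch assumes the set spans `R^d` and uses `C > 1` for a C-approximate spanner, meaning every vector in the set is a combination of the returned basis with coefficients in `[-C, C]`. How the paper constructs the indicator vectors the spanner is applied to is not reproduced here; the function names are hypothetical.

```python
from typing import List, Sequence

Vector = List[float]


def det(m: Sequence[Sequence[float]]) -> float:
    """Determinant via Gaussian elimination with partial pivoting."""
    a = [list(map(float, row)) for row in m]
    n, d = len(a), 1.0
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(a[r][c]))
        if abs(a[p][c]) < 1e-12:
            return 0.0
        if p != c:
            a[c], a[p] = a[p], a[c]
            d = -d
        d *= a[c][c]
        for r in range(c + 1, n):
            f = a[r][c] / a[c][c]
            for k in range(c, n):
                a[r][k] -= f * a[c][k]
    return d


def with_row(basis, i: int, x) -> List[Vector]:
    """Copy of `basis` with row i replaced by x."""
    return [list(x) if r == i else list(basis[r]) for r in range(len(basis))]


def barycentric_spanner(S: Sequence[Sequence[float]], C: float = 2.0) -> List[Vector]:
    """C-approximate barycentric spanner of a finite set S spanning R^d."""
    if C <= 1:
        raise ValueError("C must exceed 1 for the swap loop to terminate")
    d = len(S[0])
    basis = [[1.0 if r == c else 0.0 for c in range(d)] for r in range(d)]
    # Phase 1: replace each identity row by the vector of S maximizing |det|.
    for i in range(d):
        basis[i] = list(max(S, key=lambda x: abs(det(with_row(basis, i, x)))))
    # Phase 2: swap in any vector that grows |det| by more than a factor C.
    improved = True
    while improved:
        improved = False
        for x in S:
            for i in range(d):
                if abs(det(with_row(basis, i, x))) > C * abs(det(basis)):
                    basis[i] = list(x)
                    improved = True
    return basis


def coefficients(basis, x) -> List[float]:
    """Solve x = sum_i c_i * basis[i] by Cramer's rule."""
    D = det(basis)
    return [det(with_row(basis, i, x)) / D for i in range(len(basis))]
```

By Cramer's rule, coefficient `i` of `x` equals `det(B with row i replaced by x) / det(B)`, so the stopping condition of the swap loop is exactly the bounded-coefficient property.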
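The block structure can be sketched as a schedule of menus. This minimal Python sketch assumes, for illustration only, that each spanner menu is explored exactly once per block at a uniformly random step; `exploit` is a hypothetical callable standing in for the full information algorithm run on the previous block's estimates, and `plan_block` is an illustrative name.

```python
import random
from typing import Callable, Dict, List, Sequence, Tuple, TypeVar

Menu = TypeVar("Menu")


def plan_block(block_len: int, spanner: Sequence[Menu], exploit: Callable[[], Menu],
               rng: random.Random) -> List[Tuple[Menu, bool]]:
    """Return (menu, is_exploration) for every time step of one block."""
    if len(spanner) > block_len:
        raise ValueError("block too short to explore every spanner menu")
    # Exploration steps are drawn uniformly at random without replacement.
    steps = rng.sample(range(block_len), len(spanner))
    explore: Dict[int, Menu] = dict(zip(steps, spanner))
    return [(explore[t], True) if t in explore else (exploit(), False)
            for t in range(block_len)]


plan = plan_block(10, ["s0", "s1", "s2"], lambda: "x", random.Random(7))
print(plan)
```

Randomizing the exploration steps keeps the adversary from knowing which rounds feed the estimators, which is what makes the estimates unbiased in the analysis of Awerbuch and Mansour (2003).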
Theorem 14**.**
In the partial information (bandit) case for length- menus of two-part tariffs, when there are different types of buyers, there is an algorithm with regret bound of .
3.3 Distributional Learning for Two-Part Tariffs
We present distributional learning results for menus of two-part tariffs. The learning algorithm simply considers all menus in the discretized set specified by Theorem 1 and outputs the empirical revenue-maximizing menu given the samples. More specifically, for each menu in the discretized set, the algorithm computes the cumulative revenue achieved on the samples and outputs the menu with the maximum cumulative revenue. The revenue from each sample (buyer) for a fixed menu is the total payment corresponding to the buyer's utility-maximizing option (tariff index and number of units). This approach differs substantially from the previous line of work, e.g., (Balcan et al., 2018c, 2020b, 2022a), which did not use a discretization and optimized over the infinite parameter space.
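The empirical revenue maximization step is direct once the buyer's choice rule is fixed. The Python sketch below assumes the same hypothetical `(up_front_fee, per_unit_fee)` encoding as before, a walk-away option with utility 0, and a short list of candidate menus standing in for the discretized set of Theorem 1; `erm_menu` is an illustrative name.

```python
from typing import List, Optional, Sequence, Tuple

Tariff = Tuple[float, float]  # hypothetical: (up-front fee, per-unit fee)
Menu = List[Tariff]


def buyer_choice(menu: Menu, values: Sequence[float]) -> Optional[Tuple[int, int]]:
    """Utility-maximizing (tariff index, units); None if walking away is best."""
    best, best_u = None, 0.0
    for j, (p0, p1) in enumerate(menu):
        for q in range(1, len(values) + 1):
            u = values[q - 1] - (p0 + q * p1)
            if u > best_u:
                best, best_u = (j, q), u
    return best


def revenue(menu: Menu, values: Sequence[float]) -> float:
    opt = buyer_choice(menu, values)
    if opt is None:
        return 0.0
    j, q = opt
    return menu[j][0] + q * menu[j][1]


def empirical_revenue(menu: Menu, samples: Sequence[Sequence[float]]) -> float:
    return sum(revenue(menu, v) for v in samples)


def erm_menu(candidates: Sequence[Menu], samples: Sequence[Sequence[float]]) -> Menu:
    """Return the candidate menu with the largest cumulative revenue on the samples."""
    return max(candidates, key=lambda m: empirical_revenue(m, samples))


samples = [[1.5, 2.5], [2.5, 4.5]]
candidates = [[(0.0, 1.0)], [(0.0, 2.0)], [(1.0, 0.5)]]
print(erm_menu(candidates, samples))
```

The running time is the size of the candidate set times the number of samples times the cost of one buyer-choice computation, which is why the size of the discretized set drives the bound in Theorem 15.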
Theorem 15**.**
In the distributional setting, for length- menus of two-part tariffs, there exists a learning algorithm with sample complexity and running time
Remark.
For menus of length larger than one, i.e., , the running time from Theorem 15 is roughly the square root of the running time of the previous result (Balcan et al., 2020b, 2022a) in the worst case in terms of parameters , , and . Under extra structural assumptions, the approach of Balcan et al. (2022a) may result in better running times. See Appendix B for more details.
4 Menus of Lotteries
Consider selling items to a buyer. A set , where and , is a length- menu of lotteries. Each is a vector of length . Under the lottery , a buyer receives each item with probability and pays a price of . The buyer's expected utility for the lottery is their expected value for the lottery minus their payment. We consider additive and unit-demand buyers. For additive buyers, their value for lottery is , where is their value for item . The buyer's expected utility is . Note that for additive buyers, due to linearity of expectation, it does not matter whether the allocations of the items in a lottery are independent or correlated. For unit-demand buyers, without loss of generality, we only consider lotteries such that . Under this constraint, for each lottery , the allocations of the items are dependent, and the buyer never receives more than one item. In this case, the utility for lottery has the same expression as for additive buyers. Presented with a menu of lotteries, the buyer selects a utility-maximizing lottery , and the mechanism achieves revenue .
Putting the problem formulation in the context of Section 2, is the set of all menus of lotteries, each parameterized by , which in this case contains all and , where each and is for the additive setting (and for the unit-demand setting). is the set of buyer valuations, and is a utility function where measures the revenue of the menu with parameters on buyer valuation .
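The buyer model for menus of lotteries can be written down directly. The sketch below is a minimal illustration, assuming a lottery is encoded as `(phi, price)` with `phi[i]` the probability of receiving item `i`, and that the buyer may decline every lottery (utility 0); the unit-demand check enforces that allocation probabilities sum to at most 1. All names are hypothetical.

```python
from typing import List, Optional, Sequence, Tuple

Lottery = Tuple[Sequence[float], float]  # hypothetical: (allocation probabilities, price)


def is_unit_demand_menu(menu: List[Lottery]) -> bool:
    """Every lottery allocates at most one item in expectation (sum of phi <= 1)."""
    return all(sum(phi) <= 1.0 + 1e-12 and all(0.0 <= f <= 1.0 for f in phi)
               for phi, _ in menu)


def lottery_choice(menu: List[Lottery], values: Sequence[float]) -> Optional[int]:
    """Index of the utility-maximizing lottery; utility = sum_i phi_i * v_i - price.
    Returns None if declining (utility 0) is at least as good."""
    best, best_u = None, 0.0
    for idx, (phi, price) in enumerate(menu):
        u = sum(f * v for f, v in zip(phi, values)) - price
        if u > best_u:
            best, best_u = idx, u
    return best


def lottery_revenue(menu: List[Lottery], values: Sequence[float]) -> float:
    """Revenue is the price of the chosen lottery (0 if the buyer declines)."""
    idx = lottery_choice(menu, values)
    return 0.0 if idx is None else menu[idx][1]


menu = [([1.0, 1.0], 1.5), ([0.5, 0.5], 0.6)]
print(lottery_choice(menu, [1.0, 1.0]), lottery_choice(menu, [0.7, 0.7]))
```

The same expected-utility expression serves both buyer models; only the feasibility constraint on `phi` differs.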
4.1 Discretization procedure
In this section, we introduce a rounding procedure for menus of lotteries. In this procedure, given any vector of parameters (representing a menu) with arbitrary coordinates, we find a transformation to another vector with two properties: first, the revenue of the output is nearly as high as that of the original menu for any valuation; second, the coordinates corresponding to allocation probabilities and prices belong to a finite set of values. Performing this rounding procedure on all possible menus results in a finite set of outcomes. We run the learning algorithms over this finite set.
Theorem 16**.**
Given a menu of lotteries and parameters , , and , an arbitrary natural number, Algorithm 4 outputs menu such that . The set of possible allocation probabilities is , where and the set of possible prices is . This constitutes a space with at most discrete points, when limiting to length- menus and discrete points for arbitrary-length menus.
Overview of Algorithm 4. The algorithm consists of three main steps, and its logic is similar to that of Dughmi et al. (2014). In Step 1, we divide the lotteries in the menu exceeding a minimum price into levels based on their price (and remove the ones below the minimum). The division in prices is proportional to powers of , with a higher level having a higher price compared to a lower level . Step 2 rounds down the allocation probability coordinates to a finite set. By multiplying by and then rounding to integer powers of , the allocation probabilities of lower-price levels decrease by a larger factor, making lower-price levels less desirable. Step 3 rounds down the prices, first by multiplying all prices by the same factor, , then by rounding to multiples of , and finally by subtracting , which subtracts more from originally higher-price entries. The main insight behind nearly preserving the revenue of the original menu (and circumventing the issue with simple rounding) is that the prices of the more expensive lotteries (higher-price levels) are decreased more than those of the cheaper ones, while their allocations decrease by a smaller factor. This ensures that no buyer, whatever their valuation, switches from a higher-price level to a lower-price level after the rounding.
4.2 Online Learning
We provide bounded-regret online learning algorithms in full and partial information settings for fixed and arbitrary-length menus of lotteries. The setting considered is as follows. In each round, a new buyer arrives, and a length- lottery menu is presented to the buyer. The buyer selects her utility-maximizing lottery and pays . The mechanism achieves revenue . Missing proofs and explicit description of the algorithms are deferred to Appendix B.
In the full information setting, the seller sees the revenue generated by all possible menus. Similar to the previous section, we run Algorithm 2 (a weighted majority algorithm) over the discretized set output by Algorithm 4 and derive the following results for length- menus and arbitrary-length menus.
Theorem 17**.**
In the full information case for length- menus of lotteries, running Algorithm 2 over the discretized set of menus specified in Theorem 16 for , , , and has regret .
Theorem 18**.**
In the full information case for arbitrary length menus of lotteries, running Algorithm 2 on menus specified in Theorem 16 for , , , and has regret .
In the partial information setting, the seller observes only the revenue generated by the menu at hand. Similar to the previous section, we run Algorithm 3 (the EXP3 algorithm) over the discretized set output by Algorithm 4 and derive the following result for length- menus.
Theorem 19**.**
In the partial information case for length- menus of lotteries, running Algorithm 3 over discretized set of menus in Theorem 16 for , , , and has regret .
For the case with buyer types, we use machinery similar to that of Section 3.2.3 to derive bounded-regret algorithms in the full and partial information settings. The discussion of how to adapt it to the lotteries setting is deferred to the appendix.
Theorem 20**.**
In the full information case for length- menus of lotteries, when there are types of buyers, there is an algorithm with regret bound of .
Theorem 21**.**
In the partial information (bandit) case for length- menus of lotteries, when there are different types of buyers, there is an algorithm with regret bound of
Remark
The above results hold under adversarial input. Unlike menus of two-part tariffs (and many other families of algorithms and mechanisms discussed in Balcan et al. (2018b, 2020a)), for menus of lotteries, we provide evidence that dispersion, a sufficient condition for online learning under smooth distributions, may not hold. A formal result is stated as Theorem 58.
4.3 Distributional Learning
In the distributional setting, we have sample access to buyers’ valuations. The value of the buyer for item is drawn from distribution with support ; we do not assume independence among items. Similar to the distributional learning algorithm for menus of two-part tariffs, the algorithm simply considers all menus in the discretized set specified by Theorem 16 and outputs the empirical revenue-maximizing menu given the samples. The revenue from each sample (buyer) for a fixed menu is the payment corresponding to the buyer’s utility-maximizing lottery in the menu.
Theorem 22**.**
For length- menus of lotteries, there is a discretization-based distributional learning algorithm with sample complexity , and running time
Remark
For limited menu length, the sample complexity of Theorem 22 is roughly the same as that of Balcan et al. (2018c), but the advantage is that we provide an efficient algorithm when and are constant. The analysis for arbitrary-length menus is provided in the appendix as Theorem 56. The sample complexity and running time provided are similar to those of Dughmi et al. (2014); however, Theorem 56 applies to a more general setting.
5 Discussion
This paper contributes to both learning theory and mechanism design by studying prominent families of mechanisms from a learning perspective. Our work is focused on learning menu mechanisms that go beyond selling the items separately. Menus of lotteries provide a list of randomized allocations and their corresponding prices to the buyers and are specifically advantageous for selling multiple items. Menus of two-part tariffs, on the other hand, are employed for selling multiple units (copies) of an item by presenting a list of up-front fees and per-unit fees to the buyer.
Discretization versus Dispersion
The majority of the paper focuses on online learning of these families of mechanisms. Two commonly used techniques for this setting are the (more traditional) discretization-based and the (recently developed) dispersion-based techniques. Menus of lotteries and two-part tariffs are examples of parametric algorithm or mechanism design in which the objective function, here revenue, has sharp discontinuities in the parameter space, and standard procedures, such as rounding down the parameters to multiples of , may result in arbitrary revenue loss. A discretization scheme means that there exists a grid in the parameter space such that for any parameter vector, there is a nearby parameter vector on the grid generating similar revenue. However, finding the corresponding parameter vector (the direction in which to move from the original parameter vector) requires extra care, and moving in an arbitrary direction may cause a large revenue loss. In contrast to the discretization scheme, another method developed for proving online learnability of parameterized algorithms, called dispersion (Balcan et al., 2018b, 2020a), asserts that under smoothness assumptions, moving within a small ball of parameter vectors does not encounter sharp discontinuities with high probability. This means that, with high probability, moving in any direction preserves similar revenue. Nevertheless, we provide evidence that dispersion may not hold for menus of lotteries (Theorem 58), and while dispersion holds for menus of two-part tariffs (Propositions 33 and 36), its proof relies heavily on the smoothness assumption. In conclusion, although a small but arbitrary modification of a parameter vector may change the revenue drastically, in designing our discretization scheme we identify a specific direction along which small modifications preserve the revenue. See Theorems 1 and 16.
Limitations
While we present strong regret guarantees both in the general case and in the case of limited buyer types, our algorithms are not always computationally efficient. Designing corresponding computationally efficient algorithms is an open direction.
6 Acknowledgement
The authors would like to thank Avrim Blum, Misha Khodak, Rattana Pukdee and anonymous reviewers for helpful feedback and comments. This material is based on work supported in part by the National Science Foundation under grant CCF-1910321 and a Simons Investigator Award.
Appendix A Missing Proofs of Section 3
A.1 Discretization Procedure
Before providing the proof of the discretization procedure, we give intuition for why discretization is nontrivial for menus of two-part tariffs. For this family of mechanisms, standard procedures, such as rounding the prices down to multiples of a small step size, may result in arbitrary revenue loss: the price parameters of the tariffs decrease by different amounts, causing unpredictable changes in the buyer's utility for each combination of tariff and number of units. After a simple rounding, the buyer's utility-maximizing choice may switch from a higher-price tariff with more units (which originally had slightly higher utility for the buyer) to a lower-price tariff with fewer units (which originally had slightly lower utility for the buyer).
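To make this failure mode concrete, here is a small sketch with a hypothetical two-tariff menu, a hypothetical buyer, and step size 0.1. It assumes that the buyer's utility for buying k ≥ 1 units under a tariff is her value for k units minus the up-front fee minus k times the per-unit fee, that buying nothing gives utility 0, and that ties are broken toward the larger payment; the helper names are illustrative only and are not the paper's procedure.

```python
from fractions import Fraction as F
import math


def round_down(x, eps):
    """Round x down to the closest multiple of eps (exact arithmetic via Fraction)."""
    return math.floor(x / eps) * eps


def best_option(menu, values):
    """Return (payment, tariff index, units) chosen by a utility-maximizing buyer.

    menu   -- list of (up_front_fee, per_unit_fee) pairs (hypothetical encoding)
    values -- values[k] is the buyer's total value for k units, values[0] == 0
    Buying nothing yields utility 0 and payment 0; ties favor the larger payment.
    """
    best_utility, best_pay, best_tariff, best_units = F(0), F(0), None, 0
    for i, (p0, p1) in enumerate(menu):
        for k in range(1, len(values)):
            pay = p0 + k * p1
            utility = values[k] - pay
            if (utility, pay) > (best_utility, best_pay):
                best_utility, best_pay, best_tariff, best_units = utility, pay, i, k
    return best_pay, best_tariff, best_units


eps = F(1, 10)
menu = [(F(1), F(1)), (F(0), F("1.99"))]   # hypothetical tariffs
values = [F(0), F(2), F("3.05")]           # hypothetical values for 0, 1, 2 units
rounded = [(round_down(p0, eps), round_down(p1, eps)) for p0, p1 in menu]

print(best_option(menu, values))     # 2 units of tariff 0, payment 3
print(best_option(rounded, values))  # switches to 1 unit of tariff 1, payment 19/10
```

Rounding lowers the total price of every option by at most 3 × 0.1 = 0.3, yet the payment drops from 3 to 1.9 because the buyer's choice flips to the cheaper option.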
Now, we provide structural results that enable us to design a discretization procedure. Given a menu of two-part tariffs, the following definition deletes the dominated tariffs (independent of the valuation).
**Definition 23** (Pareto frontier tariffs).
Given a menu with distinct tariffs, its Pareto frontier is derived by deleting every tariff for which there exists another tariff in the menu with both a weakly lower up-front fee and a weakly lower per-unit fee.
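A direct quadratic-time sketch of this definition, assuming tariffs are encoded as (up-front fee, per-unit fee) tuples and reading dominance as weakly lower in both coordinates; the function name is hypothetical. Sorting the surviving tariffs by up-front fee also exposes the ordering stated in Lemma 25.

```python
def pareto_frontier(menu):
    """Drop every tariff weakly dominated by a different tariff in the menu.

    menu -- list of distinct (up_front_fee, per_unit_fee) tuples (hypothetical encoding).
    Returns the surviving tariffs sorted by increasing up-front fee.
    """
    frontier = [
        t for t in menu
        if not any(s != t and s[0] <= t[0] and s[1] <= t[1] for s in menu)
    ]
    return sorted(frontier)


menu = [(0, 5), (2, 3), (3, 4), (5, 1), (6, 1)]
frontier = pareto_frontier(menu)
print(frontier)                    # [(0, 5), (2, 3), (5, 1)]
print([p1 for _, p1 in frontier])  # per-unit fees decrease: [5, 3, 1]
```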
**Lemma 24.**
Given a menu of tariffs, a buyer only selects a tariff in its Pareto frontier.
**Lemma 25.**
Sorting the tariffs in the Pareto frontier in increasing order of up-front fee is equivalent to sorting them in decreasing order of per-unit fee.
**Lemma 26.**
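For any fixed number of units, the highest-utility tariff in the menu is the one with the lowest total price for that number of units, i.e., the up-front fee plus the number of units times the per-unit fee. This is independent of the buyer's values.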
The following lemma states that as the number of units increases, the utility-maximizing tariff has a higher up-front fee and a lower per-unit fee.
**Lemma 27.**
Let be the menu of Pareto frontier tariffs derived from menu . Suppose the tariffs in are reindexed in increasing order of up-front fee. For each number of units, consider the index of the utility-maximizing tariff. This index is (weakly) increasing as a function of the number of units.
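Lemmas 26 and 27 can be checked numerically on a small example. The sketch below uses a hypothetical frontier already sorted by up-front fee, picks for each number of units the tariff with the lowest total price (Lemma 26), and confirms that the selected index never decreases as the number of units grows (Lemma 27). Breaking ties toward the smaller index is an assumption of the sketch.

```python
def cheapest_tariff_per_units(frontier, max_units):
    """For k = 1..max_units, index of the tariff minimizing up_front + k * per_unit.

    frontier -- Pareto-frontier tariffs (up_front_fee, per_unit_fee), sorted by up-front fee.
    Ties are broken toward the smaller index (min returns the first minimizer).
    """
    return [
        min(range(len(frontier)), key=lambda i: frontier[i][0] + k * frontier[i][1])
        for k in range(1, max_units + 1)
    ]


frontier = [(0, 5), (3, 3), (8, 1)]
indices = cheapest_tariff_per_units(frontier, 4)
print(indices)  # [0, 1, 2, 2]
assert all(a <= b for a, b in zip(indices, indices[1:]))  # monotone, as in Lemma 27
```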
See 1
Proof.
First, we bound the length of the output menu. Let and be the lengths of the original menu and the output menu, respectively. Note that is also the length of the menu after rounding and down to their closest multiples of . Observe that is at most (because we never add extra tariffs) and also at most , because there are distinct options for each of and . Therefore, .
Next, we bound the maximum loss in revenue. First, note that for any fixed tariff and number of units, the total price decreases by at most . It then suffices to show that the buyer does not switch from buying more units to buying fewer; switching in the opposite direction does not decrease the revenue by more than . The reason is that the total price of each tariff is an increasing function of the number of units, and therefore the minimum total price is also increasing as a function of the number of units. We now prove that a buyer never switches from buying more units to buying fewer, considering two cases: switching between tariffs and staying with the same tariff. In the first case, by Lemma 27, it suffices to show that a buyer never switches from a tariff with a higher up-front fee (lower per-unit fee) to one with a lower up-front fee (higher per-unit fee). Since, in the discretization procedure, the price of tariffs with higher decreases more than that of tariffs with lower , the lower tariffs do not become utility-maximizing if they were not before. In the second case, by the rounding procedure, the total price of more units in the same tariff always decreases more; therefore, a lower number of units never becomes utility-maximizing. We conclude that the payment of each tariff, and therefore the revenue, decreases by at most . Thus,
Finally, we bound the total number of possible menus. After the discretization, all and are multiples of . Therefore, when restricted to length- menus, there are choices for each parameter of the menu, giving an upper bound of . On the other hand, there are at most possible tariffs, and each of them may or may not appear in the menu. Therefore, the number of menus is also bounded by . ∎
A.2 Online Learning
A.2.1 Online Learning Under Adversarial Inputs
Full Information
**Proposition 28** ([Auer et al., 1995], Theorem 3.2).
For any sequence of valuations ,
[TABLE]
where is the set of experts (two-part tariff menus), is the expected revenue of Algorithm 2, and is the revenue of the optimal menu in .
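For intuition on Proposition 28, the sketch below runs a Hedge-style exponential-weights update over a finite set of experts (for example, discretized menus) whose per-round revenues lie in [0, H], and returns the forecaster's expected cumulative revenue. It is a generic textbook variant, not necessarily identical to Algorithm 2; the learning rate `eta`, the scaling by `H`, and the function name are assumptions of the sketch.

```python
import math


def hedge_expected_revenue(revenue_table, eta, H):
    """Exponential weights over experts with full-information feedback.

    revenue_table -- revenue_table[t][i] is expert i's revenue in round t, in [0, H].
    Returns the expected cumulative revenue of the randomized forecaster.
    """
    n = len(revenue_table[0])
    weights = [1.0] * n
    total = 0.0
    for revenues in revenue_table:
        z = sum(weights)
        probs = [w / z for w in weights]
        total += sum(p * r for p, r in zip(probs, revenues))  # expected revenue this round
        weights = [w * math.exp(eta * r / H) for w, r in zip(weights, revenues)]
        top = max(weights)
        weights = [w / top for w in weights]  # renormalize to avoid overflow
    return total


T, eta = 1000, 0.1
table = [[1.0, 0.0] for _ in range(T)]  # expert 0 always earns H = 1, expert 1 earns 0
regret = T * 1.0 - hedge_expected_revenue(table, eta, 1.0)
print(regret <= math.log(2) / eta + eta * T / 8)  # True: within the standard Hedge bound
```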
See 2
Proof.
Let be the number of menus resulting from the discretization procedure in Section 3.1. Let be the valuation of the buyer at step , and be the vector of valuation of all buyers in rounds through . We denote as the maximum revenue obtained in the set of menus resulting from the discretization procedure, as the optimal revenue, and as the revenue obtained from the weighted majority algorithm discussed above on the set of outcome menus of the discretization procedure. Then,
[TABLE]
where the first expression is a result of the discretization procedure, the second expression uses Proposition 28, the third expands the revenue over terms, and the last uses Theorem 1. Rearranging the terms, we have:
[TABLE]
We set variables and to minimize the exponent of in the regret. By substituting , the regret is upper bounded by
[TABLE]
By setting , the regret is . With the parameters chosen, the number of menus is . The algorithm maintains the weights for these menus and updates them based on the revenue at each time step. The revenue of each menu can be calculated in given the buyer’s valuation, resulting in the stated running time. ∎
Partial Information
**Proposition 29** ([Auer et al., 1995], Theorem 4.1).
For any sequence of valuations ,
[TABLE]
where is the set of experts (two-part tariff menus), is the expected revenue of Algorithm 3, and is the revenue of the optimal menu in .
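A compact sketch of the Exp3 update of Auer et al. for bandit feedback, where only the revenue of the presented menu is observed. Revenues are assumed to lie in [0, H] and are rescaled to [0, 1]; the exploration rate `gamma`, the `reward_fn` callback, and the seed are assumptions of the sketch, and Algorithm 3 may differ in details such as parameter tuning.

```python
import math
import random


def exp3(reward_fn, n_arms, T, gamma, H, seed=0):
    """Run Exp3 for T rounds and return the realized cumulative revenue.

    reward_fn(t, i) -- revenue in [0, H] of arm (menu) i at round t; only the
                       pulled arm's revenue is used.
    """
    rng = random.Random(seed)
    weights = [1.0] * n_arms
    total = 0.0
    for t in range(T):
        z = sum(weights)
        probs = [(1 - gamma) * w / z + gamma / n_arms for w in weights]
        i = rng.choices(range(n_arms), weights=probs)[0]
        reward = reward_fn(t, i)
        total += reward
        estimate = (reward / H) / probs[i]  # importance-weighted, unbiased estimate
        weights[i] *= math.exp(gamma * estimate / n_arms)
        top = max(weights)
        weights = [w / top for w in weights]  # renormalize to avoid overflow
    return total


total = exp3(lambda t, i: 1.0 if i == 0 else 0.0, n_arms=2, T=2000, gamma=0.1, H=1.0)
print(total)  # a bit below (1 - gamma / 2) * T, since arm 1 keeps probability >= gamma / 2
```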
See 3
Proof.
The proof follows the same logic as that of Theorem 2. We denote as the revenue obtained from Exp3 algorithm described above on the set of outcome menus of the discretization procedure. Similar to the proof of Theorem 2, in what follows denotes the number of menus resulting from the discretization procedure in Section 3.1. is the valuation of the buyer at step , and is the sequence of valuation of all buyers in rounds through . is the maximum revenue obtained in the set of menus resulting from the discretization procedure and is the optimal revenue.
[TABLE]
where the first expression is a result of Theorem 1, the second expression uses Proposition 29, the third expands the revenue over T terms, and the last uses Theorem 1. Rearranging the terms gives:
[TABLE]
We set the variables and as functions of to minimize the exponent of in the regret. By setting , , the regret is . The algorithm maintains weights for all the menus in the discretized set at each time step; therefore, the running time per time step is proportional to the number of menus, which is determined by the parameter . ∎
A.2.2 Online Learning Under Smooth Distributions
Smoothed Distributional Assumptions.
In an online setting under smoothed distributions, the algorithm receives samples , where is an arbitrary distribution over problem instances (which in our case is the buyer valuations). The goal is to find that nearly maximizes . In this setting, the goal is to find a value that is nearly optimal in hindsight over a stream of instances, or equivalently, over a stream of functions. Each is drawn from a distribution , which may be adversarial. Therefore, .
Dispersion.
Let be a set of functions mapping a set to . In the mechanism selection setting studied in this paper, given a collection of problem instances and a utility function , each function may be the function , measuring a mechanism’s performance on a fixed problem instance as a function of its parameters. Informally, dispersion is a constraint on the functions guaranteeing that, although each function may have discontinuities, these discontinuities do not concentrate in a small region of the space. We study two definitions of dispersion previously introduced for algorithm and mechanism selection problems. We show that menus of two-part tariffs satisfy both definitions: -dispersion (Definition 32) and -point-dispersion (Definition 35). We then use the first to establish online learning results for the full-information and bandit settings, and the second for the semi-bandit setting.
To prove that menus of two-part tariffs satisfy dispersion under smoothed assumptions, we show that this family of mechanisms satisfies certain structural properties. Balcan et al. [2018c] show that for menus of two-part tariffs, for each function , the parameter space is partitioned into sets such that is -Lipschitz on each piece but may have discontinuities at the boundaries between pieces.‡‡‡This previously known structural result suffices for the techniques used in the setting with a limited number of buyer types (Sections 3.2.3 and A.2.3); however, we need a refined statement for proving dispersion. We refine this structural property and show that multisets of parallel hyperplanes, corresponding to the stream of buyer valuations, partition the parameter space into convex polytopes with linear utility functions inside each polytope. We later show that this property suffices for proving dispersion and employing the related algorithms.
**Lemma 30.**
Consider the sequence of buyer valuations arrived until time . For menus of two-part tariffs, the parameter space is partitioned into convex polytopes, by multisets of parallel hyperplanes, such that the utility function at each time step inside each region is a linear function satisfying -Lipschitz continuity.
Proof.
Part of the proof that identifies the regions with linear utilities has been shown previously in Balcan et al. [2018c], Lemma 3.15. We reiterate that part for completeness and also prove the extra structural properties. Consider the set of menus for which the buyer with valuation arriving at time selects the tariff index and the number of units . The buyer selects this option for menu if it produces more utility for the buyer than any other option. Formally,
[TABLE]
The above inequalities identify a convex polytope of parameter vectors (menus ) with hyperplane boundaries. Considering all the possible selections (the tariff index and the number of units), the parameter space for is partitioned into convex polytopes where inside each polytope the payment of is linear; i.e., . Considering the same analysis for all the buyers’ valuations in the sequence, for each buyer, the parameter space is partitioned into convex polytopes where inside each polytope, the revenue function is linear and -Lipschitz. Since convex polytopes are closed under intersection, superimposing the partitions for results in polytopes with the properties in the statement.
For a fixed valuation vector , the discontinuities in the utility function are defined by at most hyperplanes: . Let be the multi-set union of all these hyperplanes. Consider a set with corresponding multi-sets of hyperplanes. We now partition the multi-set union of into at most multi-sets for all and and such that for each , the hyperplanes in are parallel with probability 1 over the draw of . To this end, define a single multi-set to consist of the hyperplanes
[TABLE]
where the only variables are coordinates of . The hyperplanes inside each multi-set are parallel, and the utilities of the regions defined by the hyperplanes are linear and -Lipschitz.§§§Partitioning of the parameter space by parallel multisets of hyperplanes has been established before for other families of mechanisms, such as posted pricing [Balcan et al., 2018b]. We extend this idea to the more complicated case of two-part tariffs. ∎
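The parallel structure can be illustrated with a short sketch. Assuming the parameter vector stacks the up-front fees followed by the per-unit fees, a buyer is indifferent between (tariff i, k units) and (tariff i2, k2 units) exactly on the hyperplane a · ρ = b computed below, because the difference in payments must equal the difference in values. The normal vector a depends only on the two options, while the buyer's values enter only through the offset b, so for a fixed pair of options the hyperplanes from different rounds are parallel. Encoding zero units as the no-purchase option is an assumption of the sketch, and the helper name is hypothetical.

```python
import numpy as np


def indifference_hyperplane(i, k, i2, k2, values, num_tariffs):
    """Hyperplane a . rho = b on which options (i, k) and (i2, k2) give equal utility.

    rho    -- (p0_1, ..., p0_l, p1_1, ..., p1_l): up-front fees, then per-unit fees
    values -- values[k] is the buyer's total value for k units, values[0] == 0
    k == 0 (or k2 == 0) encodes buying nothing, which costs nothing.
    """
    a = np.zeros(2 * num_tariffs)
    if k > 0:   # payment p0_i + k * p1_i of option (i, k)
        a[i] += 1
        a[num_tariffs + i] += k
    if k2 > 0:  # minus the payment of option (i2, k2)
        a[i2] -= 1
        a[num_tariffs + i2] -= k2
    b = values[k] - values[k2]  # only the offset depends on the buyer
    return a, b


a1, b1 = indifference_hyperplane(0, 2, 1, 1, [0, 5, 8], num_tariffs=2)
a2, b2 = indifference_hyperplane(0, 2, 1, 1, [0, 3, 9], num_tariffs=2)
print(np.array_equal(a1, a2), b1, b2)  # True 3 6: same normal, different offsets
```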
Next, we establish an upper bound on the number of regions with continuous (linear) utility.
**Lemma 31.**
The partitioning of the parameter space for menus of two-part tariffs explained in Lemma 30 after rounds results in regions, with linear cumulative utility function inside each region.
Proof.
Lemma 30 identifies multi-sets of size for each such that the hyperplanes inside the multi-sets are parallel. Therefore, each multi-set divides the parameter space into parts. Thus, each region with continuous utility can be defined as the intersection of at most parts, where each part corresponds to a distinct multi-set. This results in at most such regions. ∎
In order to prove dispersion, we need to use an assumption on the distributions called -boundedness.
See 4
We first provide the definition of -dispersion. Recall that is a set of instances, is a parameter space, and is an abstract utility function. We use the distance and let denote a ball of radius centered at . We use this notion of dispersion to derive our full-information and bandit setting results.
**Definition 32** ([Balcan et al., 2018b], -dispersion).
Let be a collection of functions where is piecewise Lipschitz over a partition of . We say that splits a set if intersects with at least two sets in . The collection of functions is -dispersed if every ball of radius is split by at most of the partitions . More generally, the functions are -dispersed at a maximizer if there exists a point such that the ball is split by at most of the partitions .
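For one-dimensional parameter spaces, the quantity in this definition can be approximated directly. The sketch below treats each function's partition as given by its list of discontinuity points, counts how many functions have a discontinuity strictly inside the ball (ρ - w, ρ + w), and maximizes this count over a finite grid of centers. Restricting to one dimension and to a grid of centers are simplifications of the sketch, so it only lower-bounds the true worst case; the function names are hypothetical.

```python
def num_split(breakpoints, center, w):
    """Number of functions with a discontinuity strictly inside (center - w, center + w)."""
    return sum(any(abs(b - center) < w for b in bps) for bps in breakpoints)


def max_split_on_grid(breakpoints, w, centers):
    """Largest number of split functions over the given ball centers."""
    return max(num_split(breakpoints, c, w) for c in centers)


breakpoints = [[0.5], [0.52], [0.9]]  # hypothetical discontinuities of three functions
centers = [j / 100 for j in range(101)]
print(max_split_on_grid(breakpoints, 0.05, centers))  # 2: balls near 0.51 split two functions
```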
We now prove that menus of two-part tariffs satisfy dispersion and use it to derive no-regret online learning results for the full-information and bandit settings.
**Proposition 33.**
Suppose that is the revenue of the two-part tariff menu mechanism with prices and buyer’s values . With probability at least over the draw for any the following statement holds:
Suppose for any number of units . Also, suppose that for each distribution , and every pair of number of units and , and have a -bounded joint distribution. Then is
[TABLE]
with respect to .
Proof.
Lemma 30 gives multisets of parallel hyperplanes that partition the parameter space into regions with -Lipschitz continuous utility functions. Since the samples are drawn independently from -bounded distributions with support , the offsets of the hyperplanes in each multiset are independent random variables with -bounded distributions. Furthermore, the number of multisets is at most . Using these properties, Theorem 32 of Balcan et al. [2018b] gives the statement. ∎
After establishing dispersion and showing that the parameter space is partitioned into convex regions with linear cumulative utility inside each region, the no-regret guarantees and the performance of the corresponding algorithms follow from prior results.
Full Information
For completeness we include previously established algorithms for the full information setting, under dispersion condition, adapted to our setting.
Overview of Algorithms 5 and 6, related to Theorem 5.
Algorithm 5 [Balcan et al., 2018b] is an efficient algorithm for online learning in the full-information setting under smoothed distributional assumptions that uses Algorithm 6 [Balcan et al., 2018b] as a subroutine. The algorithm considers the cumulative revenue function over the parameter space up to time , , and samples the menu to be presented at time with probability approximately proportional to an exponential function of its cumulative revenue, i.e., , where . To sample a menu efficiently and approximately from the distribution with density , Algorithm 6 uses techniques from high-dimensional geometry. This algorithm applies when is piecewise concave (in our case, linear) and each piece is a convex set (in our case, a convex polytope in which each buyer already in the sequence selects a fixed tariff index and number of units), as shown in Lemma 30. Let be the partition of until time . The algorithm first picks a region with probability proportional to the integral of over that region and then outputs a sample from the conditional distribution of menus in . The algorithm assumes access to two procedures for approximate integration and sampling, namely and . is a polynomial-time procedure that approximately integrates any logconcave function restricted to a region , with accuracy parameter and failure probability . is a polynomial-time procedure that approximately samples a menu from the distribution proportional to in the region , with accuracy parameter and failure probability .
**Definition 34** ( and [Balcan et al., 2018b]).
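In one dimension, the integration and sampling steps of Algorithm 6 have closed forms, which makes the two-stage structure easy to see. The sketch below assumes that the cumulative revenue equals a + b·ρ on each interval [l, r] of a partition. It picks an interval with probability proportional to the integral of exp(λ(a + bρ)) over it, then samples inside the interval by inverting the conditional CDF. Exact one-dimensional formulas stand in for the approximate procedures of Definition 34, so this is an illustration rather than the algorithm used for menus; the names and the rate `lam` are hypothetical.

```python
import math
import random


def sample_exp_weighted(pieces, lam, rng):
    """Sample rho with density proportional to exp(lam * R(rho)).

    pieces -- list of (l, r, a, b) with l < r: on [l, r] the cumulative revenue is a + b * rho.
    """
    log_masses = []
    for l, r, a, b in pieces:
        c, width = lam * b, r - l
        # integral of exp(c * s) for s in [0, width]
        mass = width if c == 0 else math.expm1(c * width) / c
        log_masses.append(lam * (a + b * l) + math.log(mass))
    top = max(log_masses)
    probs = [math.exp(x - top) for x in log_masses]  # numerically stable relative weights
    l, r, a, b = pieces[rng.choices(range(len(pieces)), weights=probs)[0]]
    c, u = lam * b, rng.random()
    if c == 0:
        return l + u * (r - l)
    # invert F(x) = expm1(c * (x - l)) / expm1(c * (r - l))
    return l + math.log1p(u * math.expm1(c * (r - l))) / c


rng = random.Random(1)
flat_then_high = [(0.0, 1.0, 0.0, 0.0), (1.0, 2.0, 5.0, 0.0)]
draws = [sample_exp_weighted(flat_then_high, 2.0, rng) for _ in range(1000)]
print(sum(d >= 1.0 for d in draws) / len(draws))  # nearly 1: the high-revenue piece dominates
```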
For any logconcave function , any accuracy parameter , and any failure probability , outputs a number that with probability at least satisfies . For any logconcave function , any accuracy parameter , and any failure probability , outputs a sample drawn from a distribution that with probability at least , , where is the relative (multiplicative) distance between probability measures and . Formally, , where denotes the Radon-Nikodym derivative.
Similar to [Balcan et al., 2018b], we use the implementation of by Lovász and Vempala [2006] and by Bassily et al. [2014], Algorithm 6. These implementations satisfy the conditions in Definition 34. The first runs in time poly, where the domain of function is a subset of a ball of radius and its level set of probability mass is a superset of a ball with radius . The second succeeds with probability and runs in time poly.
See 5
Proof.
Proposition 33 determines the dispersion for two-part tariff menus with probability . Theorem 1 in Balcan et al. [2018b] relates dispersion to a regret bound for full-information online learning algorithms. It states that if a sequence of piecewise -Lipschitz functions in dimensions is -dispersed, there is an exponentially weighted forecaster with expected regret . Since dispersion holds with probability , the final regret bound is . Substituting and with the dispersion parameters found in Proposition 33 gives:
[TABLE]
For all rounds, , the sum of utilities is linear over at most pieces, and all the pieces are convex. In this case, we may use Algorithm 6 as a subroutine to Algorithm 5 for a more efficient but approximate implementation. Setting dispersion parameters and and approximation parameters and using Theorem 1 in Balcan et al. [2018b], gives the statement’s regret bound and running time. ∎
Bandit Setting
The bandit-setting algorithm considers a grid over the parameter space, whose granularity depends on the dispersion parameters, and runs the Exp3 algorithm over menus corresponding to the grid.
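The grid can be sketched as follows, under the hypothetical assumption that the parameter space is the box [0, H]^d and the grid spacing is `step`; each grid point would then be one expert (menu) for Exp3. In the theorem, the spacing is tied to the dispersion parameters; the box and the names here are illustrative only.

```python
import itertools
import math


def grid_points(H, d, step):
    """All points of the box [0, H]^d whose coordinates are multiples of step."""
    n_ticks = math.floor(H / step + 1e-9) + 1  # small tolerance guards against float round-off
    ticks = [j * step for j in range(n_ticks)]
    return list(itertools.product(ticks, repeat=d))


menus = grid_points(H=1.0, d=2, step=0.25)
print(len(menus))  # 25: five ticks per axis, squared
```

The number of experts grows exponentially in the dimension, which is why the per-round running time in the bandit results depends on the grid size.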
See 6
Proof.
Proposition 33 determines dispersion for two-part tariff menus with probability . Theorem 3 in Balcan et al. [2018b] relates dispersion to a regret bound for the bandit setting. It states that if a sequence of piecewise -Lipschitz functions is -dispersed and the parameter space is contained in a ball of radius , running the Exp3 algorithm has regret
[TABLE]
The per-round running time is . Note that dispersion holds only with probability , and with probability , the regret is bounded by . In our case, , and . Substituting these terms along with and , and setting and gives the regret bound and running time in the theorem statement. ∎
Semi-Bandit Setting
For the semi-bandit setting, we need to invoke a more recent definition of dispersion.
**Definition 35** ([Balcan et al., 2020a], -point-dispersion).
The sequence of loss functions is -point-dispersed for the Lipschitz constant if for all and for all , we have that, in expectation, the maximum number of functions among that fail the -Lipschitz condition for any pair of points at distance in is at most . That is, for all and for all , we have $\mathbb{E}\bigl[\max_{\rho,\rho'}\bigl|\{t\in[T] : |l_t(\rho)-l_t(\rho')|>L\|\rho-\rho'\|_2\}\bigr|\bigr]=\tilde{O}(\varepsilon T)$, where the max is taken over all .
**Proposition 36.**
Suppose , where is the revenue of the two-part tariff menu mechanism with prices and buyer’s values at time , where buyers’ values are drawn from . If are -bounded, where , and and , the maximum number of units and the number of tariffs, are polynomial in , these loss functions are -point-dispersed for .
Proof.
We use the following statement from Balcan and Sharma [2021], Theorem 7.
**Proposition 37.**
[Balcan and Sharma, 2021] *Let be independent piecewise -Lipschitz functions, each having discontinuities specified by a collection of at most algebraic hypersurfaces of bounded degree. Let denote the set of axis-aligned paths between pairs of points in , and for each define . Then we have .*
The number of hyperplanes, defined as in the theorem, is at most , and the s are piecewise -Lipschitz functions (by Lemma 51), where is the number of buyers (rounds), is the number of tariffs, and is the maximum number of units. Note that , as shown in Lemma 30. The independence of the s follows from the assumptions of this setting, where the buyer valuations in each round are drawn independently.
Definition 35 counts the number of times (in time intervals) that the difference in utility of the pair violates the -Lipschitz condition, and finds the worst pair for this property. Proposition 37 counts the number of times that the utility function has discontinuities along an axis-aligned path. Therefore, is an upper bound on $\mathbb{E}\bigl[\max_{\rho,\rho'}\bigl|\{t\in[T] : |u_t(\rho)-u_t(\rho')|>L\|\rho-\rho'\|_2\}\bigr|\bigr]$. To find the dispersion, we need to find .
Recall from the proof of Proposition 33 that the discontinuities can be partitioned into multisets of parallel hyperplanes, such that multiset corresponds to pairs of tariffs and the number of units and . In addition, since we assume the buyers’ valuations are in the range and are drawn from pairwise -bounded joint distributions, the offsets of the hyperplanes are independent draws from a -bounded distribution. The number of multi-sets is , and the size of each multi-set is . The hyperplanes within each multi-set are well-dispersed. For a multi-set , let be the multi-set of the hyperplanes’ offsets. By assumption, the elements of are independently drawn from -bounded distributions. Since the offsets are -bounded, the probability that any given offset falls in an interval of length is . The expected number of hyperplanes crossed from each multiset within distance along each axis is at most , and since there are dimensions, the total expected number of crossings is . Using the upper bound on , in total, for any pair of points at distance . By Proposition 37, , which in our case is upper bounded by: . For , and , . Therefore, these loss functions are -point-dispersed for , satisfying the statement. ∎
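The counting step of this proof can be sanity-checked by simulation. Assuming, for illustration, that the T offsets of one multiset are drawn uniformly from [0, H] (a κ-bounded distribution with κ = 1/H), the expected number of offsets inside an interval of length w equals Tκw, matching the bound used above. The interval position, sample sizes, and seed are arbitrary choices of the sketch.

```python
import random


def average_crossings(T, H, w, start, trials, seed=0):
    """Average number of uniform offsets on [0, H] that land in [start, start + w)."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        offsets = [rng.uniform(0.0, H) for _ in range(T)]
        hits += sum(start <= x < start + w for x in offsets)
    return hits / trials


T, H, w = 1000, 1.0, 0.02
print(average_crossings(T, H, w, start=0.3, trials=200), T * (1 / H) * w)  # both close to 20
```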
Overview of Algorithm 7
The generic algorithm for the semi-bandit case was previously developed in Balcan et al. [2020a]. We adapt it to our setting and consider an efficient implementation using the approximate integration and sampling from Balcan et al. [2018b] discussed in Definition 34. The semi-bandit-setting algorithm is a continuous version of the Exp3-SET algorithm of Alon et al. [2017]. At each time step, the algorithm learns the revenue function (only) inside the region that the presented menu belongs to and updates the menu weights for the next round accordingly.
See 7
Proof.
For the regret bound, we invoke Theorem 2 of Balcan et al. [2020a], stating that if the loss functions are Lipschitz functions satisfying -point-dispersion, running Algorithm 7 has expected regret bounded by , when the loss function is in . In our case, , the number of dimensions is , the dispersion parameter , and the loss function is in . This implies the regret bound.
Now, we discuss the running time of the algorithm. At each time , using the buyer’s valuation vector and the tariff and number of units selected by the buyer, we can determine the region , in which the buyer makes the same selection and whose utility function is linear, by solving a linear program (the inequalities in Equation 2). This computation takes time poly. Next, for the integration procedures inside the algorithm, we use the approximate version introduced in Definition 34, and for sampling, we use the efficient implementation given in Algorithm 6. In particular, we consider . For , we use lines 1 through 3 of Algorithm 6 and take the sum of the integration outcomes of line 3, for and . For , we do the same, except that we now perform the integration operations in line 3 only for the regions inside . For sampling from , we use the complete procedure of Algorithm 6, which takes the regions with linear cumulative utility, , and . Note that since the loss is only updated for , we do not need to repeat the integration operations in Algorithm 6 for regions outside this part. This may result in a better running time for the semi-bandit setting than for the full-information setting; however, we do not quantify the improvement. By a union bound, with probability at least , all the approximate integration and sampling operations performed in the algorithm succeed, and the density function of the approximate distribution used for sampling is always within a fraction of the exact distribution. Using these parameters together with Theorem 1 in [Balcan et al., 2018b], we conclude that the same regret bound is achievable with the approximate operations, which gives the running time in the statement.
∎
A.2.3 Limited Buyer Types
We restate the results on partitioning the parameter space into convex regions with linear cumulative utility functions, adapting the statements to the limited buyer type setting and its notation.
**Lemma 38.**
[Adapted from Balcan et al. [2018c], Lemma 3.15] For each feasible mapping , as defined in Definition 9, is a convex polytope with hyperplane boundaries.
Proof.
For a fixed buyer type and option , let be the set of all parameter vectors corresponding to the length- menus under which buyer type selects option . The buyer selects option for menu if this option produces more utility for the buyer than any other option. Formally,
[TABLE]
The above inequalities identify a convex polytope of parameter vectors (menus ) with hyperplane boundaries. is the intersection of for . Therefore, is also a convex region with hyperplane boundaries. ∎
**Lemma 39.**
[Adapted from Balcan et al. [2018c], Lemma 3.15] For each feasible mapping and any sequence of buyer valuations the cumulative utility, , is linear in .
Proof.
We show that for any buyer valuation in the sequence, is linear in the region; this claim suffices to conclude the statement. Let , i.e., is the tariff index and is the number of units that buyer valuation selects under . Therefore, the utility for this buyer for menu is . Both and grow linearly as a function of . Therefore, since the option that each buyer valuation selects (the tariff index and the number of units) is fixed inside , the utility is also linear. ∎
Full Information Setting
See 13
Proof.
We run the weighted majority algorithm (Algorithm 2) with parameter on the set as the set of menus (experts). The proof follows directly from Lemma 12 and Proposition 28. Let . Let be the valuation of the buyer at step , and be the vector of valuations of all buyers in rounds through . We denote by the maximum revenue obtained in the set , by the optimal revenue, and by the revenue obtained from Algorithm 2 on the set of experts . Then,
[TABLE]
where the first expression uses the size of in Lemma 11, the second expression uses Proposition 28, the third expands the revenue over T terms, and the last uses Lemma 12. Rearranging the terms, we have:
[TABLE]
We set the variables and to minimize the exponent of in the regret. By setting and , the regret is . ∎
Partial Information Setting
We first show how to estimate the utility of any menu using only the buyers’ responses to a limited number of menus. To do so, we take advantage of the interdependence of the buyers’ responses to different menus to obtain estimates for unused menus. In particular, using the barycentric spanner concept from [Awerbuch and Kleinberg, 2008], we devise a basis for the menus such that observing the buyers’ responses to them is sufficient for estimating the revenue of other menus.
Let be a set of length- indicator vectors such that, for each feasible mapping and each option to select (i.e., a tariff index and a number of units), there is a vector in . This vector indicates the (maximal) set of buyer types that select this option under mapping . For example, if under mapping , is exactly the set of valuation types that select option , then vector belongs to . For , and denote the mapping and option corresponding to , respectively. Similarly, is the vector in corresponding to mapping and option . Since the vectors are -dimensional, linear algebra guarantees a set of at most vectors in such that any other vector in is a linear combination of the vectors in this set. Awerbuch and Kleinberg strengthen this property and show that there is a set of vectors in , called the barycentric spanner (or spanner for short) and denoted by , such that any member of can be written as a linear combination of vectors in with coefficients in .
**Lemma 40.**
There exists a set in such that, for all , there exist coefficients so that .
Proof.
The statement is a direct corollary of [Awerbuch and Kleinberg, 2008] Proposition 2.2. ∎
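For a small finite set of vectors that spans R^d, a barycentric spanner can be found by brute force: a d-subset maximizing |det| works, because by Cramer's rule each coefficient is a ratio of two determinants whose numerator is at most that maximum in absolute value. The sketch below implements this exhaustive search. It is exponential in d and meant only for illustration (Awerbuch and Kleinberg give a polynomial-time procedure for approximate spanners), and it assumes the vectors span the whole space, whereas in general one would work within their span. The function names and the toy vectors are hypothetical.

```python
import itertools

import numpy as np


def barycentric_spanner(vectors):
    """Indices of a d-subset of `vectors` maximizing |det| (exhaustive search)."""
    X = np.asarray(vectors, dtype=float)
    d = X.shape[1]
    best, best_det = None, 1e-12  # ignore numerically singular subsets
    for idx in itertools.combinations(range(len(X)), d):
        det = abs(np.linalg.det(X[list(idx)]))
        if det > best_det:
            best, best_det = list(idx), det
    if best is None:
        raise ValueError("the vectors do not span R^d")
    return best


def spanner_coefficients(vectors, spanner, x):
    """Coefficients c with x = sum_j c[j] * vectors[spanner[j]]."""
    B = np.asarray(vectors, dtype=float)[spanner]  # rows are the spanner vectors
    return np.linalg.solve(B.T, np.asarray(x, dtype=float))


# toy stand-in for the indicator vectors: all nonzero 0/1 vectors of length 3
vectors = [v for v in itertools.product([0, 1], repeat=3) if any(v)]
S = barycentric_spanner(vectors)
coeffs = [spanner_coefficients(vectors, S, x) for x in vectors]
print(max(np.abs(c).max() for c in coeffs) <= 1 + 1e-9)  # True: every coefficient in [-1, 1]
```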
We now describe how to estimate the utility of all menus by presenting only the menus corresponding to the spanner to the buyers. First, similar to Balcan et al. [2015], we define a function for the vectors in that will be instrumental in computing the utility of all menus from the spanner. Recall that each vector in corresponds to a mapping and an option . Let be the number of times during a time block that, given a menu in , the arriving buyer selects option . We first show that the values of this function on the spanner vectors suffice for computing the revenue of arbitrary menus, and then show how to estimate them.
**Lemma 41.**
For each menu and any time block , let represent the average utility of for buyer types in . Then,
[TABLE]
Proof.
By definition, is the average utility of menu for buyers arriving in . Menu corresponds to a feasible mapping . By definition, the number of times the buyers in time block select option equals . By Lemma 40, can be written as a linear combination of the vectors in the spanner. Furthermore, is a linear function, as it is equivalent to the dot product of the function input and a vector indicating the frequency, i.e., the number of arrivals, of each buyer type during . Therefore,
[TABLE]
∎
Let be the estimator of for the spanner vectors. Let be the mapping corresponding to . Recall that is the number of times during that, given a menu in , the arriving buyer selects option . To estimate this quantity, we present a menu corresponding to , i.e., a menu in , once, at a time step chosen uniformly at random during the time block . If the buyer selects option , we set equal to , and otherwise we set it to 0. The next lemma shows that has the same expected value and has range . Intuitively, this holds because the time step is selected uniformly at random.
**Lemma 42** (Adapted from Balcan et al. [2015] Lemma 6.3).
For any , .
Proof.
Note that if and only if at the time step that menu was presented, was selected. Since is presented once uniformly at random over the time steps and is independent of the sequence of buyers, the buyer presented with is also picked uniformly at random over the time steps. Therefore, is the probability that a randomly chosen buyer from time block selects . ∎
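The unbiasedness argument of Lemma 42 can be checked by simulation. In the sketch below, a hypothetical time block is a list of buyer types, `selects` is a hypothetical predicate saying whether a buyer type picks the option in question under the fixed mapping, and the estimator presents the menu at one uniformly random time step and records an indicator of whether the option is selected (consistent with the proof above). Averaged over many independent draws, the estimate approaches the fraction of buyers in the block that select the option.

```python
import random


def one_shot_estimate(block, selects, rng):
    """Present the menu at one uniformly random step of the block; return the indicator."""
    t = rng.randrange(len(block))
    return 1.0 if selects(block[t]) else 0.0


def selects(buyer_type):
    """Hypothetical rule: only buyer type "a" picks the option under this mapping."""
    return buyer_type == "a"


block = ["a", "b", "a", "c"]  # hypothetical buyer types arriving in one block
rng = random.Random(0)
mean = sum(one_shot_estimate(block, selects, rng) for _ in range(20000)) / 20000
print(mean)  # close to 0.5, the fraction of the block that selects the option
```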
Now, we prove that the expected value of the utility estimator for each menu equals the utility of that menu, i.e., the estimator is unbiased, and that, moreover, it has a bounded range. The utility estimator is defined as follows, where in the utility formula is replaced by its estimator .
[TABLE]
Lemma 43**.**
For any menu , and .
Proof.
The equality of the expectations follows directly from the definitions of and and Lemma 42. Next, we bound the range of the estimator. Since is a barycentric spanner, for any , . Moreover, belongs to . Finally, the utility of the buyer selecting each option in the menu, e.g., , is always in . Therefore, by the formula of the estimator, it is bounded by times the number of options times the number of buyer types. ∎
We use the algorithm below together with a full-information weighted majority algorithm (similar to Algorithm 2) that is run on the utility (revenue) estimates. We use as the set of experts (menus) and obtain a distribution over set as the weight vector.
Overview of Algorithm 8
First, we provide a high-level structure of the algorithm and then discuss the details. The algorithm operates in time blocks, with each block consisting of exploitation and exploration time steps. The exploration time steps are selected uniformly at random within the block and are limited in number. In an exploitation step, the menu used is the output of the full information algorithm, employing the utility estimators from the previous time block. These menus are always the extreme points of the continuity regions, as discussed at the beginning of the section. During exploration time steps, the corresponding menu to a vector in the spanner is used. At the end of each time block, the algorithm refines the unbiased estimators of the utility of all extreme points using the information gathered in the exploration phases.
is the number of time blocks, with each time block consisting of time steps. The algorithm picks time steps uniformly at random in the current time block, together with a random permutation of them. Whenever the time step is equal to , the algorithm runs an exploration step; otherwise, it runs an exploitation step. In the exploration step at time step , a menu corresponding to , , is presented to the arriving buyer, and the estimator is set to if the buyer selects and to 0 otherwise. At the end of the time block, we update the revenue estimates of the menus corresponding to the extreme points.
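The exploration schedule of one time block can be sketched as follows: draw as many distinct time steps as there are spanner vectors, uniformly at random, and assign them to the spanner vectors in a uniformly random order; every other step is an exploitation step. The parameter names `block_len` and `num_spanner` are hypothetical, and the sketch omits the estimator updates and the full-information subroutine.

```python
import random


def block_schedule(block_len, num_spanner, rng):
    """Map each exploration step of one block to the spanner vector shown at that step.

    rng.sample draws num_spanner distinct steps uniformly at random and returns them in a
    uniformly random order, so position i in the sample is a random step for spanner vector i.
    Every step not in the returned dict is an exploitation step.
    """
    if num_spanner > block_len:
        raise ValueError("a block needs at least one step per spanner vector")
    steps = rng.sample(range(block_len), num_spanner)
    return {step: i for i, step in enumerate(steps)}


rng = random.Random(1)
schedule = block_schedule(block_len=20, num_spanner=4, rng=rng)
for t in range(20):
    if t in schedule:
        print(t, "explore: present the menu for spanner vector", schedule[t])
    else:
        print(t, "exploit: present a menu drawn from the weighted-majority distribution")
```

Drawing the steps with `random.sample` gives both the random positions and the random permutation in a single call.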
Lemma 44**.**
[[Balcan et al., 2015] Lemma 6.2] Let be the set of all actions. For any time block (set of consecutive time steps) and action , let be the average loss of action over . Assume that is such that by sampling all actions in , we can compute for all with the following properties: and . Then there is an algorithm with loss , where is the loss of the best action in hindsight.
We are now ready to prove the main result of this section.
See Theorem 14.
Proof.
In Lemma 44, is the number of dimensions (barycentric spanner set), is the maximum revenue times the number of buyer types times the number of their options (entries in the menu), is the number of discrete points. In our case, , , and . By Lemma 43, the expected value of the estimated utility is equal to the exact value of utility with range .
Using Lemma 44, the regret for menus of two-part tariffs is bounded by
[TABLE]
∎
The following quantifies the regret of simply running the Exp3 algorithm on the set of extreme points.
Proposition 45**.**
In the partial information case for length- menus of two-part tariffs when there are buyer types, running Algorithm 3 over menus corresponding to for has regret bound .
Proof.
The proof is similar to that of Theorem 6. We denote as the revenue obtained from the Exp3 algorithm, as presented in Algorithm 3, on the set of menus corresponding to . Let denote the number of such menus. is the valuation of the buyer at step , and is the sequence of valuations of all buyers in rounds through . is the maximum revenue obtained in the set and is the optimal revenue.
[TABLE]
where the first expression uses the size of in Lemma 11, the second expression uses Proposition 29, the third expands the revenue over T terms, and the last uses Lemma 12. Rearranging the terms, we have:
[TABLE]
We set variables in and as a function of to minimize the exponent of in the regret. By setting and , the regret is ∎
Remark. The standard technique for the partial information algorithm of running the Exp3 algorithm on the extreme points leads to a regret bound that is exponential in the size of the menu as stated in Proposition 45; however, Algorithm 8 has regret bound polynomial in the size of the menus. Therefore, the new technique results in a significant improvement.
A.3 Distributional Learning
See Theorem 15.
Proof.
We need to find the number of samples such that with probability , the difference between the expected revenue of our algorithm and the optimal revenue is at most . Note that since our algorithm uses discretization of possible menus, we face two types of errors: the discretization error, and the usual empirical error in a PAC learning setting. We find the sample complexity and discretization parameters such that the total error is bounded by .
The possible number of menus after discretization using parameter is computed by the following formula.
[TABLE]
Using uniform convergence in the PAC learning setting, the sample complexity for empirical error is as follows.
[TABLE]
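For a finite set of discretized menus with revenues in a bounded interval [0, H], one standard way to obtain such a uniform-convergence bound is Hoeffding's inequality combined with a union bound over the menus. The sketch computes that textbook sample size; the function and parameter names are hypothetical, and the constants in the paper's bound may differ.

```python
import math


def hoeffding_union_sample_size(num_menus, max_revenue, eps, delta):
    """Smallest m with 2 * num_menus * exp(-2 * m * eps**2 / max_revenue**2) <= delta.

    With m i.i.d. samples, every one of num_menus menus (revenue in [0, max_revenue]) has
    empirical average revenue within eps of its expectation with probability >= 1 - delta.
    """
    return math.ceil(max_revenue ** 2 * math.log(2 * num_menus / delta) / (2 * eps ** 2))


print(hoeffding_union_sample_size(num_menus=100, max_revenue=1.0, eps=0.1, delta=0.05))  # 415
```

The dependence on the number of menus is only logarithmic, which is why a discretization with exponentially many menus still yields a polynomial sample complexity.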
Replacing we have,
[TABLE]
Also, the revenue loss compared to the optimum for arbitrary buyer with valuation is:
[TABLE]
The total error (from discretization and empirical error), when the empirical error is set to , is
[TABLE]
By setting , we have
[TABLE]
Replacing gives the following sample complexity:
[TABLE]
which by replacing with results in total error.
The computational complexity of finding the empirical optimal menu for buyers and menu of size is:
[TABLE]
This implies the efficiency of the algorithm. ∎
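The running time above comes from brute-force empirical revenue maximization: evaluate every discretized menu on every sample and keep the best. The sketch assumes a hypothetical buyer model in which a buyer is described by total values for 1, 2, ..., K units, a tariff is an (up-front fee, per-unit price) pair, and the buyer picks the tariff and quantity with the largest strictly positive utility (ties keep the first pair found). The candidate menus and samples are made up for illustration; the paper's candidates come from its discretization.

```python
def tariff_revenue(menu, values):
    """Revenue from one buyer facing a menu of two-part tariffs (hypothetical buyer model).

    menu: list of (fee, unit_price) pairs; values[j - 1]: the buyer's total value for j units.
    The buyer takes the (tariff, quantity) pair with the largest utility if it is strictly
    positive (ties keep the pair found first); otherwise they buy nothing and pay 0.
    """
    best_utility, revenue = 0.0, 0.0
    for fee, unit_price in menu:
        for units, value in enumerate(values, start=1):
            payment = fee + units * unit_price
            if value - payment > best_utility:
                best_utility, revenue = value - payment, payment
    return revenue


def empirical_revenue_maximizer(candidates, samples):
    """Brute-force ERM: the candidate menu with the highest average revenue on the samples.

    Cost: len(candidates) * len(samples) * (menu length) * (number of units) utility evaluations.
    """
    def average_revenue(menu):
        return sum(tariff_revenue(menu, values) for values in samples) / len(samples)

    return max(candidates, key=average_revenue)


samples = [[5.0, 8.0, 10.0], [2.0, 3.0, 3.5], [6.0, 11.0, 15.0]]   # hypothetical buyers
candidates = [                                                      # hypothetical menus
    [(0.0, 2.0)],                 # per-unit price only
    [(3.0, 1.0)],                 # up-front fee plus a lower unit price
    [(0.0, 2.0), (4.0, 1.0)],     # menu with two tariffs
]
print(empirical_revenue_maximizer(candidates, samples))
```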
Lemma 46**.**
The running time of distributional learning algorithm for two-part tariffs in [Balcan et al., 2020b] is at least
[TABLE]
Proof.
The algorithm involves computing regions, where is , and solving a linear program for each region with variables and constraints, which takes . ∎
Comparison with previous results.
The sample complexity using the pseudo-dimension method of [Balcan et al., 2018c] is and the best previously-known running time [Balcan et al., 2022a] is , where the number of discontinuity regions is bounded by , resulting in the worst case running time of due to [Balcan et al., 2020b, 2022a] (See Lemma 46).
Appendix B Missing Proofs of Section 4
B.1 Missing Proofs for the Discretization Procedure
Before providing the proof of the discretization step, we note that this procedure for menus of lotteries needs extra care, since standard rounding of the parameters may result in arbitrarily lower revenue. For example, if there are two lotteries with similar utility for the buyer but a large difference in prices, minor changes in the allocation probabilities or the prices may make the buyer switch from the high-price lottery to the low-price one. The following is a concrete example of why standard rounding procedures fail.
Example 1**.**
Consider a menu of three lotteries.
[TABLE]
[TABLE]
[TABLE]
[TABLE]
Consider the buyer that has value for the item. The first table shows the original menu. With this menu the buyer’s highest utility option is the last lottery that causes the highest revenue, i.e., . The following tables show the new menus after rounding down the allocation probabilities and prices, rounding up allocation probabilities and rounding down prices, and rounding up allocation probabilities and prices (all to powers of ), respectively. All these transformations result in the highest utility lottery changing to the middle lottery which causes smaller revenue.
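The menu entries of Example 1 are not reproduced in this text, so the sketch below uses hypothetical numbers to show the same failure mode: a buyer with value 1 for a single item, utility value * q - p, and both allocation probabilities and prices rounded down (to multiples of 0.25 and 0.1, respectively, rather than to powers as in the example). Exact rational arithmetic via `fractions.Fraction` avoids floating-point ties.

```python
import math
from fractions import Fraction as F


def chosen_lottery(menu, value):
    """Index of the lottery maximizing value * q - p (the first one wins ties)."""
    utilities = [value * q - p for q, p in menu]
    return utilities.index(max(utilities))


def round_down(x, step):
    """Round x down to the nearest multiple of step (exact for Fractions)."""
    return math.floor(x / step) * step


value = F(1)
menu = [(F("0.6"), F("0.45")), (F("0.95"), F("0.79"))]   # hypothetical (q, p) lotteries
rounded = [(round_down(q, F("0.25")), round_down(p, F("0.1"))) for q, p in menu]

for name, m in [("original", menu), ("rounded", rounded)]:
    i = chosen_lottery(m, value)
    print(name, "menu: buyer picks lottery", i, "and pays", m[i][1])
```

Before rounding, the expensive lottery is (barely) preferred; after rounding, its utility drops below that of the cheap lottery and revenue falls from 79/100 to 2/5.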
See Theorem 16.
Proof.
Most of this proof is identical to that of Dughmi et al. [2014]. Note that in the algorithm, the original entries in a menu are divided into levels such that is the lowest-price level and is the highest-price one. First, we show that if a buyer’s utility-maximizing lottery is in level given , their utility-maximizing lottery in is never in a lower-price level . Intuitively, the reason is that the lotteries with lower-level prices have their allocation reduced more and their prices reduced less than the ones in higher levels. More formally, let be at level and at level . Also, let and be the transformed lotteries in the output of the algorithm. Then, , and for every valuation , . Now, consider an arbitrary valuation that has higher utility choosing than . Then , and therefore . Combining this inequality with the ones above implies
Secondly, we compute an upper bound on the loss incurred. Suppose the original utility-maximizing lottery was in . Also, suppose in , the utility-maximizing lottery is which is the transformation of . The first scenario is when . Note that in this case, may be smaller by a factor than , then to obtain we first lost a multiplicative factor of and then an additive factor of at most (including the rounding). Thus . In the second case where , the loss is at most . Therefore, in any case, .
Thirdly, the set of possible prices is , which is of size , and the set of possible allocation probabilities is , for which is of size . In the -length menus, there are prices and allocation probabilities in total. In the unlimited-length menus, we consider, for each potential lottery (each distinct vector of parameters), whether or not it belongs to the menu. This analysis gives the final number of discrete points.
∎
B.2 Online Learning
Similar to the section on two-part tariffs, using the outcome of the discretization summarized in Theorem 16, we show a reduction to a finite number of experts and run standard learning algorithms (weighted majority and Exp3) over the menus in the discretized set.
B.2.1 Full Information
In the full information setting, the seller sees the revenue generated by all possible menus. To design an online algorithm in this case, we use a variant of the weighted majority algorithm by [Auer et al., 1995]. The experts in our case are the discretized menus from the previous section, denoted in the algorithm by set . Furthermore, is the valuation of the buyer at time and is the cumulative revenue of menu for the buyers until time step .
Similar to two-part tariffs, we use Algorithm 2 for the full information case. The only difference is that since the maximum revenue in lotteries is , as opposed to two-part tariffs where it is , in the algorithm we need to replace with .
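Algorithm 2 itself is not reproduced in this section. As a hedged illustration of the full-information step, the sketch below is a generic exponential-weights (Hedge-style) learner over a finite list of menus, with revenues divided by a maximum revenue `max_revenue` (the quantity that changes between lotteries and two-part tariffs) so that gains lie in [0, 1]. The learning rate `eta`, the revenue table, and the function names are hypothetical, and the exact update rule of Algorithm 2 may differ.

```python
import math


def hedge_distribution(cumulative_revenue, max_revenue, eta):
    """Exponential-weights distribution over menus from their cumulative revenues.

    Revenues are rescaled to [0, 1] by max_revenue; subtracting the largest log-weight
    before exponentiating avoids overflow without changing the distribution.
    """
    logs = [eta * r / max_revenue for r in cumulative_revenue]
    top = max(logs)
    weights = [math.exp(l - top) for l in logs]
    total = sum(weights)
    return [w / total for w in weights]


def hedge_run(revenue_table, max_revenue, eta):
    """Full information: after each buyer, the revenue of every menu is observed.

    revenue_table[t][i] is the revenue menu i would have earned from buyer t.
    Returns the algorithm's expected total revenue.
    """
    cumulative = [0.0] * len(revenue_table[0])
    expected = 0.0
    for revenues in revenue_table:
        probs = hedge_distribution(cumulative, max_revenue, eta)
        expected += sum(p * r for p, r in zip(probs, revenues))
        cumulative = [c + r for c, r in zip(cumulative, revenues)]
    return expected


# Hypothetical table: 200 buyers, three menus, menu 0 always earns the most.
table = [[1.0, 0.5, 0.2]] * 200
print(hedge_run(table, max_revenue=1.0, eta=0.5))  # close to the best menu's total of 200
```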
Proposition 47** ([Auer et al., 1995], Theorem 3.2).**
For any sequence of valuations ,
[TABLE]
where is the set of experts (lottery menus), is the expected revenue outcome of Algorithm 2 where is replaced with , and is the revenue of the optimal menu in .
See Theorem 17.
Proof.
Let be the number of menus resulting from Algorithm 4. Let be the valuation of the buyer at step , and be the vector of valuation of all buyers in rounds through . We denote as the maximum revenue obtained in the set of menus resulting from Algorithm 4, as the optimal revenue, and as the revenue obtained from the weighted majority algorithm discussed above on the set of outcome menus of Algorithm 4. We have
[TABLE]
where the first expression is a result of Algorithm 4, the second expression uses Proposition 47, the third expands the revenue over terms, and the last uses Theorem 16. Rearranging the terms, we have:
[TABLE]
We set variables , , , and as a function of to minimize the exponent of in the regret. The regret is upper bounded by
[TABLE]
where the inequality follows by upper bounding . By setting , , , and the regret is bounded by . ∎
See Theorem 18.
Proof.
The proof follows the same argument as that of Theorem 17. The only difference in the parameters is , the number of experts, which in this case is . We set variables , , , and as a function of to minimize the exponent of in the regret. The regret is upper bounded by the formula below after substituting
[TABLE]
By setting , , , and , the regret is bounded by . ∎
B.2.2 Bandit Setting
In the partial information setting, the seller does not see the outcome for all possible menus and only observes the outcome of the menu used (the lottery chosen by the buyer). As with the two-part tariff results, to design an online algorithm in this case, we use a version of the Exp3 algorithm in [Auer et al., 1995]. This variant of the Exp3 algorithm contains the weighted majority algorithm (Algorithm 2) as a subroutine. At each step, we mix the probability distribution , used by the weighted majority algorithm, with the uniform distribution to obtain a modified probability distribution , which is then used to select a menu from our discretized set. Following the lottery chosen by buyer , we use the price paid (the gain from the chosen menu) to formulate a simulated gain vector, which is then used to update the weights maintained by the weighted majority algorithm.
Similar to two-part tariffs, we use Algorithm 3 for the bandit case. The only difference is that since the maximum revenue in lotteries is , as opposed to two-part tariffs where it is , in the algorithm we need to replace with .
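The bandit procedure described above can be sketched as a standard Exp3 learner: mix the exponential-weights distribution with the uniform one, sample a menu, observe only its revenue, and update that menu with an importance-weighted gain estimate (the observed revenue divided by its selection probability, which is unbiased). Class, method, and parameter names (`gamma`, `max_revenue`) are hypothetical, and Algorithm 3's exact constants may differ.

```python
import math
import random


class Exp3:
    """Exp3 over a finite set of menus with revenues in [0, max_revenue] (a sketch)."""

    def __init__(self, num_menus, max_revenue, gamma, seed=0):
        self.k = num_menus
        self.h = max_revenue
        self.gamma = gamma
        self.log_w = [0.0] * num_menus       # log-weights, to avoid overflow
        self.rng = random.Random(seed)

    def probabilities(self):
        """Exponential-weights distribution mixed with the uniform distribution."""
        top = max(self.log_w)
        w = [math.exp(l - top) for l in self.log_w]
        total = sum(w)
        return [(1 - self.gamma) * x / total + self.gamma / self.k for x in w]

    def select(self):
        """Sample the menu to present; return it with its selection probability."""
        probs = self.probabilities()
        menu = self.rng.choices(range(self.k), weights=probs)[0]
        return menu, probs[menu]

    def update(self, menu, prob, revenue):
        """Only the presented menu gets a (nonzero) importance-weighted gain estimate."""
        gain_estimate = (revenue / self.h) / prob
        self.log_w[menu] += self.gamma * gain_estimate / self.k


# Hypothetical environment: menu 2 always earns the maximum revenue, the others earn nothing.
learner = Exp3(num_menus=3, max_revenue=1.0, gamma=0.1, seed=42)
for _ in range(3000):
    menu, prob = learner.select()
    learner.update(menu, prob, 1.0 if menu == 2 else 0.0)
print([round(p, 3) for p in learner.probabilities()])
```

The uniform mixing keeps every probability at least `gamma / k`, which bounds the importance-weighted estimates and is what the regret analysis relies on.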
Proposition 48** ([Auer et al., 1995], Theorem 4.1).**
For any sequence of valuations ,
[TABLE]
where is the set of experts (lottery menus), is the expected revenue outcome of Algorithm 3 where is replaced with , and is the revenue of the optimal menu in .
See Theorem 19.
Proof.
The proof follows the same logic as that of Theorem 17. We denote as the revenue obtained from the Exp3 algorithm described above on the set of menus output by Algorithm 4. As in the proof of Theorem 17, in what follows denotes the number of menus resulting from Algorithm 4. is the valuation of the buyer at step , and is the vector of valuations of all buyers in rounds through . is the maximum revenue obtained in the set of menus resulting from Algorithm 4, and is the optimal revenue.
[TABLE]
where the first expression is a result of Algorithm 4, the second expression uses Proposition 48, the third expands the revenue over terms, and the last uses Theorem 16. Rearranging the terms, we have:
[TABLE]
We set variables , , , , and as a function of to minimize the exponent of in the regret. After substituting , the regret is upper bounded by
[TABLE]
By setting , , , and , the regret is bounded by . ∎
B.3 Limited Buyer Types
The ideas for designing an algorithm specific to limited buyer types in menus of lotteries are similar to those for menus of two-part tariffs. There are a few changes that we overview here.
One of the main differences is the set of menu options . For two-part tariffs, given a menu, the buyer selects a tariff and a number of units that maximize their utility; for menus of lotteries, the options coincide with the menu entries, and for length- menus of lotteries. The mechanism designer’s utility (revenue) given menu is equal to if the buyer selects entry . The buyer selects entry if this entry results in higher utility than any other entry in menu . These inequalities identify regions in which the buyer’s utility-maximizing option is aligned with .
Definition 49** (menu option for menus of lotteries, ).**
Index such that indicating a lottery index in the menu is a menu option. We denote the set of all menu options as . This set identifies all potential actions of a buyer when presented with a menu.
Definition 50** (mapping , feasible mappings, ).**
A mapping is a function from buyer types, to menu options , where is the lottery index assigned to the buyer type. Mapping is feasible if there is a menu corresponding to the mapping, i.e., a menu that if presented to the buyers, each buyer selects their corresponding option in the mapping as their utility maximizing option. denotes the region of the parameter space corresponding to , i.e., the set of menus inducing mapping .
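Definitions 49 and 50 can be illustrated with a small computation: for a finite list of buyer types and a menu of lotteries, each type's menu option is the index of its utility-maximizing entry, and the tuple of these indices is the mapping the menu induces. The sketch assumes additive valuations (utility = sum of item values times allocation probabilities, minus price), ties broken toward the lowest index, and an explicit zero lottery at price 0 to model not buying; the menu, buyer types, and names are hypothetical.

```python
def utility(values, lottery):
    """Buyer utility for a lottery (allocation probabilities q, price p): sum_j v_j q_j - p."""
    q, p = lottery
    return sum(v * prob for v, prob in zip(values, q)) - p


def induced_mapping(menu, buyer_types):
    """Map each buyer type to its utility-maximizing entry index (ties -> lowest index)."""
    mapping = []
    for values in buyer_types:
        utils = [utility(values, lottery) for lottery in menu]
        mapping.append(utils.index(max(utils)))
    return tuple(mapping)


def average_revenue(menu, buyer_types):
    """Seller revenue averaged over buyer types: the price of each type's chosen entry."""
    mapping = induced_mapping(menu, buyer_types)
    return sum(menu[i][1] for i in mapping) / len(buyer_types)


# Hypothetical two-item menu; entry 0 is the "buy nothing" lottery.
menu = [((0.0, 0.0), 0.0), ((1.0, 0.0), 4.0), ((0.5, 0.5), 3.0), ((1.0, 1.0), 7.0)]
buyer_types = [(5.0, 1.0), (3.5, 3.5), (6.0, 6.0), (1.0, 1.0)]
print(induced_mapping(menu, buyer_types))
print(average_revenue(menu, buyer_types))
```

Two menus that induce the same mapping lie in the same region of the parameter space, and within that region revenue depends only on the prices of the selected entries.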
Lemma 51**.**
For each feasible mapping , as defined in Definition 50, is a convex polytope with hyperplane boundaries.
Proof.
For a fixed buyer type and option , let be the set of all parameter vectors corresponding to length- menus for which buyer type selects option . The buyer selects option for menu if this option produces more utility for the buyer than any other option. Formally,
[TABLE]
The above inequalities identify a convex polytope of parameter vectors (menus ) with hyperplane boundaries. is the intersection of for . Therefore, is also a convex region with hyperplane boundaries. ∎
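The polytope description in Lemma 51 can be made concrete: for a fixed buyer type, each comparison "entry j is not better than entry i" is one linear inequality in the flattened menu parameters, so the region where the type selects entry i is an intersection of half-spaces. The sketch assumes additive valuations and a hypothetical parameter layout x = (q_1, p_1, q_2, p_2, ...); it builds the rows a with a . x <= 0 and checks them against a direct utility comparison. Boundary points (ties) satisfy the constraints of several entries.

```python
def flatten(menu):
    """Parameter vector x = (q_1..., p_1, q_2..., p_2, ...) for a menu of (q, p) lotteries."""
    x = []
    for q, p in menu:
        x.extend(q)
        x.append(p)
    return x


def selection_constraints(values, option, num_entries):
    """Rows a with a . x <= 0 iff entry `option` is weakly preferred to every other entry.

    The row for entry j encodes u_j - u_option <= 0, where u_j = values . q_j - p_j
    is linear in x for a fixed buyer type.
    """
    m = len(values)
    block = m + 1                      # m allocation probabilities plus 1 price per entry
    rows = []
    for j in range(num_entries):
        if j == option:
            continue
        a = [0.0] * (block * num_entries)
        for item, v in enumerate(values):
            a[j * block + item] += v           # + values . q_j
            a[option * block + item] -= v      # - values . q_option
        a[j * block + m] -= 1.0                # - p_j
        a[option * block + m] += 1.0           # + p_option
        rows.append(a)
    return rows


def satisfies(rows, x):
    return all(sum(ai * xi for ai, xi in zip(a, x)) <= 0 for a in rows)


menu = [((0.0, 0.0), 0.0), ((1.0, 0.0), 4.0), ((0.5, 0.5), 3.0), ((1.0, 1.0), 7.0)]
x = flatten(menu)
values = (6.0, 6.0)   # hypothetical buyer type with utilities 0, 2, 3, 5 for the four entries
print([satisfies(selection_constraints(values, i, len(menu)), x) for i in range(len(menu))])
```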
Lemma 52**.**
For each feasible mapping and any sequence of buyer valuations the cumulative utility, , is linear in .
Proof.
We show that for any buyer valuation in the sequence, is linear in the region. Proving this claim is sufficient for concluding the statement. Let , i.e., is the lottery index that buyer valuation selects under . Therefore, the utility for this buyer for menu is . Note that is a coordinate of and therefore, has a linear dependence on . Therefore, since the option that each buyer valuation selects is fixed inside , the utility is also linear. ∎
Lemma 53**.**
The number of extreme points for menus of lotteries, , is at most .
Proof.
Length- menus of lotteries occupy a -dimensional parameter space. In each -dimensional space, an extreme point is the intersection of linearly independent hyperplanes. The total number of hyperplanes defining the regions is , where for each buyer type compares the utility of two menu entries. Out of these hyperplanes, we need of them to intersect for an extreme point. Therefore, the number of extreme points is at most , implying the statement. ∎
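A back-of-the-envelope version of this count, under stated assumptions: a length-k menu over m items is taken to have d = k(m + 1) parameters (m allocation probabilities and one price per entry), and only the n * C(k, 2) pairwise-comparison hyperplanes described in the proof are counted (no box constraints such as 0 <= q <= 1). Every extreme point is determined by d of the H hyperplanes, giving at most C(H, d) points. The function name and parameterization are assumptions, and the exact expression in Lemma 53 may differ.

```python
from math import comb


def extreme_point_bound(num_types, menu_len, num_items):
    """Upper bound C(H, d) on vertices formed by d of the H comparison hyperplanes.

    H = num_types * C(menu_len, 2): one hyperplane per buyer type and pair of entries.
    d = menu_len * (num_items + 1): allocation probabilities plus one price per entry.
    math.comb returns 0 when d > H.
    """
    hyperplanes = num_types * comb(menu_len, 2)
    dim = menu_len * (num_items + 1)
    return comb(hyperplanes, dim)


print(extreme_point_bound(num_types=4, menu_len=3, num_items=1))  # C(12, 6) = 924
```

The count grows exponentially with the dimension d, which is why running Exp3 directly over all extreme points gives a regret bound exponential in the menu size.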
The following lemma bounds the loss in utility where the set of menus is limited to the extreme points . The proof is similar to Balcan et al. [2015]; however, the loss depends on the problem-specific utility functions.
Lemma 54**.**
Let be as defined in Definition 10. Then for any sequence of buyer valuations , with as the optimal menu in hindsight:
[TABLE]
Proof.
The proof is similar to that of Lemma 12. The only difference is in step (vi) which computes the loss in revenue between menus that are at distance. In menus of lotteries this distance implies a price difference of at most in any of the lotteries in the menu, and therefore causes total loss per time step. ∎
Full Information Setting
See Theorem 20.
Proof.
The proof follows the same logic as that of Theorem 13. We run the weighted majority algorithm (Algorithm 2, where is replaced by ) with parameter on the set as the set of menus (experts). The proof directly follows from Lemma 54 and Proposition 47. Let . Let be the valuation of the buyer at step , and be the vector of valuations of all buyers in rounds through . We denote as the maximum revenue obtained in the set of , as the optimal revenue, and as the revenue obtained from Algorithm 2 on the set of experts . Then,
[TABLE]
where the first expression uses the size of in Lemma 53, the second expression uses Proposition 47, the third expands the revenue over T terms, and the last uses Lemma 54. Rearranging the terms, we have:
[TABLE]
We set variables and to minimize the exponent of in the regret. By setting and , the regret is . ∎
Partial Information (Bandit) Setting
In the partial information setting, the change in the menu options also affects the definition of set that consists of indicator vectors over the buyer types that select the same menu entry in a mapping . The changes that need to be made in Algorithm 8 to work for menus of lotteries include changing to , using option (menu entry) instead of , and changing utility from to . After making these changes, we can perform the modified algorithm to achieve a bounded regret.
Lemma 55**.**
For any menu , and .
Proof.
The proof is similar to that of Lemma 43. The equality of the expectations follows directly from the definitions of and and Lemma 42. Next, we bound the range of the estimator. Since is a barycentric spanner, for any , . Moreover, belongs to . Finally, the utility of the buyer selecting each option in the menu, e.g., , is always in . Therefore, by the formula of the estimator, it is bounded by times the number of options times the number of buyer types. ∎
See Theorem 21.
Proof.
The proof follows the same logic as that of Theorem 14. In Lemma 44, is the number of dimensions (barycentric spanner set), is the maximum revenue times the number of buyer types times the number of their options (entries in the menu), and is the number of discrete points. In our case, , , and . By Lemma 55, the expected value of the estimated utility is equal to the exact value of utility with range .
Using Lemma 44, the regret for menus of lotteries is bounded by
[TABLE]
∎
B.4 Distributional Learning
See Theorem 22.
Proof.
We need to find the number of samples such that with probability , the difference between the expected revenue of our algorithm and the optimal revenue is at most . Note that since our algorithm uses discretization of possible menus, we face two types of errors: the discretization error, and the usual empirical error in a PAC learning setting. We find the sample complexity and discretization parameters such that the total error is bounded by .
The possible number of menus after discretization using Algorithm 4 with parameter is computed by the following formula.
[TABLE]
Using uniform convergence in the PAC learning setting, the sample complexity for empirical error is as follows.
[TABLE]
Replacing we have,
[TABLE]
Also, the revenue loss compared to the optimum for arbitrary buyer with valuation when using Algorithm 4 with parameters , , and (we use instead of in Algorithm 4 and reserve for -learning) is computed by the following formula.
[TABLE]
The total error (from discretization and empirical error), when the empirical error is set to , is
[TABLE]
By setting , , and , the total error is less than .
Replacing these parameters and substituting with to satisfy total error , we have the following sample complexity:
[TABLE]
Also, replacing the parameters we have:
[TABLE]
The computational complexity of finding the empirical optimal menu for buyers and menu of size is:
[TABLE]
This gives the stated computational complexity of the algorithm. ∎
Theorem 56**.**
For arbitrary-length menus of lotteries, there is a discretization-based distributional learning algorithm with sample complexity
[TABLE]
and running time
[TABLE]
Proof.
This proof follows the same line as the proof of Theorem 22. We need to find the number of samples such that with probability , the difference between the expected revenue of our algorithm and the optimal revenue is at most . Note that since our algorithm uses discretization of possible menus, we face two types of errors: the discretization error, and the usual empirical error in a PAC learning setting. We find the sample complexity and discretization parameters such that the total error is bounded by .
The possible number of menus after discretization using Algorithm 4 with parameter is computed by the following formula.
[TABLE]
Using uniform convergence in the PAC learning setting, the sample complexity for empirical error is as follows.
[TABLE]
Replacing we have,
[TABLE]
Also, the revenue loss compared to the optimum for arbitrary buyer with valuation when using Algorithm 4 with parameters , , and (we use instead of in Algorithm 4 and reserve for -learning) is computed by the following formula.
[TABLE]
The total error (from discretization and empirical error) when the empirical error is set to is
[TABLE]
By setting , , and , the total error is less than .
Replacing these parameters and substituting with to satisfy total error , we have the following sample complexity:
[TABLE]
Also, replacing the parameters we have:
[TABLE]
The computational complexity of finding the empirical optimal menu for buyers is the number of potential menus times times the maximum size of a menu which is . ∎
Lemma 57**.**
The sample complexity of length menus of lotteries using the techniques in [Balcan et al., 2018b] is bounded by
[TABLE]
Proof.
Balcan et al. [2018c] introduce delineability as a condition for upper bounding the pseudo-dimension and, therefore, the sample complexity. They show that the class of lotteries is -delineable. Moreover, if is a mechanism class that is -delineable, then the pseudo-dimension of is at most . Therefore, the pseudo-dimension of menus of lotteries is bounded by . Furthermore, the sample complexity is at most ; substituting the pseudo-dimension of this mechanism class completes the proof. ∎
Appendix C Failure of Dispersion for menus of lotteries
In this section, we prove that without making extra assumptions about optimal menus of lotteries, both definitions of dispersion (Definitions 32 and 35) fail. In particular, we show that the failure of both conditions happens if the optimal menu (maximizer) has two lotteries close to each other (similar coordinates) and satisfies some other properties. Example 2 illustrates a setting where there are lotteries with arbitrarily close coordinates in the optimal menu.
Theorem 58**.**
Let the maximizer have the following properties, where are the coordinates of , respectively illustrating the probability of allocating item one in lottery , the price of lottery , the probability of allocating item one in lottery , the price of lottery , and the allocation probability for other items are the same across these lotteries.
1. , where is the Lipschitz parameter.
2. .
3. is a constant such that .
In this case, for every -bounded distribution whose density is also lower-bounded by , the conditions of Definitions 32 and 35 are violated. In particular, in Definition 32, the probability of a hyperplane crossing the -radius ball centered at the maximizer is a constant depending on ; and in Definition 35, there exists a pair of points such that the expected number of times that their loss function difference violates the Lipschitz condition for any Lipschitz constant is a constant depending on .
Proof.
We first show why Definition 32 fails. Consider a ball of radius centered at the maximizer . Let this ball be . We show that the probability of a hyperplane crossing is constant. Consider a point . We first find the probability density of hyperplanes going through . Then, we integrate to find the probability of crossing the ball. The following equation shows for what value of (the value for the item), the hyperplane goes through .
[TABLE]
Let and be the minimum and the maximum value of , respectively, for which the hyperplane crosses the ball (i.e., there is such that ). The probability that the hyperplane crosses the ball is , where is the density function of the value for the item.
We consider the following points. These points are all close to and therefore fall in a ball of radius centered at . Consider points with and = . Let be in . Let be in .
With the above construction, the numerator ranges from to , and the denominator ranges from to . Therefore, and . For -bounded distribution with support , is at least
[TABLE]
which is constant for a constant .
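The mechanism behind this first part can be checked numerically. Assuming, as in the theorem, that lotteries a and b differ only in the allocation of item one and in price, a buyer with value v for item one is indifferent when v q_a - p_a = v q_b - p_b, i.e., v = (p_b - p_a) / (q_b - q_a). When q_b - q_a is of the same order as the radius r, moving the menu by less than r shifts this critical value by a constant amount, however small r is. The maximizer coordinates and perturbations below are hypothetical.

```python
def critical_value(qa, pa, qb, pb):
    """Item-one value at which the buyer is indifferent between lotteries a and b."""
    return (pb - pa) / (qb - qa)


def critical_value_spread(r):
    """Spread of critical values over two points within distance r of a hypothetical maximizer.

    The maximizer has q_b - q_a = 2r and p_b - p_a = 1.2r (critical value 0.6). Each point moves
    only (q_b, p_b), by r/2 per coordinate: a Euclidean shift of r / sqrt(2) < r.
    """
    qa, pa = 0.5, 0.3
    qb, pb = qa + 2 * r, pa + 1.2 * r
    v_high = critical_value(qa, pa, qb - r / 2, pb + r / 2)   # (1.7r) / (1.5r)
    v_low = critical_value(qa, pa, qb + r / 2, pb - r / 2)    # (0.7r) / (2.5r)
    return v_high - v_low


for r in (1e-2, 1e-4, 1e-6):
    print(r, round(critical_value_spread(r), 4))
```

The spread stays at about 0.853 for every radius, so under a density bounded below the hyperplane crosses the ball with probability bounded away from zero.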
Now, we show that Definition 35 fails. To do so, we again consider the pair of points and , which correspond to and , respectively. Considering the line segment connecting and , the probability that the hyperplane crosses this segment is still , which again, for a -bounded distribution with support whose density is also lower-bounded by , is at least
[TABLE]
which is constant for a constant . Note that and which implies anytime the hyperplane crosses between and , the difference in the loss, is at least . Also, the Euclidean distance between and is less than . Therefore, the Lipschitz condition for constant is violated a constant fraction of times in expectation. ∎
The following example shows that in the optimal menu of lotteries, lottery pairs can be arbitrarily close to each other.
Example 2** ([Daskalakis et al., 2014]).**
Consider the case of two items, when the buyer’s value for each item is drawn i.i.d. from the distribution supported on with density function . Daskalakis et al. prove for this example that the unique (up to differences of measure zero) optimal mechanism has uncountable menu complexity. That is, the number of distinct options available for the buyer to purchase is uncountable. They show that the optimal mechanism contains the following four kinds of options: (a) the buyer can receive item one with probability , and item two with probability paying the price , for any , (b) the buyer can receive item two with probability , and item one with probability paying the price , for any , (c) the buyer can receive both items and pay , and (d) the buyer can receive neither item and pay nothing.
- Alon et al. [2017] Noga Alon, Nicolò Cesa-Bianchi, Claudio Gentile, Shie Mannor, Yishay Mansour, and Ohad Shamir. Nonstochastic multi-armed bandits with graph-structured feedback. SIAM J. Comput., 46(6):1785–1826, 2017. doi: 10.1137/140989455. URL https://doi.org/10.1137/140989455.
- Auer et al. [1995] Peter Auer, Nicolo Cesa-Bianchi, Yoav Freund, and Robert E Schapire. Gambling in a rigged casino: The adversarial multi-armed bandit problem. In Proceedings of IEEE 36th annual foundations of computer science, pages 322–331. IEEE, 1995.
- Awerbuch and Kleinberg [2008] Baruch Awerbuch and Robert Kleinberg. Online linear optimization and adaptive routing. J. Comput. Syst. Sci., 74(1):97–114, 2008. doi: 10.1016/j.jcss.2007.04.016. URL https://doi.org/10.1016/j.jcss.2007.04.016.
- Awerbuch and Mansour [2003] Baruch Awerbuch and Yishay Mansour. Adapting to a reliable network path. In Elizabeth Borowsky and Sergio Rajsbaum, editors, Proceedings of the Twenty-Second ACM Symposium on Principles of Distributed Computing, PODC 2003, Boston, Massachusetts, USA, July 13-16, 2003, pages 360–367. ACM, 2003. doi: 10.1145/872035.872090. URL https://doi.org/10.1145/872035.872090.
- Balcan and Blum [2006] Maria-Florina Balcan and Avrim Blum. Approximation algorithms and online mechanisms for item pricing. In Joan Feigenbaum, John C.-I. Chuang, and David M. Pennock, editors, Proceedings 7th ACM Conference on Electronic Commerce (EC-2006), Ann Arbor, Michigan, USA, June 11-15, 2006, pages 29–35. ACM, 2006. doi: 10.1145/1134707.1134711. URL https://doi.org/10.1145/1134707.1134711.
- Balcan and Sharma [2021] Maria-Florina Balcan and Dravyansh Sharma. Data driven semi-supervised learning. In Marc’Aurelio Ranzato, Alina Beygelzimer, Yann N. Dauphin, Percy Liang, and Jennifer Wortman Vaughan, editors, Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, NeurIPS 2021, December 6-14, 2021, virtual, pages 14782–14794, 2021. URL https://proceedings.neurips.cc/paper/2021/hash/7c 93ebe 873ef 213123 c 8af 4b
- Balcan et al. [2008] Maria-Florina Balcan, Avrim Blum, Jason D Hartline, and Yishay Mansour. Reducing mechanism design to algorithm design via machine learning. Journal of Computer and System Sciences, 74(8):1245–1270, 2008.
- Balcan et al. [2015] Maria-Florina Balcan, Avrim Blum, Nika Haghtalab, and Ariel D. Procaccia. Commitment without regrets: Online learning in stackelberg security games. In Tim Roughgarden, Michal Feldman, and Michael Schwarz, editors, Proceedings of the Sixteenth ACM Conference on Economics and Computation, EC ’15, Portland, OR, USA, June 15-19, 2015, pages 61–78. ACM, 2015. doi: 10.1145/2764468.2764478. URL https://doi.org/10.1145/2764468.2764478.
