An Empirical Analysis of Optimal Nonlinear Pricing in Business-to-Business Markets

Soheil Ghili; Russ Yoon

arXiv:2302.11643·econ.GN·August 13, 2024

Open Access

TL;DR

This paper develops an empirical method to estimate optimal nonlinear pricing strategies in B2B markets, demonstrating significant profit gains over linear pricing and analyzing the impact of various market factors.

Contribution

It introduces a novel empirical approach for estimating optimal nonlinear price schedules considering multi-dimensional consumer heterogeneity in B2B markets.

Findings

01

Optimal nonlinear pricing increases profit by at least 8.2% over linear pricing.

02

Second-degree price discrimination recovers 7.1% of the profit gap to first-degree pricing.

03

Demand and cost factors significantly influence the shape of the optimal price schedule.

Abstract

In continuous-choice settings, consumers decide not only on whether to purchase a product, but also on how much to purchase. Thus, firms optimize a full price schedule rather than a single price point. This paper provides a methodology to empirically estimate the optimal schedule under multi-dimensional consumer heterogeneity with a focus on B2B applications. We apply our method to novel data from an educational-services firm that contains purchase-size information not only for deals that materialized, but also for potential deals that eventually failed. We show that this data, combined with identifying assumptions, helps infer how price sensitivity varies with "customer size". Using our estimated model, we show that the optimal second-degree price discrimination (i.e., optimal nonlinear tariff) improves the firm's profit upon linear pricing by at least 8.2%. That said, this…

Tables

Table 1. Evidence suggesting that size data for unsuccessful deals is meaningful.

Number of Employees	Count	% Size over 10	% Size over 20	% Size over 50
Unsuccessful Deals
1-100	989	9	1	0
101-1000	1,206	27	12	2
1000+	369	22	14	6
Successful Deals
1-100	753	7	1	0
101-1000	1,235	18	7	1
1000+	383	26	10	3

Table 2. Deal success rate by industry. There is a meaningful difference between “Computer Software” and “Marketing and Advertising”.

Industry	Count	Deal Success Rate
Computer Software	1,892	0.53
Marketing and Advertising	218	0.35
All Other	2,885	0.46

Table 3. Customers with a low level of behavioral feature 1 are much less likely to purchase and more likely to be of smaller sizes.

Behavioral feature 1 level	Count	Deal Success Rate	% Size over 10	% Size over 20	% Size over 50
Low	3,442	0.35	15	6	1
High	1,553	0.76	24	10	3

Table 4. Maximum likelihood estimates for the parameters describing $f_{V|\bar{Q}}(\cdot)$ according to equation 5. Bootstrapped standard errors are shown in parentheses.

Coefficient		Estimate
Intercept	$\beta_{0}$	2260.56
		(56.25)
Log Feature 1	$\beta_{1}$	133.79
		(9.10)
Log Feature 2	$\beta_{2}$	686.64
		(29.67)
Computer Software	$\beta_{cs}$	39.18
		(28.71)
Marketing and Advertisement	$\beta_{ma}$	-224.29
		(66.90)
Log Firm Age	$\beta_{age}$	-40.99
		(16.43)
Time	$\alpha_{2021}$	70.2
		(25.61)
Mid Size	$\gamma_{medium}$	-657.40
		(44.05)
Large Size	$\gamma_{big}$	-835.73
		(67.53)
Scale	$\sigma$	385.44
		(8.46)
Negative Log-Likelihood		2648.59

Table 5. Profit and Welfare Analysis

Scheme	Revenue ($M)	Change (%)	Profit ($M)	Change (%)	Consumer Welfare ($M)	Change (%)	Social Welfare ($M)	Change (%)
current	30.24	-	18.07	-	7.41	-	25.48	-
1st degree	59.84	+97.88%	37.60	+108.08%	0	-100.00%	37.60	+47.57%
linear	33.56	+10.97%	18.04	-0.18%	10.52	+41.96%	28.56	+12.07%
nonlinear (continuous)	34.13	+12.86%	19.02	+5.27%	10.51	+41.88%	29.53	+15.91%
nonlinear (origin)	34.58	+14.36%	19.04	+5.35%	11.17	+50.72%	30.21	+18.54%

Table 6. Profit and Welfare Analysis by Groups

	Group 1 ($< 10$)	Group 2 (10-19)	Group 3 (20-49)	Group 4 (50-99)	Group 5 ($\geq 100$)	Total
Profit
Current	4.53	4.53	4.78	2.39	1.84	18.07
First-Degree PD	10.04	8.12	10.16	4.78	4.49	37.59
Linear	4.34	4.49	5.01	2.40	1.79	18.03
Nonlinear	4.72	4.57	5.24	2.56	1.96	19.05
Consumer Welfare
Current	2.16	1.92	1.69	0.84	0.79	7.40
First-Degree PD	0.00	0.00	0.00	0.00	0.00	0.00
Linear	4.28	2.77	1.97	0.83	0.67	10.52
Nonlinear	3.04	2.44	2.91	1.47	1.30	11.16
Social Welfare
Current	6.69	6.45	6.47	3.23	2.63	25.47
First-Degree PD	10.04	8.12	10.16	4.78	4.49	37.59
Linear	8.62	7.26	6.98	3.23	2.46	28.55
Nonlinear	7.76	7.01	8.15	4.03	3.26	30.21

Table 7. Correlation between deal success and size (as well as value and size) for each of the three scenarios of Figure 12

Correlation	Left col of Fig 12	Mid col of Fig 12	Right col of Fig 12
$s_{it}$ & $\bar{q}_{it}$	$-0.06^{*}$	$0.09^{*}$	$-0.14^{*}$
$\hat{v}_{it}$ & $\bar{q}_{it}$	$-0.26^{*}$	$-0.01$	$-0.41^{*}$

Table 8. Jointly vs. individually optimized prices: performance comparison segment by segment. A positive sign in the right-most column means the jointly optimized schedule delivers a higher profit from the respective segment relative to the individually optimized schedule.

Scenario	Customer Size Segment	Profit Difference from the Segment
Mid col of Fig 12	small: $\bar{q}_{i} < 20$	-\$0.16M
Mid col of Fig 12	medium: $\bar{q}_{i} \in [20, 50)$	+\$1.45M
Mid col of Fig 12	large: $\bar{q}_{i} > 50$	-\$0.01M
Right col of Fig 12	small: $\bar{q}_{i} < 20$	+\$0.75M
Right col of Fig 12	medium: $\bar{q}_{i} \in [20, 50)$	-\$0.39M
Right col of Fig 12	large: $\bar{q}_{i} > 50$	+\$1.33M

Table 9. Optimal linear and nonlinear price schedules, with and without fixed fees

Scheme	Fixed Fee ($)	Marginal Price(s) ($/unit)	Revenue ($M)	Profit ($M)
Linear	-	[2406]	33.56	18.04
Linear + Fixed Fee	1808	[2266]	32.51	19.10
Nonlinear	-	[2669 2518 2143 2039 1931]	34.58	19.04
Nonlinear + Fixed Fee	1770	[2344 2385 2112 2016 1957]	33.25	19.37
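
To make the schedules in Table 9 concrete, the following minimal sketch evaluates the total charge for a deal of a given size. It rests on two assumptions that Table 9 itself does not state: that the five marginal prices apply to incremental blocks of units, and that the block edges sit at 10, 20, 50, and 100 units (borrowed from the size groups of Table 6). The function name `total_price` and its parameters are hypothetical.

```python
# Marginal prices ($/unit) of the "Nonlinear" row of Table 9.
MARGINAL_PRICES = [2669, 2518, 2143, 2039, 1931]
# ASSUMPTION: block edges taken from the size groups of Table 6.
BREAKPOINTS = [10, 20, 50, 100]


def total_price(q, marginal_prices=MARGINAL_PRICES,
                breakpoints=BREAKPOINTS, fixed_fee=0.0):
    """Total charge for q units under an incremental block tariff.

    ASSUMPTION: each marginal price applies only to the units that fall
    inside its block; the fixed fee is charged only when q > 0.
    """
    if q <= 0:
        return 0.0
    edges = [0] + list(breakpoints) + [float("inf")]
    total = float(fixed_fee)
    for price, lo, hi in zip(marginal_prices, edges[:-1], edges[1:]):
        if q <= lo:
            break  # q does not reach this block
        total += price * (min(q, hi) - lo)
    return total


# Discount of the last block's marginal price relative to the first block's.
top_block_discount_pct = 100.0 * (1 - MARGINAL_PRICES[-1] / MARGINAL_PRICES[0])
```

Under these assumptions, a 15-unit deal pays the first-block price on 10 units and the second-block price on the remaining 5, and the marginal price in the last block is roughly 28% below that in the first.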

Table 10. Profit and Welfare Analysis

Scheme	Revenue ($M)	Change (%)	Profit ($M)	Change (%)	Consumer Welfare ($M)	Change (%)	Social Welfare ($M)	Change (%)
current	27.45	-	16.13	-	6.69	-	22.81	-
1st degree	54.20	+97.44%	33.75	+109.24%	0	-100.00%	33.75	+47.96%
linear	30.98	+12.86%	16.21	+0.50%	9.76	+45.89%	25.97	+13.85%
nonlinear	31.17	+13.55%	17.23	+6.82%	9.24	+38.12%	26.47	+16.04%

Table 11. MLE estimates table

Model	(1)	(2)	(3)	(4)	(5)
Constant	2,260	2,314	2,253	2,435	2,531
Size $\in [20,50)$	-657	-638	-659	-535
Size $\in [50,\infty)$	-835	-797	-840	-610
Size $\in [10,20)$					-270
Size $\in [20,50)$					-583
Size $\in [50,100)$					-711
Size $\in [100,\infty)$					-832
log Age (years)	-40	-38	-42	-46	-22
Year (t)	70	118	71	123	47
Industry - Computer Software	39	40	38	44	25
Industry - Marketing & Advertising	-224	-177	-222	-144	-130
log Number of Employees		-8	1	2
log Behavioral feature 1	133	144	133	151	79
Behavioral feature 2	686	680	687		417
Behavioral feature 3		-462		-441
Scale	385	374	386	383	228
Negative Log Likelihood	2649	2557	2649	2865	2613
Number of Obs.	4,468	4,468	4,468	4,468	4,468

Table 12. Optimal nonlinear pricing schedule $P^{*}=(p^{*}_{1},\ldots,p^{*}_{5})$ remains robust to model specification.

Model Specification	$p_{1}^{*}$	$p_{2}^{*}$	$p_{3}^{*}$	$p_{4}^{*}$	$p_{5}^{*}$
(1)	2669	2518	2143	2039	1931
(2)	2707	2537	2164	1997	1830
(3)	2673	2517	2143	2039	1929
(4)	2695	2434	2069	1973	2048
(5)	2616	2341	2124	2094	1846

Table 13. Comparative analysis of the performances of different optimization algorithms

Method		Marginal Price ($)	Profit ($M)	Time (sec)
Grid-Bisection	linear	[2392]	18.71	7
	nonlinear 3	[2584 2132 2136]	20.07	276
	nonlinear 5	[2682 2477 2133 2125 2133]	20.19	1243
Bayesian Optimization	linear	[2392]	18.71	25
	nonlinear 3	[2578 2134 2132]	20.06	342
	nonlinear 5	[2696 2456 2098 2079 1810]	20.07	1194
Nelder-Mead 5	linear	[2390]	18.71	3
	nonlinear 3	[2590 2077 2074]	20.08	14
	nonlinear 5	[2694 2456 2072 2034 1835]	20.13	157
Nelder-Mead 10	linear	[2390]	18.71	3
	nonlinear 3	[2585 2133 2130]	20.08	55
	nonlinear 5	[2656 2443 2077 2076 2075]	20.16	3006

Table 14. Profits from Jointly and Individually Optimized Schedules and Third-Degree PD by Size

	Original	(i)	(ii)
Jointly Optimized Pricing	19.04	16.80	23.56
Individually Optimized Pricing	19.04	15.51	21.87
	(-0.00%)	(-7.68%)	(-7.17%)
Third-Degree PD (by Size)	19.07	17.31	24.40
	(+0.16%)	(+3.04%)	(+3.57%)

Equations

V_{i}(q) = v_{i} \times \min(q, \bar{q}_{i})

q_{i} = q^{*}(P \mid v_{i}, \bar{q}_{i}) := \arg\max_{q \geq 0} V_{i}(q) - P(q)

\pi(P)=N\times\int_{v,\bar{q}}\Big[P\big(q^{*}(P|v,\bar{q})\big)-c_{1}\times\mathbf{1}_{q^{*}(P|v,\bar{q})>0}-c_{2}\times q^{*}(P|v,\bar{q})\Big]f(v,\bar{q})\,dv\,d\bar{q}

P^{*} = \arg\max_{P(\cdot)} \pi(P)
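
As a rough illustration of how the customer's choice rule and the firm's profit fit together, the sketch below replaces the integral over $f(v,\bar{q})$ with a sum over a finite sample of customers. It is a minimal sketch, not the paper's implementation: quantities are assumed to be integers, the search is brute force over $0,\ldots,\bar{q}$ (quantities above $\bar{q}$ add no value), and the function names, the sample, and the cost values used with it are all hypothetical.

```python
def valuation(q, v, q_bar):
    """V_i(q) = v_i * min(q, q_bar_i): value is linear up to the customer's size."""
    return v * min(q, q_bar)


def best_quantity(price_fn, v, q_bar):
    """Brute-force argmax of V(q) - P(q) over integer q in 0..q_bar.

    ASSUMPTION: integer quantities; not buying (q = 0) yields zero surplus
    and wins ties.
    """
    best_q, best_surplus = 0, 0.0
    for q in range(1, q_bar + 1):
        surplus = valuation(q, v, q_bar) - price_fn(q)
        if surplus > best_surplus:
            best_q, best_surplus = q, surplus
    return best_q


def sample_profit(price_fn, customers, c1, c2):
    """Sample analogue of pi(P): revenue minus the per-customer cost c1
    (incurred only when the customer buys) and the per-unit cost c2."""
    total = 0.0
    for v, q_bar in customers:
        q = best_quantity(price_fn, v, q_bar)
        if q > 0:
            total += price_fn(q) - c1 - c2 * q
    return total


def linear_tariff(q):
    """Linear schedule at the per-unit price reported for linear pricing."""
    return 2406 * q
```

For example, with a hypothetical sample `[(3000, 5), (2000, 8)]` of `(v, q_bar)` pairs, the first customer buys all 5 units and the second buys nothing, because its per-unit value lies below the price.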

v_{it} = \beta \times X_{i} + \alpha_{t} + \gamma_{\tilde{q}} + \epsilon_{it}

\mathcal{L}(\beta,\alpha,\gamma,\sigma)=\prod_{it}\text{Prob}\big(s_{it}=\mathbf{1}_{q^{*}(P|v_{it},\bar{q}_{it})>0}\big)
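
The likelihood above can be sketched numerically under two assumptions that this excerpt does not confirm: that the shock $\epsilon_{it}$ is normal with scale $\sigma$, and that a customer buys exactly when its per-unit value $v_{it}$ exceeds a known per-unit threshold price for its intended size. The function names and the `(success, mean_value, threshold_price)` input layout are hypothetical.

```python
import math


def normal_cdf(x):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def neg_log_likelihood(deals, sigma):
    """Negative log-likelihood of observed deal outcomes.

    deals: iterable of (success, mean_value, threshold_price), where
    mean_value stands for beta * X_i + alpha_t + gamma_q.
    ASSUMPTION: v_it ~ Normal(mean_value, sigma) and the deal succeeds
    iff v_it >= threshold_price.
    """
    nll = 0.0
    for success, mean_value, threshold_price in deals:
        p_buy = 1.0 - normal_cdf((threshold_price - mean_value) / sigma)
        p_buy = min(max(p_buy, 1e-12), 1.0 - 1e-12)  # guard against log(0)
        nll -= math.log(p_buy if success else 1.0 - p_buy)
    return nll
```

In practice this function would be passed to a numerical optimizer over the coefficients and $\sigma$; here the mean values are simply taken as given.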

\pi_{k}(p)=N_{k}\times\int_{v,\bar{q}}\Big[p\times q^{*}(p|v,\bar{q})-c_{1}\times\mathbf{1}_{q^{*}(p|v,\bar{q})>0}-c_{2}\times q^{*}(p|v,\bar{q})\Big]f(v,\bar{q}|\bar{q}\in I_{k})\,dv\,d\bar{q}

\tilde{p}_{k} = \arg\max_{p \in \mathbb{R}} \pi_{k}(p)

g_{\bar{q}, X}(\cdot, \cdot) = g_{\bar{q}, X}(\cdot, \cdot \mid v_{i} \geq \bar{p}) \times \text{Prob}(v_{i} \geq \bar{p}) + g_{\bar{q}, X}(\cdot, \cdot \mid v_{i} < \bar{p}) \times \text{Prob}(v_{i} < \bar{p})

q^{*}(P|\bar{q},v):=\arg\max_{q\geq 0}\big[V(q|\bar{q},v)-P(q)\big]

q^{*}(P|\bar{q},v):=\arg\max_{q\in[0,\bar{q}]}\big[V(q|\bar{q},v)-P(q)\big]

q^{*}(P|\bar{q},v):=\arg\max_{q\in[0,\bar{q}]}\big[q\times v-P(q)\big]

V_{i}(q) \equiv v_{i} \times [\zeta \min(q, \bar{q}_{i})^{\alpha}]

v_{i} \times \zeta \bar{q}_{i}^{\alpha} = v_{i} \bar{q}_{i} \Leftrightarrow \zeta = \bar{q}_{i}^{1 - \alpha}

(\underline{p}_{k}^{t+1}, \bar{p}_{k}^{t+1}) = \begin{cases} (\underline{p}_{k}^{t}, \underline{p}_{k}^{t} + l_{k}^{t}), & \text{if } p_{k}^{t*} - z \frac{l_{k}^{t}}{2} < \underline{p}_{k}^{t+1} \\ (\bar{p}_{k}^{t} - l_{k}^{t}, \bar{p}_{k}^{t}), & \text{if } p_{k}^{t*} + z \frac{l_{k}^{t}}{2} > \bar{p}_{k}^{t+1} \\ (p_{k}^{t*} - z \frac{l_{k}^{t}}{2}, p_{k}^{t*} + z \frac{l_{k}^{t}}{2}), & \text{otherwise} \end{cases}
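
The bracket-update rule above can be read as the following small function. This is a literal transcription under stated assumptions, not the authors' code: how $l_{k}^{t}$ and $z$ are defined is not shown in this excerpt, so both are taken as plain inputs, and the two boundary conditions are checked against the current bracket. The name `update_bracket` is hypothetical.

```python
def update_bracket(lower, upper, p_star, length, z):
    """One update of the search bracket for a single marginal price.

    lower, upper: current bracket; p_star: best price found in it;
    length, z: the quantities written l and z in the rule (ASSUMPTION:
    supplied by the caller, since their definitions are not in this excerpt).
    """
    half_width = z * length / 2.0
    if p_star - half_width < lower:
        # Candidate interval would spill below the bracket: pin to the lower edge.
        return (lower, lower + length)
    if p_star + half_width > upper:
        # Candidate interval would spill above the bracket: pin to the upper edge.
        return (upper - length, upper)
    # Otherwise center the new bracket on the current best price.
    return (p_star - half_width, p_star + half_width)
```

Iterating this update while re-evaluating profit on a grid inside each new bracket gives a simple shrinking-interval search for each marginal price.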

Taxonomy

Topics: Consumer Market Behavior and Pricing · Supply Chain and Inventory Management · Merger and Competition Analysis

Full text

An Empirical Analysis of Optimal Nonlinear Pricing

Acknowledgments: We thank Saman Ghili for his advice on optimization methods and literature. We also thank Jason Abaluck, Tat Chan, Sam Goldberg, Yufeng Huang, Yewon Kim, Natalia Kyui, Tesary Lin, Nikhil Malik, Dan Miller, K. Sudhir, Raphael Thomadsen, Caio Waisman, Wenting Yu, Jidong Zhou, and various conference and seminar participants for their helpful comments. Ghili acknowledges financial support from the Yale Center for Customer Insights. We thank Wanxi Zhou for outstanding research assistance. All errors are our own.

Soheil Ghili, Russ Yoon Yale University. Email: [email protected] Institute of Technology. Email: [email protected]

Abstract

In continuous-choice settings, consumers decide not only on whether to purchase a product, but also on how much to purchase. Thus, firms optimize a full price schedule rather than a single price point. This paper provides a methodology to empirically estimate the optimal schedule under multi-dimensional consumer heterogeneity. We apply our method to novel data from an educational-services firm that contains purchase-size information not only for deals that materialized, but also for potential deals that eventually failed. We show that this data, combined with identifying assumptions, helps infer how price sensitivity varies with “customer size”. Using our estimated model, we show that the optimal second-degree price discrimination (i.e., optimal nonlinear tariff) improves the firm’s profit upon linear pricing by at least 5.5%. That said, this second-degree price discrimination scheme only recovers 5.1% of the gap between the profitability of linear pricing and that of infeasible first-degree price discrimination. We also conduct several further counterfactual analyses: (i) empirically quantifying the magnitude by which incentive-compatibility constraints impact the optimal pricing and profits, (ii) comparing the role of demand- vs. cost-side factors in shaping the optimal price schedule, and (iii) studying the implications of fixed fees for the optimal contract and profitability.

1 Introduction

“Continuous choice” products are increasingly common. For such products (unlike in discrete choice environments), each consumer decides not only on whether to purchase, but also on how much. Traditional B2C examples of such products are cell-phone plans and utilities. More recently, many B2B products (e.g., cloud services, SaaS products, etc.) are of this form. In continuous choice settings, the firm’s pricing problem goes beyond finding an optimal price point; it entails optimizing a full pricing schedule. The objective of this paper is to provide an empirical framework for optimizing a nonlinear price schedule. In developing this framework, we contribute to the literature on multiple fronts: (i) the model, (ii) the data, and (iii) the analyses run and lessons learned.

Our model is a continuous-choice demand model with multi-dimensional heterogeneity across customers. It is built to capture a key insight from the economic theory literature on multi-dimensional screening. More specifically, we develop a demand model that allows for flexibility in the joint distribution of a customer’s “size of use” (i.e., how many units of the product the customer needs) and her price sensitivity. The correlation across customers between these two quantities (or more broadly, their joint distribution) is critical for determining the shape of the optimal nonlinear pricing scheme. To illustrate, if “larger” customers are more price sensitive, then flatter tariffs are more profitable, whereas if “smaller” ones have a higher sensitivity to price, then steeper schedules are recommended. This is closely related to results from the recent multi-dimensional-screening/second-degree-discrimination literature on how the optimal tariff is impacted by the shape of the joint distribution between price sensitivity and taste/need for quality/quantity (Anderson and Dana Jr, 2009; Haghpanah and Hartline, 2021; Ghili, 2023; Yang, 2021). Nevertheless, we are not aware of empirical work that directly models customer size of use in the context of optimal nonlinear pricing.

Ideally, empirical estimation of the relationship between customer size-of-use (which we henceforth refer to as “size”) and customer price sensitivity requires exogenous price changes to which a statistically representative subset of customers are exposed. Such variation would be useful because it would allow the researcher to compare reactions to the price change across customers with different pre-price-change purchase sizes. Nevertheless, such price variations are typically unavailable. This is especially common for most B2B products, which tend to have moderate data sizes (hundreds or thousands of customers in total) but large revenue per customer, which can render varying the prices risky. [Footnote 1: A prime example of such data issues is cloud computing, where the total number of clients is rather small, price schedules are fixed for years, and firms are unwilling/unable to run pricing experiments.]

To provide a broadly applicable solution to this problem, we turn to novel data which, although unused in academic work to our knowledge, is increasingly collected and maintained by firms (especially in B2B). More specifically, we leverage a dataset from a B2B firm that sells educational workshops and that records not only deals that succeeded, but also sale efforts that started but did not lead to a transaction. Crucially, for unsuccessful sale efforts, the data records, among other things, the would-be number of workshops that the potential customer was considering buying.

Combined with the right identifying assumptions, such a dataset allows us to identify the key object of interest in estimation: the joint distribution over customer size and price sensitivity. What makes it possible to flexibly recover this distribution is, roughly, the variation across deal sizes in deal success rate. To informally illustrate, if a significantly higher fraction of larger potential deals fail relative to smaller potential deals, then large customers are on average more price sensitive than smaller customers. More complex joint distributions may also be recovered if, for instance, medium-sized potential deals are more likely to be unsuccessful compared to both larger and smaller ones. The key feature of our data (i.e., observing the intended sizes of eventually unsuccessful deals) gives us access to this necessary variation in deal success rates across sizes, thereby enabling us to estimate the joint distribution of interest.

We use our estimation and price-schedule optimization methodology to simulate the optimal nonlinear schedule (i.e., optimal 2nd degree price discrimination) and assess its effects on profit and welfare. We find that the optimal schedule lowers the per-unit prices for larger deal sizes, relative to those for smaller deals. The cost-side reason for this is that a portion of production costs is incurred per customer rather than per unit. The demand side reasons are more complex and will be discussed in detail later in the paper; but a simplified intuition is that a higher portion of larger deals fail relative to smaller ones, even though the observed pricing strategy of the firm already involved some volume discounting. To give a sense of the magnitude of our results, we find that it is best to offer about 28% lower per-unit fees to customers purchasing more than 100 units relative to those who purchase less than five. The discount is around 19% for deals of sizes 10-19 relative to those less than five. The optimal nonlinear pricing schedule delivers at least a 5.5% higher profit relative to optimal linear pricing. In addition, we find that the optimal second-degree schedule covers 5.1% of the profitability-gap between optimal linear pricing and optimal first-degree price discrimination. This stands in contrast to third-degree price discrimination which has been shown to closely approximate the profitability of first degree discrimination (see Dubé and Misra (2017)). Also, we find that the optimal second degree price discrimination leads to an almost 6.2% increase in consumer welfare relative to linear pricing (consumer welfare for first-degree discrimination is, by construction, zero). In addition, nonlinear pricing (i.e., second degree price discrimination) increases total social welfare relative to linear pricing by about 5.8%.
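
The headline percentages in the paragraph above can be checked directly against the profit and welfare figures of Table 5. The short sketch below does this arithmetic; it assumes that the "origin" nonlinear row of Table 5 is the schedule the text refers to, which is the row whose numbers reproduce the stated percentages. The variable and function names are illustrative only.

```python
# Figures in $M, copied from Table 5.
# ASSUMPTION: the "origin" nonlinear row is the one discussed in the text.
profit = {"linear": 18.04, "nonlinear": 19.04, "first_degree": 37.60}
consumer_welfare = {"linear": 10.52, "nonlinear": 11.17}
social_welfare = {"linear": 28.56, "nonlinear": 30.21}


def pct_gain(new, base):
    """Percentage change of `new` relative to `base`."""
    return 100.0 * (new - base) / base


# Profit gain of nonlinear over linear pricing.
profit_gain = pct_gain(profit["nonlinear"], profit["linear"])

# Share of the linear-to-first-degree profit gap recovered by nonlinear pricing.
gap_recovered = 100.0 * (profit["nonlinear"] - profit["linear"]) / (
    profit["first_degree"] - profit["linear"]
)

# Welfare gains of nonlinear over linear pricing.
consumer_gain = pct_gain(consumer_welfare["nonlinear"], consumer_welfare["linear"])
social_gain = pct_gain(social_welfare["nonlinear"], social_welfare["linear"])

print(f"profit gain: {profit_gain:.1f}%, gap recovered: {gap_recovered:.1f}%")
print(f"consumer welfare gain: {consumer_gain:.1f}%, social welfare gain: {social_gain:.1f}%")
```

Rounded to one decimal, these come out to 5.5%, 5.1%, 6.2%, and 5.8%, matching the figures quoted in the text.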

We conduct several additional counterfactual analyses that both produce useful insights and demonstrate the ability of our framework to generate similar insights in other settings: (i) We study the role of “incentive compatibility constraints”. More specifically, we quantify the profit from a price schedule in which the per-unit price for each deal-size range is optimized only considering the per-unit willingnesses to pay by customers whose sizes of use are in that range. We show that such “separately optimized” price schedules lead to significant losses if they charge for a deal-size range a substantially different marginal price compared to its adjacent ranges. Such losses arise from the fact that customers belonging to a size-group that faces a high marginal price are incentivized to adjust their purchase sizes to take advantage of substantially lower per-unit prices offered to other size groups. We show that the “globally optimal” contract that accounts for such incentives tends to moderate the variation of the per-unit prices across sizes. (ii) We empirically compare the role of cost side factors to those of demand side ones in determining the shape of the optimal price schedule. We show that the nonlinearities in the optimal schedule are more heavily shaped by the heterogeneity in demand than by nonlinearity in the cost function. We then document the interaction between demand- and cost-side factors. In addition, we dive deeper into the role of costs. Among other things, we show that “customer-level” fixed costs, which are a phenomenon specific to continuous-choice settings, have key implications for optimal pricing–unlike traditional firm-level fixed costs. (iii) We examine the effects of using a fixed fee on the shape and profitability of the optimal contract.

Finally, we provide an alternative estimation method that would utilize experimental price data and would not rely on data on sizes of unsuccessful deals. Although in most B2B settings and for continuous-choice products, statistically powerful experiments tend to be too expensive, we still believe outlining this econometric approach would be useful in case such opportunity does arise. In presenting this alternative estimation method, we first generate simulated purchase data under price experiments, based on our results from the main analysis. We then propose a method to recover the joint distribution of customer sizes and values using this data but without relying on sizes of unsuccessful deals. We further discuss the advantages and disadvantages of the experimental approach relative to our main proposal of using data on intended sizes of failed deals.

The rest of this paper is organized as follows. Section 2 reviews the related literature. Section 3 describes the data and setting and provides some summary statistics. Section 4 presents the model. Section 5 discusses the estimation procedure, identification, and estimation results. Section 6 presents the optimal nonlinear price schedule and discusses in detail how it compares to a linear schedule and to first-degree price discrimination. Section 7 conducts further counterfactual analysis. Section 8 presents an alternative estimation approach, discusses the general applicability of our method beyond our specific setting, and points to some caveats and avenues for future research. Section 9 concludes.

2 Related Literature

This paper sits at the intersection of two strands of the literature on theory and empirics of nonlinear pricing. On the theoretical end, there exists a large literature on “screening,” with one of its major applications/interpretations being nonlinear pricing. Many papers in this domain, such as seminal work by Mussa and Rosen (1978) and Maskin and Riley (1984), focus on uni-dimensional consumer type spaces. The literature on multi-dimensional screening (e.g., Wilson (1993); Armstrong (1996); Laffont et al. (1987); Rochet and Stole (2002, 2003); Carroll (2017)) relaxes this assumption. The present paper relates to this literature by directly analyzing optimal screening and second-degree price discrimination in an empirical setting. We empirically optimize a full price schedule for a firm that, when deciding the charge amount for a given quantity/quality level $q$, tries not only to maximize the profit made from those customers who purchase at $q$, but also to incentivize as many customers as possible to purchase more profitable quantities $q^{\prime}$.

Though it may not be clear at first glance, our emphasis on the correlation between size and value makes this paper related to the literature on bundling of products with non-additive values. In that literature, there are results stating that bundling a set of products is recommended if less price-sensitive consumers tend to perceive a lower degree of complementarity among products (see Anderson and Dana Jr (2009); Haghpanah and Hartline (2021); Ghili (2023) among others). This has parallels to our intuition that flatter contracts are optimal when less-price-sensitive customers tend to be the smaller ones.

On the empirical side, there have been a number of studies on nonlinear tariffs. The most commonly studied applications have been electricity and cell-phone plans. Some of these papers examine a one-dimensional type space (Luo et al., 2018; Aryal and Gabrielli, 2020), whereas others study multi-dimensional type spaces (e.g., Nevo et al. (2016); Reiss and White (2005); McManus (2007)). [Footnote 2: Aryal and Gabrielli (2020) examine a two-dimensional heterogeneity model. But they study two products, and preferences over each of the products are captured using a single dimension of heterogeneity.] We have three points of differentiation relative to these studies. First, we are explicitly after capturing “customer size” as one dimension along which there can be heterogeneity and motivate its relevance to optimal nonlinear pricing. The literature does not directly study this. The closest paper to us on this front is Reiss and White (2005), which studies nonlinear pricing in the electricity market and examines heterogeneity across consumers in the number and types of electric home appliances they own. Second, and relatedly, we bring novel data on the would-be sizes of unsuccessful deals, which plays a key role in being able to estimate the model. Finally, this literature tends to take the regulator’s perspective and study how various nonlinear pricing schemes fare against each other in terms of consumer welfare (see Nevo et al. (2016) for instance). [Footnote 3: Some studies, such as Luo et al. (2018), even assume that the observed pricing scheme is optimal and use its optimality condition as a supply-side moment to back out marginal costs.] In contrast, this paper is one of the few empirical studies of nonlinear pricing to take the firm’s perspective and provide a method to optimize the nonlinear tariff. Other examples of firm-level studies are Narayanan et al. (2007); Iyengar et al. (2008); Iyengar and Jedidi (2012); Bodoh-Creed et al. (2023).

Bodoh-Creed et al. (2023) is closely related to our paper in that it empirically optimizes a full nonlinear pricing schedule. Following Luo et al. (2018), their main specification adopts a “multiplicatively separable” utility function which is well-suited for analyzing intensive margins and also allows for the computation of the optimal schedule using techniques from the theory literature. They also derive econometric bounds on the shape of the schedule. The focus of our model is on the relationship between size-of-use and price sensitivity, which cannot be captured with multiplicatively separable models. [Footnote 4: All multiplicatively separable models imply that “larger consumers” are always the higher willingness-to-pay ones.] Also, our empirical strategy based on intended sizes of failed deals makes our method especially suited to B2B cases where such data is available but price variation is rare/expensive.

Our work is also related to the empirical literature that quantifies the efficiency of price discrimination strategies. For example, Dubé and Misra (2017) find that a sufficiently fine-grained third-degree price discrimination strategy can replicate the profitability of first-degree price discrimination in the context of their study. Our paper is complementary in that it quantifies the same effect for second-degree price discrimination, and is one of the few to do so, especially in the context of nonlinear pricing (for other examples of empirical quantification of the effects of second-degree discrimination, or more generally of screening mechanisms, see Hendel and Nevo (2013); Goldberg (2021); Leslie (2004); Verboven (2002); Iyengar and Gupta (2009); Kadiyali et al. (1996); Draganska and Jain (2006), where the latter two papers consider competing sellers; for surveys of this literature, see Chan et al. (2009) and Lambrecht et al. (2012)). We show that, at least in our context, second-degree price discrimination recovers only a small portion of the profitability gap between first-degree price discrimination and no discrimination at all.

Finally, this paper is related to the literature on continuous-choice demand models (see, among others, Dubé (2004); Hendel (1999); Kim et al. (2002); Chan (2006); Song and Chintagunta (2007)). The key objective of this literature is to examine environments where multiple brands are available and consumers might choose to purchase a positive amount from each brand and form a mixed basket. Consequently, demand models in this literature are more general than ours on the fronts related to this objective, such as the inclusion of multiple products in the utility function. Our paper, on the other hand, seeks to optimize a nonlinear tariff for second-degree price discrimination. As such, we move away from the “multiplicatively separable” utility functions assumed in this literature to one that allows for heterogeneity in the two key dimensions of size of use and price sensitivity. We then focus on leveraging data to identify the joint distribution on these two dimensions, and use our estimated model to optimize the nonlinear tariff.

3 Data, Setting, and Descriptive Statistics

We study pricing by LifeLabs Learning, a New York-based HR-services company offering workshops to employees of its business clients. LifeLabs serves customers within and outside of the United States. The workshops are on leadership and other business-related skills. The time window of our data from the company encompasses 2020 and 2021. As of 2021, LifeLabs did not directly approach potential customers and its marketing activities were mainly based on word of mouth. When a potential client reaches out to LifeLabs, a conversation about a potential deal begins. The most important aspects of each deal in our data are quantity (the number of workshops to be delivered by LifeLabs to the company) and the total price. The price is determined based on a pre-set schedule as a function of quantity. Figure 1 depicts LifeLabs’ current price schedule.

In addition to the price schedule, our data consists of demand- and cost-side information, which we turn to subsequently.

Demand side data: For each potential deal, aside from the quantity and the total price, we also observe whether the deal eventually succeeded. In other words, our data provides information on price and quantity not only for actual transactions, but also for potential ones that did not actualize. As we will discuss later on, this is key to identifying how per-workshop valuation across firms correlates with the sizes of their needs.

Figure 2 plots deal success rate against deal size. As can be seen from this figure, a higher percentage of deals fail as we look at larger sizes; and this is in spite of the fact that for those deal sizes the per-workshop price is cheaper. This is suggestive that LifeLabs might want to further sharpen its volume discount policy in order to increase profitability. Of course a structural analysis encompassing both demand- and cost-side data would be necessary before one can (i) determine whether such a strategy is indeed recommended to LifeLabs and (ii) quantify the extent of it.

Figure 3 presents three histograms. The left panel shows the counts of deals of different size groups. The middle and right panels respectively depict the total revenue and total profit (in $M) from each such size group. As these panels together depict, larger deals are substantially less frequent than smaller ones. Nevertheless, they contribute meaningfully to the total firm revenue and especially profit. This is because each large deal contributes a more substantial revenue compared to a small deal; and the contrast is even sharper (due to the role of costs) when we compare the profits instead of the revenues.

In addition to the deals data, we also possess a client-level dataset which provides information on different client characteristics. Among those are number of employees, a coarse measure of annual revenue (<\$1M, \$1-10M, \$10-50M, \$50-100M, \$100-1,000M), geographical location (country if abroad, city if in the U.S. or Canada), industry, year founded, and some behavioral characteristics that we may not publicly disclose. Figure 4 provides summary statistics on deals and customer characteristics. Customers come from various industries with “Computer Software” being the most common. Although most customers are U.S. based, there is a non-trivial international demand. Finally, customers are mostly companies with fewer than 10,000 employees, \$50M or less in annual revenue, and founded after 1950.

Customer characteristics data is also helpful in providing assurance that our data on unsuccessful deals is meaningful (such assurance is important because one might be concerned that the information recorded for deals that eventually did not happen might be too noisy to be useful). To this end, we briefly report some summaries that examine how (intended) deal size and deal success status fall into clear patterns as they relate to some observables. As Table 1 shows, and as expected, the percentage of deals that are of size 10 or larger increases with company size. The same holds for deals of size 20 or larger and 50 or larger. Crucially, this pattern holds not only for the sizes of successful deals but also for the intended sizes of unsuccessful ones. For instance, only 1% of successful deals are of size 20 or larger for clients with 1-100 employees. This share increases to 7% when we consider clients with 101-1000 employees, and to 10% for those with 1000+ employees. The same shares for unsuccessful deals are 1%, 12%, and 14%, an overall similar magnitude and pattern when compared with the successful deals. [Footnote 5: Note that the patterns are not exactly the same. And they need not be. In fact our identification of how price sensitivity varies with size comes from the differences between these two patterns, as will be explained later in the paper. Nevertheless, there is sufficient similarity to strongly suggest our data on unsuccessful deals is meaningful.] The fact that this observable has such a clear relationship not only with sizes of successful deals but also with those of unsuccessful ones is suggestive that the size information recorded for deals that eventually fail is meaningful.

Table 2 briefly compares deal success rates between customers belonging to the “Computer Software” industry category and those belonging to “Marketing and Advertising”. The table shows that a potential deal with a customer from the former group is 53% (18pp) more likely to be successful relative to one from the latter (although not shown in the table, this gap is statistically significant). Again, this relationship between deal success status and observables is suggestive that data on unsuccessful deals were systematically collected.

The final table we present is based on a behavioral characteristic of the customers, the nature of which we may not disclose. We term it “behavioral feature 1”. As Table 3 shows, customers with a low level of behavioral feature 1 are much less likely to purchase and more likely to consider smaller deal sizes. This, again, is suggestive of the meaningfulness of our data on unsuccessful deals.

Costs Data. In addition to purchase data, we were provided with unique and detailed costs data. Using this data, we are able to obtain measures of per-workshop cost to the firm and a “per-customer fixed cost.” [Footnote 6: We do not disclose further details on cost in the interest of LifeLabs’ privacy. But we do provide and work with our two aggregate cost measures.] The former is the standard marginal cost. The latter is incurred for every customer purchasing a positive amount, but does not change with the amount purchased. This type of fixed cost is specific to continuous-choice settings. While to our knowledge this component of cost has not been examined in the continuous-choice literature, we empirically show that it is relevant to optimal nonlinear pricing.

4 Model

We seek to construct a model that allows us to capture the notion central to our problem: a flexible joint distribution over customers’ “size” (i.e., how many units of the product they need) on the one hand and their willingness to pay per unit on the other. Below, we describe different ingredients of this model.

Consumer Preferences. Each potential consumer is modeled using a value function $V_{i}(\cdot)$ where $V_{i}(q)$ is the willingness to pay for $q$ units of the product. Our model of $V_{i}(\cdot)$ has two parameters: “size” $\bar{q}_{i}$ captures how many workshops customer $i$ needs at most, and “valuation” $v_{i}$ denotes the customer’s willingness to pay for each workshop. Formally, customer $i$ ’s willingness to pay for $q$ workshops is given by:

$$V_{i}(q)=v_{i}\times\min(q,\bar{q}_{i})\tag{1}$$

In words, the customer values each workshop at $v_{i}$ until it has received $\bar{q}_{i}$ workshops, at which point it will no longer value additional workshops. This formulation has been used before in the literature to model demand for products similar to the one we consider, such as cloud computing services (Devanur et al., 2020).
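As a minimal sketch of the verbal description above, the value function can be written as $v_{i}\times\min(q,\bar{q}_{i})$. The function name and the example customer below are illustrative assumptions, not taken from the data.

```python
def willingness_to_pay(q, v, q_bar):
    """Total willingness to pay V_i(q) for q workshops.

    v     : per-workshop valuation v_i (dollars)
    q_bar : size, i.e. the most workshops the customer needs
    """
    # Each workshop is worth v up to q_bar; extra workshops add nothing.
    return v * min(q, q_bar)


# Hypothetical customer: values each workshop at $1,000 and needs at most 10.
for q in (0, 5, 10, 15):
    print(q, willingness_to_pay(q, v=1000.0, q_bar=10))
```

The value grows linearly up to $\bar{q}_{i}$ and is flat afterwards, which is the piecewise linear shape used throughout the model.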

Market. A “market” is a collection of individual potential customers $i$ each described using the two parameters $\bar{q}_{i}$ and $v_{i}$ . Formally, one can model it using a scalar $I$ which represents the total number of potential customers as well as a joint distribution $f(\cdot,\cdot)$ over $\bar{q}_{i}$ and $v_{i}$ .

In spite of its parsimony, our formulation for value functions $V_{i}(\cdot)$ has the advantage that it captures exactly the two dimensions (size and value) along which consumer heterogeneity, as captured by $f(\cdot,\cdot)$ , is of first-order importance for nonlinear pricing. Figure 5 should help illustrate this matter. This figure uses two simple examples to show how our parsimonious value function captures a key economic force. Each example describes a market with only two customers $i=1,2$ . As the figure depicts, when the customer with a larger size $\bar{q}_{i}$ has a lower valuation $v_{i}$ , a concave (i.e., flattening) contract is optimal. On the other hand, when the larger customer has a higher per-unit valuation, then a convex (steepening) schedule is the best (for a robustness analysis with smooth, as opposed to piecewise linear, value functions, see Appendix C). To sum up, Figure 5 suggests that the relation between $\bar{q}_{i}$ and $v_{i}$ across customers $i$ , formally captured by joint distribution $f(\cdot,\cdot)$ , is critical for optimal nonlinear pricing. The importance of the relationship between a “baseline willingness-to-pay” and “preference for quality/quantity” is congruent with theoretical research on price discrimination, market design, and multi-dimensional screening. Instances of such papers are Anderson and Dana Jr (2009); Haghpanah and Hartline (2021); Ghili (2023); Yang (2021). [Footnote 7: This theoretical emphasis on the importance of the relationship between these two dimensions stands in contrast to a large body of the multi-dimensional screening literature, such as Rochet and Stole (2002), that assumes independence across dimensions.]

Note that either scenario described in Figure 5 is empirically plausible. On the one hand, a customer with a larger need for the product may have a higher willingness-to-pay per unit, perhaps because this customer is more resourceful (e.g., in B2B, it is a larger firm), or perhaps because this product is more essential to the customer. On the other hand, a larger user of a product may have better information about, or access to, alternative options, lowering its willingness-to-pay per unit for the current option. This empirical ambiguity about $f(\cdot,\cdot)$ ex-ante, alongside its theoretical importance for the shape of the optimal schedule, strongly motivates the need for its empirical estimation. For examples of other empirical papers emphasizing the importance of estimating the joint distribution of multi-dimensional heterogeneity for market design, see Nevo et al. (2016); Derdenger and Kumar (2013); and for a paper emphasizing the importance of the flexibility of similar joint-distribution estimations, see Goldberg (2021).

As a result, unlike with the form of the value functions $V_{i}(\cdot)$ , we will not be parsimonious about the modeling of $f(\cdot,\cdot)$ . We will, rather, estimate this object flexibly. This flexibility, as will be demonstrated in the next sections, will allow us to consider cases that are relevant to the design of the optimal pricing schedule but are more complex than a simple “positive or negative correlation” between size $\bar{q}$ and value $v$ . An example is a scenario in which “mid-size” customers have on average lower valuations $v_{i}$ compared to both smaller and larger customers.

As a final note, observe that Figure 5 also demonstrates how our model relaxes a restrictive assumption in the literature: As the bottom-right panel shows, $V_{1}(\cdot)-V_{2}(\cdot)$ is non-monotonic in $q$ and crosses zero twice. This relaxes the “single-crossing” assumption in much of the theory on nonlinear pricing (e.g., Maskin and Riley (1984); Mussa and Rosen (1978) and a long line of uni-dimensional-screening literature following them) and some of the empirical studies (e.g., Luo et al. (2018); McManus (2007)).

Nonlinear price schedule. Denote the nonlinear tariff using function $P(q)$ . This simply means any customer who purchases $q$ units of the product will have to pay the total amount of $P(q)$ dollars.

Timeline. We assume a simple timeline. In stage 1, all $N$ potential customers learn about their $\bar{q}_{i}$ but not $v_{i}$ . In stage 2, all potential customers engage in costless exchange with the seller to learn about the product. At the end of this stage, each customer $i$ learns its $v_{i}$ and the firm learns $\bar{q}_{i}$ . In stage 3, potential customers make decisions on how many units of the product (if any) to purchase.

Figure 6 depicts the model timeline graphically. Note that the sequential revelation of $\bar{q}_{i}$ and $v_{i}$ to the buyer does not have a central role in how the purchase decisions are made: the buyer has already learned both of these parameters before stage 3 where it makes the purchase decision. The purpose of setting up this timeline is, hence, not to model incomplete or asymmetric information but rather to construct the most succinct possible micro-foundation that can explain why those customers $i$ who do not end up purchasing take the time to communicate their $\bar{q}_{i}$ to the seller. The simple answer provided by this model is that it is only at the end of this very communication that they learn their $v_{i}$ is too low to justify any purchase. As a result, Figure 6 should be considered more a “data timeline” than a “model timeline.”

To sum up, the importance of the timeline in Figure 6 is that it provides a framework that allows us to interpret and structurally analyze datasets that record deal-size information for unsuccessful deals. This simple setup is also helpful in clarifying the directions in which the model can be extended, a discussion of which is provided in section 8.

Customer purchase decisions and firm profit. We now turn to quantifying the purchase decisions. Each consumer’s net value for $q$ workshops will be given by: $V_{i}(q)-P(q)$ . Thus, customer $i$ ’s purchase decision $q_{i}$ will solve the following optimization problem:

$$q_{i}=\arg\max_{q\geq 0}\;V_{i}(q)-P(q)\tag{2}$$

Seller profit under pricing strategy $P(\cdot)$ is given by:

[TABLE]

In words, the profit is given by revenue net of costs, integrated over customer types. Note that in this continuous-choice setting, the cost function is more complex than it would be under discrete choice. In particular, the cost has two components: $c_{1}$ is the fixed customer-level component of costs which would be incurred once any customer decides to buy a positive amount. Such fixed costs are specific to continuous-choice environments and, unlike “sunk” firm-level fixed costs, can shape a firm’s optimal pricing. [Footnote 8: Here, by “firm”, we mean the seller. Even though customers are also firms in this B2B context, we refer to them simply as customers.] The marginal cost $c_{2}$ would be incurred with every additional workshop provided to a customer.
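To make the two cost components concrete, the following sketch computes seller profit for a small finite market instead of integrating over a continuous type distribution. The customers, the linear tariff, the cost values, the integer quantity grid, and the tie-breaking rule (an indifferent customer does not buy) are all illustrative assumptions.

```python
def customer_choice(v, q_bar, price_fn, q_max=200):
    """Quantity maximizing v*min(q, q_bar) - P(q) over integers 0..q_max."""
    best_q, best_net = 0, 0.0  # buying nothing yields zero net value
    for q in range(1, q_max + 1):
        net = v * min(q, q_bar) - price_fn(q)
        if net > best_net:
            best_q, best_net = q, net
    return best_q


def seller_profit(customers, price_fn, c1, c2):
    """Revenue net of costs, summed over a finite list of (v, q_bar) types."""
    total = 0.0
    for v, q_bar in customers:
        q = customer_choice(v, q_bar, price_fn)
        if q > 0:
            # c1 is paid once per buying customer, c2 once per workshop.
            total += price_fn(q) - c1 - c2 * q
    return total


# Hypothetical two-customer market and a linear tariff of $900 per workshop.
customers = [(1000.0, 10), (800.0, 40)]
profit = seller_profit(customers, lambda q: 900.0 * q, c1=1000.0, c2=700.0)
print(profit)
```

In this toy market only the first customer buys, and the per-customer fixed cost absorbs half of the margin earned on that deal.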

The firm’s problem is to find the price schedule $P(\cdot)$ to maximize profit:

[TABLE]

With the model fully specified, we next turn to the estimation procedure and identification.

5 Estimation

The object we seek to estimate is the joint probability distribution $f(\cdot,\cdot)$ over values $v$ and sizes $\bar{q}$ . We do this in two steps. We start by estimating the marginal distribution $f_{\bar{Q}}(\cdot)$ for size $\bar{q}$ , and then move on to estimate the conditional distribution $f_{V|\bar{Q}}(\cdot)$ over values $v$ .

For the first step, we take advantage of a feature of the model, alongside an approximation.

Lemma 1.

If price schedule $P(\cdot)$ is strictly increasing and concave, then for any customer type $(v,\bar{q})$ , we have $q^{*}(P|v,\bar{q})\in\{0,\bar{q}\}$ .

The proof of this lemma is relegated to the appendix. This lemma simply says that under a weakly concave price schedule, any customer will either buy no workshops or exactly as much as its size $\bar{q}$ .
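The lemma can be checked numerically on an example. The sketch below uses a hypothetical strictly increasing and concave tariff (a square-root schedule, not LifeLabs' actual prices) and searches integer quantities by brute force for a grid of customer types. It is an illustration under these assumptions, not a proof.

```python
import math


def concave_price(q):
    # Hypothetical strictly increasing, concave tariff with P(0) = 0.
    return 1000.0 * math.sqrt(q)


def best_quantity(v, q_bar, price_fn, q_max=100):
    """Integer quantity in 0..q_max maximizing v*min(q, q_bar) - P(q)."""
    best_q, best_net = 0, 0.0
    for q in range(1, q_max + 1):
        net = v * min(q, q_bar) - price_fn(q)
        if net > best_net:
            best_q, best_net = q, net
    return best_q


# Collect every type whose optimal quantity is neither 0 nor its size q_bar.
violations = [
    (v, q_bar)
    for v in range(50, 1001, 50)
    for q_bar in range(1, 51)
    if best_quantity(float(v), q_bar, concave_price) not in (0, q_bar)
]
print(violations)
```

The intuition is that the net value is convex in $q$ below $\bar{q}$ and decreasing above it, so the maximum sits at one of the two endpoints.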

Due to gradually decreasing marginal prices (as depicted by figure 1), we treat the observed price schedule in our data as approximately concave. As a result, we assume, for each customer $i$ who purchased $q_{i}$ units, that $q_{i}=\bar{q}_{i}$ . Also, for those who did not purchase, we assume the $q_{i}$ size in our data which they were considering purchasing was indeed their size $\bar{q}_{i}$ . To sum up, we estimate the marginal distribution $f_{\bar{Q}}(\cdot)$ by equating it with the distribution of observed $q^{*}$ amounts (across both successful and unsuccessful deals).

In the appendix, we perform a robustness check to our concavity approximation by removing the few data points for which the observed $q_{i}$ cannot be equal to $\bar{q}_{i}$ (due to the discontinuities in the schedule). The results change only negligibly. Also note that the marginal distribution $f_{\bar{Q}}(\cdot)$ can still be estimated without a concave (or approximately concave) price schedule, though in that case some parametric assumptions may be necessary.

With our estimation for $f_{\bar{Q}}(\cdot)$ at hand, we next turn to estimating $f_{V|\bar{Q}}(\cdot)$ . We use the following model:

$$v_{it}=X_{i}\beta+\alpha_{t}+\gamma_{\tilde{q}}+\epsilon_{it}\tag{5}$$

In this equation, $i$ denotes the customer and $t$ represents the year. Also, $X_{i}$ captures observable customer characteristics, $\beta$ is a vector of coefficients determining the weights of different customer characteristics, and $\alpha_{t}$ represents yearly fixed effects. Additionally, in order to directly relate $\bar{q}$ to $v$ beyond what could be explained by observables, we allow “size-group” fixed effects $\gamma_{\tilde{q}}$ . Here, $\tilde{q}$ (which is simplified notation for $\tilde{q}_{it}$ ) indicates which of the three intervals $[1,20)$ , $[20,50)$ , or $[50,\infty)$ customer size $\bar{q}_{it}$ falls into. These size-group fixed effects have a critical role in our model’s ability to capture the relationship between $\bar{q}_{i}$ and $v_{i}$ across customers $i$ . Finally, $\epsilon_{it}\overset{iid}{\sim}\mathrm{Logistic}(0,\sigma)$ . That is, $\epsilon$ is a mean-zero error term with a logistic distribution. Note that we do not normalize the standard deviation of $\epsilon_{it}$ to 1. This is because the price coefficient in the customers’ net value formula has already been normalized to -1. [Footnote 9: The reason why we choose this less standard normalization is that we would like our $v_{i}$ values to be measured in dollar terms.]

We estimate the model in equation 5 using maximum likelihood estimation (MLE). The likelihood function is as follows:

[TABLE]

where $s_{it}$ is the observable binary variable denoting whether deal $it$ was successful. This likelihood function gives the probability that the predicted deal success matches with the observed one.
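One way to read this description is as a binary logit likelihood in which a deal succeeds when the customer's per-unit valuation is at least the per-unit price it faces. The sketch below follows that reading. It assumes that `mean_value` stands for the deterministic part of the valuation, that `price` is a per-unit price, and that $\sigma$ is the scale parameter of the logistic distribution; these are our assumptions for illustration, and the function names are hypothetical.

```python
import math


def success_probability(mean_value, price, sigma):
    """P(v >= price) when v = mean_value + eps and eps ~ Logistic(0, sigma)."""
    return 1.0 / (1.0 + math.exp(-(mean_value - price) / sigma))


def log_likelihood(rows, sigma):
    """Sum of log-probabilities that predicted success matches the observed one.

    rows: iterable of (mean_value, price, success) with success in {0, 1}.
    """
    ll = 0.0
    for mean_value, price, success in rows:
        p = success_probability(mean_value, price, sigma)
        ll += math.log(p) if success else math.log(1.0 - p)
    return ll


# Hypothetical deals: (deterministic valuation, per-unit price, success flag).
rows = [(1200.0, 1000.0, 1), (900.0, 1000.0, 0), (1000.0, 1000.0, 1)]
print(log_likelihood(rows, sigma=300.0))
```

Because the price enters with a coefficient of one, the fitted valuations and $\sigma$ are both expressed in dollars.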

We do not estimate this likelihood function on the original dataset. Rather, we augment the dataset in the following way. We create a copy of the original dataset and make the following modifications to it for each row $it$ : first, we set $p_{it}=0$ . Second, we set $s_{it}=1$ . Then, we concatenate the original dataset with this copy. The purpose of this data-augmentation step is to bring in, as an extra moment, the assumption that almost all potential customers in our record would have purchased if the prices were sufficiently small. As we will argue, this is key to identifying price effects in our model.
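The augmentation step can be sketched in a few lines. The record layout and the field names `price` and `success` below are hypothetical stand-ins for $p_{it}$ and $s_{it}$.

```python
def augment(deals):
    """Return the original deals followed by a zero-price, successful copy."""
    # dict(d, ...) builds a new record, so the original rows are not modified.
    copies = [dict(d, price=0.0, success=1) for d in deals]
    return deals + copies


# Hypothetical records: one successful and one unsuccessful potential deal.
deals = [
    {"size": 12, "price": 1400.0, "success": 1},
    {"size": 60, "price": 1100.0, "success": 0},
]
augmented = augment(deals)
print(len(augmented))
```

Every potential deal therefore appears twice in the estimation sample: once at the observed price with its observed outcome, and once at a zero price with a successful outcome.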

Identification. Overall, the entire variation in the data identifies all of the variables of interest. But an informal description of our intuition for what mainly identifies what is still useful. The vector $\beta$ is identified by the variation in firm characteristics. Year fixed effects are identified by the variation in deal success across years for similar-looking firms negotiating similarly sized deals. Size-bucket fixed effects are identified by the differential rates of deal success across different deal sizes. The standard deviation $\sigma$ is identified by choosing the value that, for each size group, would (i) match the predicted demand levels by the model under zero price to the observed volume of potential demand, and (ii) match the predicted demand levels by the model under the observed price schedule to the observed realized demand. If $\sigma$ is too large, then predicted deal success rates across all customer sizes and all prices will be close to 0.5, which is punished by the likelihood function. Some model assumptions ensure that $\sigma$ values that are too small are punished as well. One key assumption that assists our identification is that we do not interact the size-group fixed effects with year fixed effects. As a result, any change in relative deal-success rates across size groups from 2020 to 2021 could only be explained by $\sigma$ . Another aspect of the model that would punish $\sigma$ levels that are too small is the fact that our size-group fixed effects are coarser than the number of different marginal prices (3 vs. 5) in the observed schedule. Thus, within some of these size buckets there is some variation in price. If $\sigma$ is too small, the price variation within the size bucket will lead to a prediction that the subset of the size bucket facing the lower marginal price will purchase almost certainly and the other subset with a vanishingly small probability. This would not match the data and gets punished.
Note that although our model in equation 5 includes both of these assumptions (no interaction between size-group and year fixed effects, and the size fixed effects being coarser than the price schedule), either one of them is sufficient to ensure identification. For robustness of the analysis to having five instead of three size fixed effects, see Appendix D.

We finish this section with a focused discussion of how price effects are identified. Note that although we have normalized the price coefficient in the model to -1, notions such as price elasticity are still meaningful and are governed by a combination of multiple parameters. The key price variation in our setting that helps identify the price effects (and do so by size group) arises from two things: (i) observed “intended” sizes for unsuccessful deals and (ii) an identifying assumption. More specifically, our estimation procedure interprets the total number of potential customers (i.e., customers regardless of deal-success status) for each size group as the demand volume to match under negligible prices. The estimation procedure also attempts to match observed demand volumes (this time deals that indeed successfully closed) for each size group at observed prices. This effectively creates demand variation (from the volume of all potential deals to that of successful deals) in response to a price variation (from zero to the observed prices) that is used for identification. As we discussed before, this idea was incorporated into the estimation by augmenting the original dataset with a copy of it in which we set $p_{it}=0$ and $s_{it}=1$ .

The above argument is equivalent to interpreting the percent success rate for each deal-size group as the “semi elasticity of demand” for customers of that size when the per-unit price is moved from zero to the observed price for that size. It is this interpretation that identifies price effects by size group in our model. Of course, we have only two price points (zero and the observed level) for each size group. As a result, the identification of the demand function requires parametric assumptions. This is one reason why we have the formulation in equation 5 for values $v_{i}$ , unlike the fully non-parametric estimation that we had for the distribution of $\bar{q}_{i}$ .

5.1 Estimation Results

There are both cost parameters and demand parameters in our model. This section describes how we estimate the demand-side parameters using the procedure described before and how we directly calibrate the cost-side ones based on data from the company.

5.1.1 Demand Side Parameters

As mentioned before, the objective here is to estimate the joint distribution $f(\cdot,\cdot)$ over $\bar{q}$ and $v$ as flexibly as possible. We estimate $f_{\bar{Q}}(\cdot)$ directly off of the data and estimate $f_{V|\bar{Q}}(\cdot)$ by finding the parameter values $(\beta,\alpha,\gamma,\sigma)$ in equation 5 for $v_{it}$ that maximize the likelihood function in equation 6. The MLE results are presented in Table 4. As for what company characteristics to include in $X$ from equation 5, we chose a number of features that seemed to have the most predictive power on whether a deal would happen, conditional on $\alpha$ and $\gamma$ . These features were: the age of the customer (as a firm), two industry group indices (“computer software” and “marketing and advertising”), and two behavioral features which we term “feature 1” and “feature 2”. [Footnote 10: We cannot disclose the nature of feature 1 and feature 2 due to a non-disclosure agreement.] For a robustness analysis to the inclusion of more or fewer behavioral features, see Appendix D. Features such as number of employees, revenue, location, or some other industry categories were not highly predictive once size fixed effects $\gamma$ are included in our regression.

As the results from Table 4 suggest, both mid-size and large customers have, on average and all else equal, smaller per-unit willingnesses to pay (respectively by \$657 and \$854 per unit) for the company’s product relative to smaller customers. [Footnote 11: Of course all is not equal; larger and smaller customers may differ from each other systematically on other characteristics.] Again, recall that by “size” we do not mean the size of the customer as a firm; we mean the size of their need for the product the company sells to them. We also see that clients in 2021 valued the product more than those in 2020 did. Younger clients seem to place a higher value on LifeLabs’ services. Clients categorized as “computer software” tend to have a higher valuation for the product relative to the average industry, while clients from “marketing and advertising” tend to value it less. These results are directionally congruent with the data patterns presented in section 3.

With both $f_{V|\bar{Q}}(\cdot)$ and $f_{\bar{Q}}(\cdot)$ in hand, the joint distribution $f(\cdot,\cdot)$ is recovered. Figure 7 visually presents this joint distribution. The figure confirms the suggestive evidence in Figure 2 that the per-unit willingness to pay seems to be smaller for customers with larger needs. As a result, it seems natural to expect the company to want to offer lower per-unit prices for larger deals. The question is “By how much,” which our estimates of the demand parameters, as well as cost estimates provided below, help us quantify.

Robustness to Demand Specification. For a robustness analysis to the specification used in equation 5, see Appendix D. To summarize the finding in that appendix: the results are robust both on the front of the estimated joint distribution $f(\cdot,\cdot)$ and on the front of the implied optimal price schedule.

5.1.2 Cost Parameters

Detailed cost-side data, alongside conversations with the company, allows us to obtain measures of $c_{1}$ and $c_{2}$ . We do not disclose details on this analysis and LifeLabs’ cost structure, in the interest of the company’s privacy and because cost analysis is not a core objective of our study. We only share the outcomes that are key to our subsequent pricing analysis and whose structure is generalizable to other nonlinear-pricing settings. Based on our analysis, we arrive at the following figures: $c_{1}\approx\$3{,}015$ per customer and $c_{2}\approx\$718$ per workshop.

Figure 8 plots the average cost (i.e., $\frac{c_{1}}{q}+c_{2}$ ) as a function of quantity $q$ . This figure shows that, similar to demand side parameters, our cost side analysis suggests that the company’s optimal nonlinear tariff will lower the per-unit price for larger deals.
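As a quick worked example of the average cost formula $\frac{c_{1}}{q}+c_{2}$, the sketch below evaluates it at a few deal sizes using the two cost figures reported above. The chosen deal sizes and the function name are illustrative.

```python
C1 = 3015.0  # per-customer fixed cost reported in the text (dollars)
C2 = 718.0   # per-workshop marginal cost reported in the text (dollars)


def average_cost(q, c1=C1, c2=C2):
    """Average cost per workshop for a deal of q workshops."""
    if q <= 0:
        raise ValueError("average cost is defined only for q > 0")
    return c1 / q + c2


for q in (1, 10, 20, 50, 100):
    print(q, round(average_cost(q), 2))
```

The average cost falls from \$3,733 for a single workshop toward the marginal cost of \$718 as deal size grows, which is the cost-side force behind volume discounts.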

5.2 Model Fit

Figure 9 describes goodness of fit for our estimated model. It compares multiple quantities measured directly on the data to their counterparts generated by the model. In particular, it examines deal purchase rate, total revenue, total cost, and total profit (revenue net of costs) by three size groups. As can be seen from the figure, our model fits the data quite closely.

We now turn to quantifying the exact shape of this optimal nonlinear contract.

6 Optimal Nonlinear Pricing Scheme

With an estimated model of demand and costs in hand that closely fits the data, we are now ready to solve the optimization problem in equation 4 and arrive at the profit-maximizing schedule. Focusing on the market in 2021, in this section we discuss two topics. First, we present the main results on the optimal schedule. Next, we move to discussing this optimal schedule’s profitability, and consumer- and social-welfare implications.

We need to start with a parameterization for the price schedule $P(\cdot)$ . The main parameterization we work with is similar to LifeLabs’ current pricing strategy: it consists of a few linear segments where the continuation of each of them would pass through the origin. Formally, consider intervals $I_{k}=[q_{k},q_{k+1})$ where $k=1,2,...$ and $q_{1}=0$ . We allow price schedule $P$ to take the form $P(q)=p_{k}\times q,\forall q\in I_{k}$ , where $p_{k}$ values are constants. In our application, we consider $I_{1}$ through $I_{5}$ to be, respectively, $[0,10),[10,20),[20,50),[50,100),[100,\infty)$ . Observe this parameterization restricts the space of all possible schedules from an infinite dimensional object to a 5-dimensional one. As a result, with some abuse of notation, we sometimes refer to the function $P(\cdot)$ as a vector $P=(p_{1},...,p_{5})$ .
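A minimal sketch of this parameterization follows. The interval breakpoints come from the text, while the per-unit prices are made-up numbers chosen only for illustration.

```python
import bisect

BREAKS = [10, 20, 50, 100]  # lower bounds of intervals I_2 through I_5


def through_origin_price(q, unit_prices, breaks=BREAKS):
    """Total price P(q) = p_k * q, where I_k is the interval containing q."""
    # bisect_right returns 0 for q < 10, 1 for 10 <= q < 20, and so on.
    k = bisect.bisect_right(breaks, q)
    return unit_prices[k] * q


# Hypothetical per-unit prices (p_1, ..., p_5), not LifeLabs' schedule.
prices = [1500.0, 1400.0, 1300.0, 1200.0, 1100.0]
for q in (9, 10, 19, 20):
    print(q, through_origin_price(q, prices))
```

With these made-up prices, 19 workshops cost \$26,600 while 20 cost \$26,000. A schedule of this form can therefore jump downward at interval boundaries, which is why it is only approximately concave.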

Note that we could in principle restrict this space in alternative ways. For instance, we could consider a piecewise linear and continuous schedule using the five intervals above. We in fact do examine this continuous alternative initially (see next subsection). But we chose the approach described in the previous paragraph as our main specification for two reasons: first, it is the structure that the firm is using; and second, as we will show, it yielded a slightly higher profit than the optimal continuous schedule in our simulations. With the parameterization structure in place, we next turn to the empirical analysis, skipping the details of the optimization method we develop. For more details on our method, how it compares to alternative optimization algorithms, and our recommendations for algorithm choice in similar situations, see appendix E.

6.1 Optimal Price Schedule

Figure 10 compares multiple pricing schemes: (i) the current price schedule by LifeLabs, (ii) the optimal linear price schedule, and (iii) the optimal nonlinear price schedule. Notably, as mentioned in the previous subsection, we solve for the optimal nonlinear schedule in two ways: first we find the optimal schedule among those that charge fixed per-unit prices within each of the five intervals mentioned above. Second, we find the optimal schedule among those that charge a fixed per-unit incremental rate within each of these five intervals. In other words, the first type of nonlinear schedule consists of five segments where the continuation of each of them passes through the origin; whereas the second type consists of five segments that are connected together and form a continuous function.
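The second, continuous type of schedule can be sketched as follows, with a fixed incremental rate applied within each interval. The rates are hypothetical and the function name is ours.

```python
def continuous_price(q, rates, breaks=(0, 10, 20, 50, 100)):
    """Total price when units falling in the k-th interval cost rates[k] each."""
    total = 0.0
    for k, rate in enumerate(rates):
        lo = breaks[k]
        hi = breaks[k + 1] if k + 1 < len(breaks) else float("inf")
        if q > lo:
            # Charge this interval's rate only on the units that fall inside it.
            total += rate * (min(q, hi) - lo)
    return total


# Hypothetical incremental rates for the five intervals.
rates = [1500.0, 1400.0, 1300.0, 1200.0, 1100.0]
for q in (10, 15, 20, 120):
    print(q, continuous_price(q, rates))
```

Unlike a schedule whose segments pass through the origin, this one has no jumps at the interval boundaries, and it is concave whenever the rates are decreasing.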

Panel (a) provides a comparison across different contracts by plotting marginal prices as a function of deal size (for all contracts except the continuous one). As can be seen from this panel, the optimal nonlinear schedule charges more than the optimal linear one for small deals and less for larger deals. The optimal difference between the per-unit prices for the largest and smallest deal sizes is more than \$700.

Panel (b) attempts to provide a comparison between marginal prices of the optimal continuous schedule and those of the optimal schedule with segments that would pass through the origin. [Footnote 12: Though some of the marginal prices are far from each other, this is in fact an indication of overall similarity. To illustrate, in the optimal continuous schedule, the segment for deals of sizes 10 to 19 would have a higher starting point relative to a similarly sloped segment for those sizes in a schedule with segments passing through the origin (this is because the former simply starts from the end of the segment for smaller deals). As a result, a lower slope in this case for the continuous schedule compensates for the higher starting point.] The next section shows that in terms of profitability, the two schedules indeed behave similarly, with the discrete one slightly outperforming the continuous one.

6.2 Profit and Welfare Analysis

Table 5 shows how different pricing schemes fare against one another with respect to a number of measures. In addition to the current pricing by the firm, three other pricing schemes are examined. The first scheme is first-degree price discrimination, in which the firm tailors pricing toward each individual customer. In this regime, the firm will sell to any customer $i$ with $v_{i}\times\bar{q}_{i}\geq c_{1}+c_{2}\times\bar{q}_{i}$, charging exactly $v_{i}\times\bar{q}_{i}$. The second regime is linear pricing, meaning the firm is only allowed to charge tariffs of the form $P(q)\equiv p\times q$, choosing $p$ optimally. Finally, the third tariff is the optimal nonlinear pricing scheme $P^{*}(\cdot)$ described above. Both the optimal linear price and the optimal nonlinear schedule were shown in Figure 10. The three measures on which these pricing schemes are compared are (i) firm profit, (ii) consumer surplus, and (iii) social surplus.
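The first-degree rule stated above can be written as a short sketch. The cost values are the estimates quoted in Section 7.2.2 ($c_{1}\approx 3{,}015$ per customer, $c_{2}\approx 718$ per workshop); the function name and the example customers are hypothetical.

```python
C1 = 3015.0   # customer-level fixed cost, as quoted in Section 7.2.2
C2 = 718.0    # per-unit cost, as quoted in Section 7.2.2

def first_degree(v, qbar):
    """Return (charge, profit) for a customer with per-unit value v and size qbar.

    The firm sells only if v*qbar >= C1 + C2*qbar, and then charges exactly v*qbar.
    """
    revenue = v * qbar
    cost = C1 + C2 * qbar
    if revenue >= cost:
        return revenue, revenue - cost
    return 0.0, 0.0

# Hypothetical customers: one served, one not.
served = first_degree(3000.0, 10)
not_served = first_degree(800.0, 5)
```

Because the customer is charged her full valuation, consumer surplus is zero in this regime and firm profit coincides with social surplus on every deal that takes place.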

As can be seen from the table, the optimal linear pricing strategy delivers almost as much profit as the current nonlinear pricing scheme used by the firm. The optimal nonlinear pricing scheme improves profitability by about 5.5%. [Footnote 13: The profitability has been computed both for the optimal nonlinear schedule with segments passing through the origin and for the optimal continuous schedule. As the table shows, the former slightly outperforms the latter. As a result, for the rest of the paper, we work with optimal nonlinear schedules with segments passing through the origin.] Note that this 5.5% is an underestimate because it is computed on profit levels before taking into account firm-level fixed costs (e.g., facility rent/depreciation, full-time employee salaries and benefits, etc.). To get a sense of the magnitude by which firm-level fixed costs could impact these estimates, note that a 10 or 15 $M/y fixed cost would, respectively, increase the 5.5% performance to 12 or 32%. [Footnote 14: We did not ask LifeLabs for an estimate of firm-level fixed costs given that the number would have no bearing on the optimal pricing strategy.]
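The fixed-cost adjustment is simple arithmetic, sketched below. It assumes that the 5.5% is measured against optimal linear pricing and that the relevant profit levels are the rounded figures quoted in Section 7.3 (18.04 and 19.04 $M/y for the linear and nonlinear schedules without a fixed fee); the function name is hypothetical.

```python
def relative_gain(profit_base, profit_new, fixed_cost=0.0):
    """Percentage profit improvement once a firm-level fixed cost is netted out."""
    return 100.0 * (profit_new - profit_base) / (profit_base - fixed_cost)

LINEAR = 18.04      # $M/y, optimal linear schedule (rounded figure from Section 7.3)
NONLINEAR = 19.04   # $M/y, optimal nonlinear schedule (rounded figure from Section 7.3)

gains = {fixed: relative_gain(LINEAR, NONLINEAR, fixed) for fixed in (0.0, 10.0, 15.0)}
for fixed, gain in gains.items():
    print(f"fixed cost {fixed:>4.0f} $M/y -> gain {gain:.1f}%")
```

With these rounded inputs the gain is about 5.5% gross of fixed costs and roughly 12% and 33% net of 10 and 15 $M/y, close to the figures in the text; the small discrepancy at 15 $M/y presumably reflects rounding of the profit levels. The absolute gain is unchanged, while the base it is divided by shrinks.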

Another question that Table 5 helps answer concerns the comparison between first- and second-degree price discrimination. If first-degree price discrimination were feasible, it would more than double the profit relative to the current strategy. This means that optimal second-degree price discrimination recovers only about 5.1% of the profitability gap between linear pricing on the one hand and first-degree price discrimination on the other.

Aside from profit levels, Table 5 also compares different pricing schemes with regard to their effects on consumer and social surplus. Compared to linear pricing, nonlinear pricing increases consumer welfare by 6.2%, hurting some consumers and benefiting others. [Footnote 15: Note that this empirical result is not general. In theory, the welfare effects of price discrimination could go either way. See Schmalensee (1981) or Varian (1985) for instance.] On the front of social surplus, nonlinear pricing outperforms linear pricing by about 5.8%.

Segment-by-segment analysis. In addition to the aggregate analysis presented by Table 5, it is worth conducting a segment-by-segment analysis of how different pricing policies compare against one another. Figure 11 presents such segment-based results. There are two general lessons from this segment-based analysis.

First, and unsurprisingly, 1st-degree price discrimination dominates all other methods both on the fronts of firm profit and social welfare. Its high performance on social welfare is because this approach, by construction, allows a transaction to take place if and only if it is socially efficient (i.e., if $v_{i}\bar{q}_{i}\geq c_{1}+c_{2}\bar{q}_{i}$). Given the wide gap between the profitability of this method and other ones, it is natural to expect that the firm would benefit from a customer-by-customer negotiation approach as opposed to posted pricing; because even though this approach is unlikely to replicate the exact profit from 1st-degree price discrimination, replicating even a fraction of it would outperform nonlinear pricing. [Footnote 16: Note that there are other hurdles in the way of adopting a negotiation strategy that may discourage a firm from going that route. In particular (and according to our interviews with the firm), in industries such as HR services, there is a considerable chance that the outcome of the negotiation with one client is subsequently learned by other clients and acts as a barrier to negotiating desirable terms with them.]

The second lesson from the segment-based analysis concerns the comparison between linear and nonlinear schedules. Nonlinear pricing generates lower consumer welfare for small-size customers than does linear pricing. This is reversed for larger customers. This is because, by finding the per-unit price that is, loosely speaking, “optimal on average”, linear pricing benefits segments that receive higher prices under nonlinear pricing and hurts those that receive lower prices under the nonlinear schedule. With respect to profitability, nonlinear pricing fully dominates linear pricing given that nonlinear pricing allows the firm to tailor the charges toward respective size groups. [Footnote 17: As we will see later, however, this intuition is incomplete. Optimal nonlinear pricing entails more than choosing the right price for each segment. As emphasized in a long line of theoretical research on screening, nonlinear pricing also requires careful attention to whether customers in each segment respond to relative prices by purchasing an amount different from the one that was “meant for them.”]

7 Further Counterfactual Analyses

In this section, we carry out additional counterfactual analyses that serve two purposes. First they shed further light on our methodological framework and its features. Second, they yield substantive insights that could be of value independent of the methodology. We carry out three studies. We start by examining how optimal nonlinear pricing would respond to various changes in the demand conditions, and we study the role of incentive compatibility constraints. We then dive deeper into the role of costs in shaping the optimal contract, both on their own and in comparison to demand-side factors. Finally, we investigate the consequences of enriching the contract space by allowing for a fixed fee.

7.1 Analysis of Demand-Side Factors

This analysis delves deeper into the role of demand-side factors in shaping the optimal contract. Our objective is to convey two key messages. First, it is the entire shape of the joint distribution $f(\cdot,\cdot)$ that matters for optimal pricing, as opposed to a simple correlation between size $\bar{q}$ and value $v$ across customers $i$ . Second, due to incentive compatibility issues, sellers need to jointly optimize all marginal prices $p_{1},...,p_{5}$ which comprise the nonlinear schedule, as opposed to doing so separately. To this end, the rest of this section has two parts. We start by formalizing what we mean by “separately optimizing” prices $p_{1},...,p_{5}$ due to ignorance of incentive compatibility constraints. Next, we quantify how the optimal price schedules, profits, and the importance of incentive compatibility constraints respond to changes in demand conditions.

The Role of Incentive Compatibility Constraints

As we explained in Section 6, we jointly optimize the five per-unit prices $p_{1},...,p_{5}$. A key question is: can we optimize these prices separately instead? That is, can we take each price $p_{k}$ for $k=1,...,5$ and optimize it only for the market of customers within the relevant range $I_{k}$ of sizes? Formally, define the “local profit function” $\pi_{k}(p)$ for $p\in\mathbb{R}$ to mean the profit to the firm if (i) the set of potential customers consisted only of those with $\bar{q}_{i}\in I_{k}$ and (ii) the firm charged the linear price schedule of $P(q)=p\times q$:

[TABLE]

where $N_{k}$ is the total count of all potential customers $i$ with $\bar{q}_{i}\in I_{k}$ ; and with a slight abuse of notation, $q^{*}(p|v,\bar{q})$ is the amount purchased by customer with size $\bar{q}$ and value $v$ under the linear price schedule of $P(q)\equiv p\times q$ .

Denote by $\tilde{p}_{k}$ the separately optimized price for size group $I_{k}$ . That is:

$$\tilde{p}_{k}:=\arg\max_{p\in\mathbb{R}}\pi_{k}(p)$$

We can now formally state what it means to “optimize prices $p_{1},...,p_{5}$ separately”. It means charging the schedule $\tilde{P}=(\tilde{p}_{k})_{k=1,...,5}$ instead of $P^{*}=(p^{*}_{k})_{k=1,...,5}$ . Likewise, a formal way of asking “what would be the consequences of optimizing prices separately” would be to ask “how closely does $\pi(\tilde{P})$ approximate $\pi(P^{*})$ ?” This is an important question both conceptually (are different size groups effectively “separate” markets?) and computationally (can we avoid the difficult joint optimization problem and optimize one-dimensional objects instead?).

By the definition of $P^{*}$ as the optimal schedule, it has to be that $\pi(\tilde{P})\leq\pi(P^{*})$. The reason this inequality may be strict is that “incentive compatibility” constraints are ignored when prices are separately optimized. To illustrate, optimizing separately ignores the fact that if $\tilde{p}_{3}$ is substantially smaller than $\tilde{p}_{4}$, then some customers $i$ of size-group $4$ (i.e., $\bar{q}_{i}\in I_{4}$) who would purchase $\bar{q}_{i}$ under $\tilde{p}_{4}$ absent other options might take the opportunity presented by the wide gap between $\tilde{p}_{3}$ and $\tilde{p}_{4}$ and reduce their purchase sizes. Similarly, if $\tilde{p}_{3}$ is substantially lower than $\tilde{p}_{2}$, customers of size group 2 might respond by increasing their purchase sizes, paying an overall lower total price, and imposing a higher cost of production on the firm. Put differently, the schedule $\tilde{P}$ naively attempts to separately sell to different size groups at their respective optimal prices. This, if feasible, would yield a total profit of $\Sigma_{k}\pi_{k}(\tilde{p}_{k})$, but it is not incentive compatible and ends up delivering a lower profit. The schedule $P^{*}$, however, optimizes in an incentive-compatibility-aware fashion.

As a long line of research on mechanism design suggests, it is theoretically conceivable that ignoring incentive compatibility results in strictly positive losses: $\pi(P^{*})-\pi(\tilde{P})>0$. The empirical question is how substantial the loss is. This, among other demand-related questions, is what we turn to next.
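The comparison between $\pi(\tilde{P})$ and $\pi(P^{*})$ can be illustrated with a toy grid search. This is only a sketch under strong assumptions that are not the paper's estimated model: three size intervals instead of five, a handful of hypothetical customers, a price grid, and a customer payoff of $v\min(q,\bar{q})-P(q)$ under a schedule whose segments pass through the origin. Only the cost values are taken from the text.

```python
from itertools import product

C1, C2 = 3015.0, 718.0                 # cost estimates quoted in Section 7.2.2
BREAKS = [1, 10, 20]                   # toy intervals [1,10), [10,20), [20,inf): an assumption
QMAX = 30                              # largest quantity a toy customer may order
GRID = [1000.0 + 250.0 * j for j in range(9)]   # candidate per-unit prices, 1000..3000

# Hypothetical potential customers as (per-unit value v, size qbar).
CUSTOMERS = [(2900, 4), (2600, 6), (2400, 8), (1500, 12), (1400, 15),
             (1300, 18), (2800, 22), (2500, 25), (1200, 28)]

def group(q):
    """Index k of the size interval containing q >= 1."""
    return sum(q >= b for b in BREAKS) - 1

def choice(v, qbar, prices):
    """Quantity maximizing v*min(q, qbar) - prices[group(q)]*q; 0 means no purchase."""
    best_q, best_u = 0, 0.0
    for q in range(1, QMAX + 1):
        u = v * min(q, qbar) - prices[group(q)] * q
        if u > best_u:                 # buy only if strictly better than the best so far
            best_q, best_u = q, u
    return best_q

def profit(prices, customers=CUSTOMERS):
    """Firm profit when every customer self-selects a quantity under the schedule."""
    total = 0.0
    for v, qbar in customers:
        q = choice(v, qbar, prices)
        if q > 0:
            total += (prices[group(q)] - C2) * q - C1
    return total

def local_profit(p, k):
    """Profit from size group k alone under the linear schedule P(q) = p*q."""
    members = [(v, qbar) for v, qbar in CUSTOMERS if group(qbar) == k]
    return profit([p] * len(BREAKS), members)

# Separately optimized prices: one one-dimensional search per size group.
separate = [max(GRID, key=lambda p, k=k: local_profit(p, k)) for k in range(len(BREAKS))]
# Jointly optimized prices: search over the full grid of schedules.
joint = max(product(GRID, repeat=len(BREAKS)), key=profit)

print("separate:", separate, profit(separate))
print("joint:   ", list(joint), profit(joint))
```

Because the separately optimized vector is itself a point on the joint grid, the jointly optimized profit can never be lower in this sketch; how large the gap is depends entirely on the hypothetical customers chosen.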

Simulation and Analysis of Different Demand Conditions:

With a formalization of individually vs. jointly optimized prices at hand, we now turn to counterfactuals examining the role of demand-side factors. We find it more illustrative to start the counterfactual analysis in this section from the data rather than from (the later stage of) the estimated model. In the first counterfactual, we modify our original dataset so that mid-size deals ( $q\in[20,50)$ , equivalent to $I_{3}$ ) have a meaningfully higher acceptance rate than small ( $q<20$ , equivalent to $I_{1}\cup I_{2}$ ) and large ( $q\geq 50$ , equivalent to $I_{4}\cup I_{5}$ ) ones. In a second counterfactual, we do the opposite. For each counterfactual, we re-estimate the model (i.e., the joint distribution $f(\cdot,\cdot)$ ), and re-compute both the jointly and individually optimized price schedules. The two right columns of Figure 12 depict these two counterfactuals. We also show the original case on the left for comparison. We now turn to analyzing these results based on the two major lessons mentioned above.

Importance of the shape of $f(\cdot,\cdot)$ beyond simple correlation: As can be seen from panels (d), (e), and (f) of Figure 12, the flexibility of the shape of $f(\cdot,\cdot)$ in our model allows us to capture the nature of the relationship between deal accept/reject outcome and deal size in a comprehensive manner. All of the estimated distributions in Figure 12 have flexible forms and multiple local peaks. Such flexibility is not feasible to capture with more restrictive models of $f(\cdot,\cdot)$ or with simple measures such as correlation. See Table 7 for illustration: the correlation between $v_{i}$ and $\bar{q}_{i}$ across $i$ in the original model, left column of Figure 12, and right column of Figure 12, are -0.26 and -0.41 respectively. Although these correlations are fairly similar to one another, the shapes of the estimated distribution and the optimal contracts are meaningfully different. There are at least two reasons why such simple correlational measures cannot replace a flexible structure for $f(\cdot,\cdot)$ .

First, as can be seen from Figure 12 the relationship between $v$ and $\bar{q}$ need not be monotone. And non-monotone relationships that are fundamentally different from one another (such as U-shaped and inverse U-shaped) may look similar when assessed using a linear model (e.g., correlation).
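A minimal numerical illustration of this point, using made-up data rather than anything estimated in the paper: a U-shaped and an inverse U-shaped relationship between size and value both produce the same Pearson correlation (here exactly zero), even though they call for very different price schedules.

```python
def pearson(xs, ys):
    """Plain Pearson correlation coefficient, written out to avoid extra dependencies."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

sizes = [1, 2, 3, 4, 5, 6, 7]                      # hypothetical size index
u_shaped = [(s - 4) ** 2 for s in sizes]           # value lowest for mid-size customers
inverse_u = [-((s - 4) ** 2) for s in sizes]       # value highest for mid-size customers

corr_u = pearson(sizes, u_shaped)
corr_inverse_u = pearson(sizes, inverse_u)
```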

Second, and more crucially, the importance of one sample in the estimation procedure might differ from its importance with respect to the counterfactual policy analysis of interest. For instance, consider the smaller local peaks on the right end of the distribution $f(\cdot,\cdot)$ in panels (e) and (f) of Figure 12. They belong to mid- and large-size deals. Note that these peaks are substantially lower compared to the corresponding peak(s) for small-size deals. This is simply because $f(\cdot,\cdot)$ is a probability distribution and there are far fewer mid- and large-size samples in the data compared to smaller ones. As a result, if we impose a restrictive model of $f(\cdot,\cdot)$ that does not adequately separate the estimation of $v$ across sizes, the weight of these fewer samples will be dwarfed in the estimation procedure by substantially more frequent small-size deals. This, in turn, would bias our estimation of average $v$ for these less frequent deal sizes. Such a bias would be detrimental to our counterfactual analysis. This is because as depicted by Figure 3 earlier in the paper, larger deals, although much less frequent, have a meaningful role in shaping the revenue and profitability.

In sum, we find the analysis summarized by Figure 12 to be in support of our modeling choices on where to be parsimonious (shape of $V_{i}(\cdot)$ ) and where to be flexible (shape of $f(\cdot,\cdot)$ ).

Role of incentive compatibility constraints: The bottom panels of Figure 12 depict not only the “jointly optimized” schedule $P^{*}$ for each demand scenario, but also the “individually optimized” schedule $\tilde{P}$ . There are two broad lessons to learn from this figure.

The first lesson is that if the deal acceptance rate (and hence the estimated average $v_{i}$) is substantially heterogeneous across size groups, then the profitability gap $\pi(P^{*})-\pi(\tilde{P})$ between the individually and jointly optimized schedules may be wide. This is because under such conditions, the individually optimized marginal prices end up being far from each other, exacerbating the loss from the lack of incentive compatibility. To see this, note that $\frac{\pi(P^{*})-\pi(\tilde{P})}{\pi(P^{*})}$ is less than $1\%$ in the original data (i.e., the left column of the figure), whereas it is 7.7% for the demand scenario in the middle column and around 7.1% in the right-most column (the profit and revenue levels are posted on the top left corners of the bottom panels in Figure 12). Observe that, similar to our previous analyses, these percentages are gross of firm-level fixed costs. If the firm-level fixed cost were between 10 and 15 \$M/y, the former profitability gap would range from 19.0% to 71.7% and the latter from 12.5% to 19.7%.

The second important lesson goes beyond profitability, and focuses on how the shapes of the two contracts compare to one another. In general, relative to the individually optimized contract $\tilde{P}$ , the jointly optimized contract $P^{*}$ seems to moderate the price variation by size. Take the second column of the figure for illustration. Here, $p^{*}_{3}$ is substantially smaller than $\tilde{p}_{3}$ , even though mid-size customers (i.e., those with $\bar{q}_{i}\in I_{3}$ ) have much larger average valuations $v_{i}$ relative to other sizes. Also, $P^{*}$ charges higher prices than $\tilde{P}$ in adjacent size ranges $I_{2}$ and $I_{4}$ . Conceptually similar arguments (but in part in the opposite direction) hold for the right column of the figure.

Table 8 sheds more light on why such a moderating behavior is optimal. In this table, we compare the profits generated by $P^{*}$ and $\tilde{P}$ from different size-groups. In particular, we break down $\pi(P^{*})-\pi(\tilde{P})$ for customers that are small ( $\bar{q}_{i}<20$ ), mid-size ( $\bar{q}_{i}\in[20,50)$ ), and large ( $\bar{q}_{i}\geq 50$ ).

As the table shows, for the demand system in the middle column of Figure 12, $P^{*}$ delivers \$1.45M/y more profit from medium-size customers, relative to $\tilde{P}$. This happens because in the vector $P^{*}$, the element $p^{*}_{3}$ is only moderately larger than the other elements, which stands in contrast to the wider gap between $\tilde{p}_{3}$ and other elements of $\tilde{P}$. This prevents many mid-size customers from “flocking” to cheaper quantities, thereby helping to boost the profitability from those customers. Also helpful toward this objective is the fact that $p^{*}_{2}$ and $p^{*}_{4}$ are elevated (relative to their $\tilde{p}$ counterparts). Such elevation in $p^{*}_{2}$ and $p^{*}_{4}$ either further prevents mid-size customers from adjusting purchase sizes, or helps make a higher profit off of those mid-size customers who do adjust nonetheless. Of course this means that the tuning of $p^{*}_{2}$ and $p^{*}_{4}$ is done in part with the “global” goal of taming mid-size customers’ behavior; which means those prices are “locally” sub-optimal. Thus, it should not be surprising that $P^{*}$ delivers less profit than $\tilde{P}$ from small and large customers, by the amounts of \$0.16M/y and \$0.01M/y respectively.

Similarly, as Table 8 shows, $P^{*}$ makes \$0.39M/y less profit than $\tilde{P}$ from mid-size customers under the demand system in the right-most column of Figure 12. This is because the jointly optimized $P^{*}$ charges a substantially higher per-unit price for mid-size deals than mid-size customers are willing to pay. Though this leads to losses from those customers, it prevents smaller and larger customers from taking advantage of low rates in the mid-range. Alongside this, $P^{*}$ also charges lower prices outside of mid-size deals to further discourage customers of other sizes from moving. As a result, the profitability from those sizes, which is compromised under $\tilde{P}$ due to not accounting for incentive compatibility issues (i.e., large and small customers taking advantage of mid-size rates), is partially protected under $P^{*}$. As the table shows, these improvements are \$0.75M/y and \$1.33M/y respectively for small and large customers.

To recap, compared to the individually optimized schedule, the jointly optimized one seems to charge prices that vary less significantly with size.

7.2 Analysis of Cost-Side Factors

In this section, we examine the role of cost-side factors. We start by comparing the role of those factors against that of demand-side factors in shaping the optimal nonlinear contracts. We then study how the optimal contract responds to various cost-side scenarios.

7.2.1 Cost- vs. demand-side factors: a comparative analysis

The fact that our optimal price schedule involves lower per-unit prices for larger deals seems consistent with both demand-side and cost-side estimates. On the demand side, as Table 4 suggests, customers with medium and large sizes $\bar{q}_{i}$ tend to have lower valuations $v_{i}$. On the cost side, as Figure 8 depicts, the average cost to sell deals of larger sizes is substantially lower than that for smaller deals. An empirical question is, hence, which of these two factors is the more important reason behind the shape of our optimal price schedule?

Figure 13 helps answer this question. On the left panel, we compare the optimal price schedule (solid blue lines) to the optimal schedule under the counterfactual scenario in which the firm faces no costs (i.e., $c_{1}=0,c_{2}=0$). This allows us to “shut off” the role of costs in causing nonlinearity in $P^{*}$ and focus only on the role of demand. The right panel considers a counterfactual scenario in which the valuations $v_{i}$ are homogenized across customers, up to the idiosyncratic error term. This homogenization is formally operationalized by using $\bar{v}_{it}:=\mathbb{E}_{i^{\prime}}[\beta\times X_{i^{\prime}}+\alpha_{t}+\gamma_{\tilde{q}}]+\epsilon_{it}$ in the counterfactual simulation instead of the original $v_{it}$. [Footnote 18: Note that this demand homogenization takes place only on $i$ rather than on $it$. Time index $t$ is fixed at 2021, because this is the year our counterfactual analysis is focused on.] In this counterfactual, hence, values $v_{i}$ and sizes $\bar{q}_{i}$ are independently distributed across potential customers. This allows us to shut off the role of demand in shaping the nonlinearity of $P^{*}$ and focus only on cost factors.
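The homogenization step amounts to replacing each customer's systematic value component by its cross-customer average while keeping the customer's own error term. A minimal sketch with hypothetical numbers (the function and variable names are not from the paper):

```python
from statistics import mean

def homogenize(systematic, eps):
    """Counterfactual values: common mean of the systematic part plus each own error."""
    common = mean(systematic)
    return [common + e for e in eps]

# Hypothetical systematic components (beta*X_i + alpha_t + gamma term) and errors.
systematic = [2400.0, 1900.0, 1500.0]
eps = [120.0, -80.0, 40.0]

v_bar = homogenize(systematic, eps)
```

After this step, any remaining variation in values across customers comes only from the error terms, which is what makes values and sizes independent in the counterfactual.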

As Figure 13 shows, the downward trend of per-unit prices in deal size is qualitatively preserved when we shut off either factor. Thus both demand and cost factors seem to have a role in the shape of the optimal contract. That said, on the front of magnitude, demand-side factors seem to have a substantially larger role relative to cost-side ones. To see this, observe that the counterfactual optimal schedule on the left panel of Figure 13 has a much wider range of prices compared to the one on the right. In fact, the price range across sizes is almost the same between the original optimal and the no-cost counterfactual price schedules on the left panel of the figure. This seems puzzling: if costs do cause diminishing marginal prices (as seen in the right panel of Figure 13), then why does the range of marginal prices not shrink in a counterfactual with no costs? For an intuitive understanding of why this is the case, note that costs affect the shape of the optimal schedule via two channels. The first channel is what was discussed so far: deal sizes with higher average costs tend to have higher per-unit prices in the optimal contract. The second channel has to do directly with the existence of costs rather than heterogeneity in them: when there are marginal costs, the optimal price schedule is in part guided by how to cover those costs, and this leaves the seller with less freedom to shape the price schedule based on demand-side factors. To sum up: when costs are assumed away (green dashed lines in the left panel), not only will their direct effect on prices be gone, but so will their moderating effect on the role of demand.

7.2.2 Detailed analysis of cost-side factors

In this section, we examine the way in which cost parameters $c_{1}$ and $c_{2}$ shape the optimal price schedule. Recall that $c_{1}\approx\$3{,}015$/customer and $c_{2}\approx\$718$/workshop. Figure 14 depicts the optimal contracts as these parameters change. Panel (a) shows the optimal contract for a range of $c_{1}$ values while panel (b) does the same for $c_{2}$.

As is evident from panel (a) of this figure, changes in $c_{1}$ mostly impact the marginal price for smaller-size deals. This is because the larger the deal size, the smaller the customer-level fixed cost will be as a fraction of the total cost associated with the customer. Observe that at $c_{1}=0$, the optimal contract qualitatively preserves its shape. This, as discussed before, indicates that demand-side factors (i.e., the shape of the joint distribution $f(\cdot,\cdot)$) have a critical role in shaping the optimal price schedule. Finally, it is worth noting that the observed impact of $c_{1}$ on optimal pricing is seemingly at odds with the traditional wisdom that fixed costs should have no bearing on pricing. This is because $c_{1}$ is the customer-level fixed cost and, unlike firm-level fixed costs, is not “sunk”. This notion of cost is not well-defined in discrete-choice settings where each customer consumes zero or one unit. [Footnote 19: In discrete choice, $c_{1}$ would always get incurred with one unit of $c_{2}$. Hence, one could think of $c_{1}+c_{2}$ as the variable cost.] But in continuous-choice environments, as we show here, having the right estimate of $c_{1}$ is critical for designing the right pricing strategy.

The effect of $c_{2}$ on the optimal pricing schedule is shown in panel (b) of the figure. Price increases across the board as the per-unit cost increases, and the pass-through is more or less uniform and around $\frac{1}{3}$ . Observe that the direction in which $c_{2}$ affects the shape of the optimal schedule is in contrast to that of $c_{1}$ . The larger the marginal cost $c_{2}$ , the closer the optimal schedule is to a linear contract (the opposite was the case for $c_{1}$ ). Per our discussion in subsection 7.2.1, this happens because larger values of $c_{2}$ restrain the ability of the seller to shape the price schedule based on demand.

7.3 Fixed Fees

Up to this point in the analysis, we optimized a nonlinear price schedule that maintained the original pricing architecture adopted by the firm: charging five different marginal prices for different deal-size intervals $I_{k}\in\{[1,10),[10,20),[20,50),[50,100),[100,\infty)\}$. In this section, we investigate an important possible modification to this structure: the addition of a fixed fee. We examine adding a fixed fee both to a linear contract (thereby forming a two-part tariff) and to the nonlinear, 5-dimensional schedule. Table 9 shows the optimal linear and nonlinear contracts with and without (optimally chosen) fixed fees. It also presents revenues and profits generated from these four contract types.

There are two main lessons to learn from Table 9. One is about profitability and the other has to do with the shapes of the optimal contracts.

On the front of profitability, adding a fixed fee appears to be a powerful tool. For instance, adding a fixed fee to a linear contract and forming a two-part tariff yields more than \$1M/y extra profitability. It slightly outperforms a full nonlinear schedule that does not have a fixed fee. A main reason for this observation is that a fixed fee effectively introduces diminishing average prices, mimicking the advantage of nonlinear contracts. To empirically verify that introducing effective nonlinearity is indeed at work here, note that the added profitability by a fixed fee shrinks from $19.10-18.04=1.06$ \$M/y to $19.37-19.04=0.33$ \$M/y when the baseline contract is nonlinear as opposed to linear. The fixed fee also helps “partially cover” the per-customer fixed cost $c_{1}=\$3{,}015$/customer. This helps ensure that fewer individual contracts run a loss, and that those that do run a smaller loss. A suggestive piece of evidence that fixed fees are helping by “shedding” unprofitable customers is that even though they boost the total profit, they always do so at the expense of the total revenue. This latter comparison also suggests that if a firm has a focus on growth (and, hence, seeks to maximize revenue rather than profit), fixed fees may not be as strongly recommended.

Regarding the shape of the optimal contracts, it is worth using our empirical results to relate to the existing literature on the use of fixed fees. To ease this comparison, we focus mainly on the second row of Table 9, which examines the optimal two-part tariff. The literature on optimal two-part tariffs with homogeneous customers (see for instance Jeuland and Shugan (1983)) has long established that, under certain conditions, the optimal pricing policy involves charging a “variable fee” that is equal to marginal cost and then using the fixed fee to capture all of the surplus from the consumer. [Footnote 20: Of course, even with homogeneous customers, there are conditions under which the optimal two-part tariff deviates from this standard structure. Examples include buyer risk aversion (Rey and Tirole, 1986; Ghili and Schmitt, 2023), competition among buyers when they are part of the supply chain (Rey and Vergé, 2008), and non-specifiable contracts (Iyer and Villas-Boas, 2003).] It has also been established, however, that under heterogeneous preferences by consumers, the optimal two-part tariff may charge a variable fee different from (typically above) marginal cost (see Oi (1971) for an early example). This is what our empirical analysis suggests as well: the optimal variable fee is \$2,266/unit, which is more than three times as high as the marginal cost $c_{2}=\$718$/unit. To gain intuition for why this happens, first consider a hypothetical scenario under which the seller knows each individual buyer’s value function and is able to charge an individualized two-part tariff to it. In this scenario, we have separate pricing problems in each of which consumer-homogeneity is restored (because we have only one customer in each). Thus, it is optimal to offer each consumer a two-part tariff in which the variable fee is \$718 and the fixed fee captures the entire remaining surplus of $(v_{i}-718)\times\bar{q}_{i}$. [Footnote 21: Of course, if $(v_{i}-718)\times\bar{q}_{i}<c_{1}$, the seller will not sell to customer $i$. This can be implemented by setting the individualized fixed fee to $\max\{c_{1},(v_{i}-718)\times\bar{q}_{i}\}$.] Now observe that even though our analysis found a negative relationship between $\bar{q}_{i}$ and $v_{i}$, the relation between $\bar{q}_{i}$ and $(v_{i}-718)\times\bar{q}_{i}$ is positive. This means that if the seller were able to offer individualized two-part tariffs, it would charge fixed fees that would tend to increase in $\bar{q}_{i}$. The question that arises at this point is: given that the seller is in reality only allowed to offer a common two-part tariff to all customers, how can it attempt to mimic the aforementioned scenario in which larger customers pay larger fixed fees? The answer is: by leveraging the variable fee, which helps fine-tune how closely the total pay is tied to size $\bar{q}_{i}$. This is why we observe an optimal variable fee so significantly above marginal cost $c_{2}=718$.

Note that the above intuition also helps to understand why the optimal fixed fee (either when coupled with a linear price or when coupled with a nonlinear schedule) is set to only partially cover the per-customer fixed cost $c_{1}=\$3{,}015$/customer. This is because a fixed fee as high as \$3,015 would have put a downward pressure on the variable fee and curtailed the ability of the seller to price discriminate across customers of different sizes. As a result, we observe that the seller accepts the possibility of running a loss on the smallest customers in order to better price discriminate, even though doubling the fixed fee would have guaranteed no loss from any single customer.
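The mechanics of a two-part tariff discussed in this subsection can be sketched numerically. The variable fee and the two cost parameters are the values quoted in the text; the fixed fee of 1,000 is a purely hypothetical placeholder, since the optimal fixed fee is reported in Table 9 and not in the text, and the function names are likewise illustrative.

```python
C1, C2 = 3015.0, 718.0      # customer-level fixed cost and per-unit cost from the text
VARIABLE_FEE = 2266.0       # optimal variable fee reported in the text
FIXED_FEE = 1000.0          # hypothetical placeholder, NOT the value from Table 9

def payment(q, fixed_fee=FIXED_FEE):
    """Total payment under a two-part tariff for a deal of size q >= 1."""
    return fixed_fee + VARIABLE_FEE * q

def average_price(q, fixed_fee=FIXED_FEE):
    """Per-unit average price, which falls with q whenever the fixed fee is positive."""
    return payment(q, fixed_fee) / q

def deal_profit(q, fixed_fee=FIXED_FEE):
    """Profit on a single deal of size q, net of C1 and C2."""
    return payment(q, fixed_fee) - C1 - C2 * q

for q in (1, 2, 10, 50):
    print(q, round(average_price(q), 1), round(deal_profit(q), 1))
```

With this placeholder fee, a one-unit deal runs a loss while larger deals do not, and doubling the fee removes the loss on one-unit deals. This only illustrates the trade-off described above; the actual break-even sizes depend on the fixed fee in Table 9.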

Additional counterfactual analyses on 3rd-degree price discrimination

Appendix G carries out a number of counterfactual analyses in which we study the impact of 3rd-degree price discrimination as well as a combination of 2nd and 3rd degree discrimination strategies.

8 Discussion

This section provides discussions on alternative methods, broad applications of our framework, as well as caveats and future research.

8.1 Alternative Estimation Method for Experimental Data

Here, we discuss an alternative estimation procedure that one could use if exogenous variation in prices were available. There are multiple reasons why such an analysis is worth carrying out. First, with exogenous price variation, one would no longer require data on the sizes of unsuccessful deals for estimation. Given that this change of data leads to non-trivial changes in the estimation methodology, it is useful to set out this alternative estimation routine so that it can be used by firms/researchers that do have access to experimental data. Second, our analysis here gives a more formal sense of how demanding it would be for firms to run experiments that allow for inference of the joint distribution of size and value, which helps further motivate our original approach of leveraging data on intended sizes of failed deals.

Our analysis here has two components. First, based on a “ground truth” joint distribution $g(v,\bar{q},X)$ over customer sizes, values, and customer observable characteristics, we construct a simulated dataset of customers’ purchase decisions under experimental price variation. Next, we exposit a method that takes as inputs the simulated data and recovers an estimate of $g(\cdot,\cdot,\cdot)$ .

Simulated Data: We construct our “ground truth” distribution $g$ from the previously estimated distribution $f(\cdot,\cdot)$ over sizes and values (visualized in Figure 7) and its co-variation with observable characteristics $X$ . To this end, we start from our original dataset (by original, we mean the version before carrying out the augmentation described previously) and implement the following changes:

1. Concatenate multiple copies of the dataset to achieve a desired number of rows (e.g., 1 million).

2. For each row $i$ of the resulting dataset:

(a) Create a value column $v_{i}$ by taking a draw from the estimated distribution from equation 5.

(b) Create a price column $p_{i}$ by randomly selecting a price in the set $\{0,\$1K,\$2K,\$3K,\$4K,\$5K\}$ .

(c) Replace the deal success column $d_{i}$ by $\textbf{1}_{v_{i}\geq p_{i}}$ .

(d) Remove from the dataset the entire column $v_{i}$ . Also remove (i.e., replace by NA) the value of $\bar{q}_{i}$ wherever $d_{i}=0$ .

Once we have this dataset, the objective is to recover $g(\cdot,\cdot,\cdot)$ without using data on $\bar{q}_{i}$ for the cases of non-purchase $d_{i}=0$ , and without using any data on $v_{i}$ . Step (d) above in the construction of the dataset ensures we indeed do not have access to those data elements.
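The construction steps above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the code used in the paper: the function name `simulate_experiment`, the row layout (`q_bar`, `X`), and the toy value generator `toy_draw` are hypothetical, and `toy_draw` merely stands in for a draw from the estimated distribution in equation 5.

```python
import random

# Experimental price arms listed in step 2(b)
PRICES = [0, 1000, 2000, 3000, 4000, 5000]


def simulate_experiment(rows, n_target, draw_value, seed=0):
    """Build a simulated experimental dataset.

    rows: list of dicts with keys 'q_bar' and 'X' (the original data).
    draw_value: hypothetical stand-in for a draw of v_i from equation 5.
    """
    rng = random.Random(seed)
    out = []
    for j in range(n_target):
        base = rows[j % len(rows)]      # step 1: cycle through copies of the data
        v = draw_value(base, rng)       # step 2(a): latent per-unit value
        p = rng.choice(PRICES)          # step 2(b): randomly assigned price
        d = 1 if v >= p else 0          # step 2(c): deal success indicator
        # step 2(d): v is never stored; q_bar is hidden when the deal fails
        out.append({"X": base["X"], "p": p, "d": d,
                    "q_bar": base["q_bar"] if d == 1 else None})
    return out


def toy_draw(row, rng):
    # Made-up lognormal whose location shifts with X (illustration only)
    return rng.lognormvariate(7.5 + 0.3 * row["X"], 0.6)


toy_rows = [{"q_bar": 10, "X": 0}, {"q_bar": 50, "X": 1}, {"q_bar": 200, "X": 1}]
sim = simulate_experiment(toy_rows, 6000, toy_draw)
```

Because values are strictly positive in this toy generator, every customer in the zero-price arm purchases, which is the property the estimation method relies on.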

Estimation Method: We start by noting that under $p_{i}=0$ , almost every customer purchases. Hence, the marginal distribution $g_{\bar{q},X}(\cdot,\cdot)$ over $\bar{q}$ and $X$ can be approximated by the empirical distribution over $X$ and $\bar{q}$ when $p_{i}=0$ and $d_{i}=1$ . Denoted $\tilde{g}_{\bar{q},X}(\cdot,\cdot)$ , this empirical distribution is by construction observable. We then use the law of total probability to recover conditional distributions $g_{\bar{q},X}(\cdot,\cdot|v_{i}<\bar{p})$ for $\bar{p}\in\{\$1K,\$2K,\$3K,\$4K,\$5K\}$ . For each such $\bar{p}$ , we have:

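$$g_{\bar{q},X}(\cdot,\cdot)=\text{Prob}(v_{i}\geq\bar{p})\times g_{\bar{q},X}(\cdot,\cdot|v_{i}\geq\bar{p})+\text{Prob}(v_{i}<\bar{p})\times g_{\bar{q},X}(\cdot,\cdot|v_{i}<\bar{p})$$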

Observe that in the above equation, $g_{\bar{q},X}(\cdot,\cdot|v_{i}<\bar{p})$ is the only unknown: $g_{\bar{q},X}(\cdot,\cdot)$ we constructed an estimate for from the empirical distribution under zero price; $\text{Prob}(v_{i}\geq\bar{p})$ and $\text{Prob}(v_{i}<\bar{p})$ can be respectively estimated by directly measuring the fractions of $d_{i}=1$ and $d_{i}=0$ once the data is filtered for the $\bar{p}$ treatment group: $p_{i}=\bar{p}$ ; and, in a similar fashion, $g_{\bar{q},X}(\cdot,\cdot|v_{i}\geq\bar{p})$ can be estimated by taking the empirical distribution over $X,\bar{q}$ when $p_{i}=\bar{p}$ and $d_{i}=1$ . Plugging all of those into the equation, we can come up with estimate $\tilde{g}_{\bar{q},X}(\cdot,\cdot|v_{i}<\bar{p})$ for $g_{\bar{q},X}(\cdot,\cdot|v_{i}<\bar{p})$ .
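The inversion described in this paragraph can be sketched as follows for a discretized $(\bar{q},X)$ space. This is a hedged illustration, not the authors' implementation: the helper names, the dictionary-based row layout, and the final clip-and-renormalize step (which guards against small negative cell estimates caused by sampling noise) are assumptions introduced here.

```python
from collections import Counter


def empirical_dist(cells):
    """Empirical pmf over hashable cells, e.g. (q_bar, X) pairs."""
    n = len(cells)
    return {c: k / n for c, k in Counter(cells).items()}


def conditional_on_failure(data, p_bar):
    """Estimate g(q_bar, X | v < p_bar) from experimental data.

    data: list of dicts with keys 'p', 'd', 'q_bar', 'X';
    q_bar is None whenever d == 0.
    """
    # Unconditional marginal, taken from buyers in the zero-price arm
    g_all = empirical_dist([(r["q_bar"], r["X"]) for r in data
                            if r["p"] == 0 and r["d"] == 1])
    arm = [r for r in data if r["p"] == p_bar]
    prob_ge = sum(r["d"] for r in arm) / len(arm)   # estimate of Prob(v >= p_bar)
    prob_lt = 1.0 - prob_ge                         # estimate of Prob(v < p_bar)
    g_ge = empirical_dist([(r["q_bar"], r["X"]) for r in arm if r["d"] == 1])
    # Solve the law of total probability for its only unknown term
    raw = {c: (g_all[c] - prob_ge * g_ge.get(c, 0.0)) / prob_lt for c in g_all}
    # Assumption: clip negative cells and renormalize to obtain a valid pmf
    clipped = {c: max(x, 0.0) for c, x in raw.items()}
    total = sum(clipped.values())
    return {c: x / total for c, x in clipped.items()}


# Toy check: two cells; the single failed deal at $1K is a small customer
A, B = (10, 0), (100, 1)
toy = ([{"p": 0, "d": 1, "q_bar": q, "X": x} for q, x in (A, A, B, B)]
       + [{"p": 1000, "d": 0, "q_bar": None, "X": 0}]
       + [{"p": 1000, "d": 1, "q_bar": q, "X": x} for q, x in (A, B, B)])
g_fail = conditional_on_failure(toy, 1000)
```

In the toy data, all of the recovered mass falls on cell `A`, which matches the one failed deal even though its size was never observed.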

To summarize: even though in our analysis we do not have data on $\bar{q}_{i}$ for individual unsuccessful deals, we can still estimate the distribution of $\bar{q}_{i}$ (joint with observables) over all unsuccessful deals, conditional on each treatment price. This will be sufficient for the purpose of estimating the full joint distribution $g(\cdot,\cdot,\cdot)$ and, by extension, the optimal nonlinear pricing schedule, tasks which we turn to now.

Once equipped with estimates $\tilde{g}_{\bar{q},X}(\cdot,\cdot)$ and $\tilde{g}_{\bar{q},X}(\cdot,\cdot|v_{i}<\bar{p})$ for $g_{\bar{q},X}(\cdot,\cdot)$ and $g_{\bar{q},X}(\cdot,\cdot|v_{i}<\bar{p})$ , we can use them to “fill in” the missing $\bar{q}_{i}$ data for unsuccessful deals. To this end, for each row with $p_{i}=\bar{p}$ and $d_{i}=0$ and observable characteristics $X$ , one can construct the conditional probability distribution $\tilde{g}_{\bar{q}|X}(\cdot|v_{i}<\bar{p},X=X_{i})$ , take a draw of $\bar{q}$ , and use it to fill the missing value for $\bar{q}_{i}$ . (To construct the conditional probability $\tilde{g}_{\bar{q}|X}(\cdot|v_{i}<\bar{p},X)$ , one would first need to “smooth” $\tilde{g}_{\bar{q},X}(\cdot,\cdot|v_{i}<\bar{p})$ using functional form assumptions. To avoid this, another approach, which is the one we take, is to take a joint draw of $\bar{q},X$ from the original distribution $\tilde{g}_{\bar{q},X}(\cdot,\cdot|v_{i}<\bar{p})$ and use it to replace the missing $\bar{q}_{i}$ as well as the non-missing $X_{i}$ . Asymptotically, the two approaches should lead to the same outcome.)
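A minimal sketch of the joint-draw variant of this fill-in step is given below. The function name `fill_failed_deals` and the mapping `g_fail_by_price` (from each treatment price to an estimated pmf over $(\bar{q},X)$ cells among failed deals) are hypothetical names introduced for illustration.

```python
import random


def fill_failed_deals(data, g_fail_by_price, seed=0):
    """Replace (q_bar, X) of each failed deal by a joint draw from the
    estimated distribution conditional on failure at that row's price."""
    rng = random.Random(seed)
    filled = []
    for row in data:
        row = dict(row)                       # copy, so the input is not mutated
        if row["d"] == 0:
            dist = g_fail_by_price[row["p"]]
            cells, weights = zip(*dist.items())
            row["q_bar"], row["X"] = rng.choices(cells, weights=weights, k=1)[0]
        filled.append(row)
    return filled


# Toy usage: at the $1K arm, all failed deals are estimated to be small customers
toy_dist = {1000: {(10, 0): 1.0, (100, 1): 0.0}}
toy_data = [{"p": 1000, "d": 0, "q_bar": None, "X": 1},
            {"p": 1000, "d": 1, "q_bar": 100, "X": 1}]
completed = fill_failed_deals(toy_data, toy_dist)
```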

With the last step finished, we have a dataset akin to the one that was used to estimate a model of $v_{i}$ according to equation 5. As a result, we can repeat the same analysis here and estimate this equation, which would complete our estimation process.

Results: We carried out the procedure above, using a simulated dataset with one million rows. We estimated the joint distribution $g$ over $\bar{q},v,X$ . Distribution $g$ is difficult to visualize. But the most critical (and yet easy to visualize) object of estimation is the less-general joint distribution $f$ over $\bar{q},v$ . Thus, we use $f$ to present our results. See Figure 15 for a comparison between the assumed ground truth $f$ and our recovered $\hat{f}$ from the procedure outlined above. As the figure shows, the two distributions are similar. The implied optimal price schedules from them are also similar but there is some divergence for large deals.

We finish this section by reviewing some positives and negatives of the experimental approach. As for positives, there are at least two. First, it can help us estimate the model and come up with the optimal nonlinear pricing schedule even if the only variation we have is a set of linear prices. Second, if the range of prices experimented with is wide, we can estimate the $v_{i}$ equation without the functional form assumptions that we imposed in equation 5.

Turning to the drawbacks of the experimental approach, the first issue is that it is expensive. As our analysis shows, we need a wide range of experimental prices. In particular, we need to experiment with very low prices in order to get an estimate for the unconditional marginal distribution $\tilde{g}_{\bar{q},X}(\cdot,\cdot)$ on $\bar{q},X$ . In what we proposed above, we set one of the experimental prices to zero for this purpose, which is costly to firms. One can still estimate $\tilde{g}_{\bar{q},X}(\cdot,\cdot)$ through extrapolation if the lowest experimental price is larger than zero. But such extrapolation leads to some accuracy loss; and the higher the lowest experimental price, the larger the accuracy loss. Another reason why the experimental approach is expensive is the number of data points it needs: we need representative samples across all size groups and under each price experiment. As the illustrative analysis above suggests, this requisite sample size can be substantial. This is far from reality in most B2B markets, where the number of potential customers is small in the first place and firms are at best willing to run experiments on a small subset of their markets. Additionally, a drawback of our main proposed method (i.e., using data on unsuccessful deals) also applies to price experiments. More specifically, one worry about inference based on unsuccessful deals is that it ignores those potential customers who did not even enter a conversation with the firm because their $v_{i}$ was low but who would have purchased under lower-than-observed prices. The same issue exists with experimental price variation. This is because under an experiment, customers decide whether to approach the seller based on the publicly posted price schedules, and not based on the price-treatment group they would be assigned to.

8.2 General Applicability

Our empirical framework is applicable in any context in which either pricing experiments are feasible, or “intended size of use” for failed deals is directly recorded or can be estimated/proxied. Even though to the best of our knowledge the use of a dataset with this feature is novel in scholarly work on pricing, such data is increasingly collected and maintained by firms. Other examples of settings where would-be sizes of failed deals are recorded or can be proxied are cloud computing, B2B pay services such as Amazon Pay or PayPal, and consulting services. (For instance, many consulting companies use Salesforce software to maintain a dataset of their potential deals. The software helps record the intended size of a contract along different stages of the negotiation. The software also keeps a record of “pursuits” that eventually did not lead to a deal.)

In the specific case of this study, our results did lead to a change in LifeLabs’ pricing strategy that is qualitatively in line with our recommendation, though not fully coinciding with it. More specifically, they reduced their prices as our study recommends, but did so only for larger deals. See Figure 16. (Firms adjusting their pricing strategies in the direction of, but not exactly coinciding with, recommendations from academic studies is not unprecedented. An instance is Dubé and Misra (2017) and the impact of their study on ZipRecruiter’s pricing.)

8.3 Limitations and Future Research

We discuss three avenues for future research here. Of course this list is non-exhaustive.

8.3.1 Smoothly diminishing rates of return to customers and other dimensions of heterogeneity

Our formulation of gross value is $V_{i}(q)\equiv v_{i}\times\min(q,\bar{q}_{i})$ . Though this captures the two key dimensions of heterogeneity that we are focused on in this paper (i.e., size and value), it does abstract away from smoothly diminishing rates of return. That is, the marginal value of another unit of the product is constant at $v_{i}$ until it hits the stagnation point and drops to zero. A more general formulation would allow for smoothly diminishing returns to scale on the part of the buyer. One way to do so would be to assume $V_{i}(q)\equiv u(v_{i},\min(q,\bar{q}_{i}))$ where $u:\mathbb{R}^{2}\rightarrow\mathbb{R}$ is a function increasing in both of its arguments and concave in the second one. An illustration of this structure is: $V_{i}(q)\equiv v_{i}\times(\min(q,\bar{q}_{i}))^{\alpha}$ for some $\alpha\in(0,1)$ . We do examine a form of such an extension in appendix C, and show that our results do not change drastically as a result of this extension. That said, in our extension in that appendix, we take the estimated distribution $f(\cdot,\cdot)$ over $\bar{q}_{i}$ and $v_{i}$ from our main analysis, and then add another parameter to the value functions in order to capture the degree of smoothness.

A natural next step for future research would be to incorporate this smoothness parameter in the estimation stage rather than add it after the initial, non-smooth, model has been estimated. It can be expected that in the smooth model, the purchase size need not coincide with $\bar{q}_{i}$ even under a concave contract. Similarly, it is less reasonable to assume that the recorded would-be size for unsuccessful deals corresponds to $\bar{q}_{i}$ . (One possibility would be to assume this quantity represents the number of units that the buyer would expect to buy conditional on its uncertainty being resolved in a way that would make non-zero purchase optimal.) As a result, a new estimation process (likely with more parametric restrictions on the joint distribution $f(\cdot,\cdot)$ ) would be necessary in order to incorporate smoothness.

8.3.2 Endogenous set of potential customers

Second, we assume that the set of deal-talks is exogenous. Put in the terminology of our timeline in Figure 6, we assume that participating in a conversation with the seller, which leads to a record of $\bar{q}_{i}$ , is costless and is, hence, done by any firm that could benefit from the product (i.e., any firm with $\bar{q}_{i}>0$ ). Assumptions similar to this have previously been made in the discrete choice literature (see, for instance, Cohen et al. (2016) and Dubé and Misra (2017)). Nevertheless, this assumption is not ideal, either in the discrete-choice setting or in our setting. The reason is that the potential customers that reach out to the seller are selected based on their likelihood to purchase. This means a change in price can change not only which deals succeed, but also who joins the pool of potential customers in the first place. This can be modeled by assuming that communicating with the seller in stage 2 of the timeline in Figure 6 is costly to any potential customer $i$ , and that any customer $i$ who expects to have low $v_{i}$ may avoid approaching the seller in order to save on this cost. The estimation of such an extension to the model would require additional assumptions. We skip this extension in this paper.

In spite of this issue, we believe our results are reasonable. This is because potential customers who did not initiate a conversation are likely those with lower $v$ values. As a result, omitting them from the analysis would lead us to overestimate the overall optimal prices. Given that our current recommendation to the company is to lower their prices, our recommendation would only be strengthened if we were able to observe potential customers who did not start a conversation about a deal.

8.3.3 Competition

As long as competitors are not expected to respond to the changes in the firm’s pricing, our framework is fully compatible with the presence of competition in the market. Competition would only imply that $v_{i}$ should be interpreted as the value relative to the next best option which can in principle be a competitor’s product. However, if competitors are expected to respond to the firm’s pricing, our framework falls short. This shortcoming can only be resolved if the framework is amended with an empirical strategy that helps to quantify how the overall price levels and shapes of nonlinear contracts offered by competitors would respond to a given firm’s pricing. This is beyond the scope of our paper.

9 Conclusion

This paper empirically analyzed optimal nonlinear pricing. We proposed a model of demand in continuous choice settings which captures the notion of “customer size of use.” We estimated the joint distribution of customer size and customer per-unit willingness to pay by leveraging a novel dataset that records information (including prospective deal size) not only for successful deals but also for unsuccessful ones. We then used the estimated model to solve for the optimal nonlinear tariff.

We find that optimal nonlinear pricing improves upon the profitability of optimal linear pricing by at least 8.2%. Nevertheless, this second-degree price discrimination method recovers only about 7.1% of the profitability gap between linear pricing and first-degree price discrimination. We also find that second-degree price discrimination improves consumer welfare by about 6.2% and social welfare by about 5.8%.

We conducted further counterfactual analyses in order to generate general insights above and beyond our specific application. Among other analyses, we (i) quantified the magnitude of the profit impact of incentive compatibility constraints, (ii) examined the role of cost-side vs. demand-side factors in shaping the optimal contract, and (iii) studied the effects of using a fixed fee on the shape and profitability of the optimal contract.

We believe our analysis can be broadly applicable by firms that seek to optimize a nonlinear pricing tariff, especially in B2B contexts such as cloud computing, pay systems, and SaaS. A pre-requisite for the possibility to implement our method would be maintenance of sufficiently detailed data not only on successful deals but also on unsuccessful ones.

One major direction in which our analysis may be extended would be adding smoothness to customers’ value functions. Another potential future direction would be to extend the analysis from a monopolistic setting to an oligopolistic one.

Appendix

This appendix provides multiple complementary analyses to the paper. Section A provides the proof to lemma 1. Section B analyzes the robustness of the results to our approximation whereby we treated the observed price schedule by LifeLabs as concave. In another robustness check, section C studies the optimal price schedule under the assumption that the gross valuation functions $V_{i}(\cdot)$ are smooth rather than piecewise linear. Section D demonstrates the robustness of our results to model specification. Section E sets out our grid-bisection optimization method, compares it to other alternatives, and provides recommendations on algorithm choice for researchers studying similar problems. Section F describes the process by which our counterfactual data (visually depicted in figure 12 of section 7) are generated. Section G carries out a number of counterfactual analyses in which we study the impact of 3rd-degree price discrimination as well as a combination of 2nd and 3rd degree discrimination strategies.

Appendix A Proof of Lemma 1

As a reminder:

$$q^{*}(P|\bar{q},v):=\arg\max_{q\geq 0}\;V(q|\bar{q},v)-P(q)$$

where $V(q|\bar{q},v):=\min(q,\bar{q})\times v$ .

Note that $V(q|\bar{q},v)$ is constant in $q$ for $q\geq\bar{q}$ whereas $P(q)$ is strictly increasing. Thus, no quantity $q>\bar{q}$ can be in the arg max. As a result, we can rewrite:

$$q^{*}(P|\bar{q},v)=\arg\max_{q\in[0,\bar{q}]}\;V(q|\bar{q},v)-P(q)$$

But within the $[0,\bar{q}]$ interval, the value function can be written as: $V(q|\bar{q},v):=q\times v$ which yields:

$$q^{*}(P|\bar{q},v)=\arg\max_{q\in[0,\bar{q}]}\;q\times v-P(q)$$

Note that by strict concavity of $P(q)$ and linearity of $q\times v$ , the function $q\times v-P(q)$ is strictly convex. This implies that its global maximum on the interval has to be an extreme point of the interval. That is: $q^{*}(P|\bar{q},v)\subset\{0,\bar{q}\}$ . Q.E.D.
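The statement of the lemma can be checked numerically on a toy example. The sketch below is only an illustration: the price schedule `concave_price`, the integer quantity grid, and the parameter values are hypothetical choices, picked so that the schedule is strictly concave and increasing with $P(0)=0$.

```python
import math


def best_quantity(price, q_bar, v, grid_step=1):
    """Grid search for argmax over q of min(q, q_bar) * v - P(q).

    Quantities above q_bar are included to confirm they are never chosen.
    """
    candidates = range(0, 2 * q_bar + 1, grid_step)
    return max(candidates, key=lambda q: min(q, q_bar) * v - price(q))


def concave_price(q):
    # Hypothetical strictly concave, strictly increasing schedule with P(0) = 0
    return 300.0 * math.sqrt(q)


# For a customer of size 100, the purchase is 0 or 100 for every value v tried
results = {v: best_quantity(concave_price, q_bar=100, v=v) for v in (10, 25, 35, 60)}
```

With these numbers, $P(100)=3000$, so customers with $v\times 100<3000$ buy nothing and those with $v\times 100>3000$ buy exactly $\bar{q}=100$; no interior quantity is ever selected.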

Appendix B Robustness to the treatment of observed schedule as increasing and concave

In estimating the model, we leveraged Lemma 1, which states that the observed $q$ for each customer $i$ must equal $\bar{q}_{i}$ if the observed price schedule $P(\cdot)$ charged by the firm is weakly concave. As Figure 1 depicts, however, the observed price schedule is not exactly concave. It does have a decreasing slope but it also has some discontinuous downward jumps. In this appendix, we show that the schedule is nonetheless approximately concave. That is, these jumps are sufficiently small for the results to be robust to them.

More precisely, we re-estimate the model and redo the optimal pricing analysis without using the data points $i$ for whom the observed $q_{i}$ may be different from $\bar{q}_{i}$ due to non-concavities in the price schedule. Those are all data points that fall on the “dips” of the observed schedule, i.e., those observations $i$ with $q_{i}$ such that $\exists q<q_{i}$ with $P(q)>P(q_{i})$ , where $P(\cdot)$ is the observed price schedule.
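The exclusion rule can be illustrated with a short sketch. It assumes integer quantities and uses a made-up schedule `toy_price` whose per-unit price drops at a threshold, so that the total price jumps downward there; neither is taken from the observed LifeLabs schedule.

```python
def on_dip(price, q_i):
    """True if some smaller quantity q < q_i has a strictly higher total price."""
    return any(price(q) > price(q_i) for q in range(0, q_i))


def toy_price(q):
    # Hypothetical schedule: per-unit price falls from 100 to 80 at 10 units
    return q * (100 if q < 10 else 80)


observed_q = [3, 9, 10, 11, 12, 13, 20]
kept = [q for q in observed_q if not on_dip(toy_price, q)]
```

Here quantities 10 and 11 are dropped because 9 units cost more (900) than either of them (800 and 880), while 12 units (960) and above are kept.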

This new sample is slightly smaller than the original one (2,585 datapoints instead of 2,686). Based on this sample, the optimal price schedule and the profitability measures are re-computed and depicted in figure 17 and table 10.

As can be seen from comparing these results to the corresponding ones from the main text, there is little difference caused by the few points for which the observed purchase size and customer size differ. (Perhaps the most notable difference is that profit and revenue levels are generally slightly lower here compared to the results from the main text. This should not be surprising given that by removing some buyers from the sample, we are also removing the profit and revenue they generate for the seller.) This is especially the case for the optimal price schedule, where the differences are not even large enough to be discernible when visually plotted.

Appendix C Robustness to value-function specifications with smoothly diminishing return

As mentioned in the main text of the paper, we make the simplifying assumption that value functions take the form $V_{i}(q)\equiv v_{i}\times\min(q,\bar{q}_{i})$ . In this section, we explore smoothing these value functions to examine the effects of such smoothness on the shape of the optimal schedule as well as the welfare effects.

Formally, we modify the value function from $V_{i}(q)\equiv v_{i}\times\min(q,\bar{q}_{i})$ to:

[TABLE]

In this formulation, $\alpha\in(0,1]$ allows the value to exhibit diminishing returns as $q$ increases. The multiplier $\zeta$ does not carry any economic meaning, as it could be absorbed into $v_{i}$ and we would still have the same general formulation. We incorporate it, however, to make the above value function and the original piece-wise-linear one “comparable”. More specifically, $\zeta$ is chosen so that the smoothed and original value functions yield the same value when evaluated at $q=\bar{q}_{i}$ :

[TABLE]

Figure 18 illustrates one such smoothed value function when $\bar{q}=200,v=1,$ and $\alpha=0.75$ .
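To make the normalization concrete, the sketch below evaluates one possible smoothed value function at the parameter values used for Figure 18. The functional form is an assumption made here for illustration: it takes the power form $v_{i}\times(\min(q,\bar{q}_{i}))^{\alpha}$ given as an example in Section 8.3.1 and scales it by a $\zeta$ chosen to match the piece-wise linear value at $q=\bar{q}_{i}$; the exact specification used in the paper may differ.

```python
def linear_value(q, q_bar, v):
    """Original piece-wise linear gross value v * min(q, q_bar)."""
    return v * min(q, q_bar)


def smoothed_value(q, q_bar, v, alpha):
    """Assumed form: zeta * v * min(q, q_bar) ** alpha.

    zeta = q_bar ** (1 - alpha) makes the smoothed and linear values
    coincide at q = q_bar.
    """
    zeta = q_bar ** (1.0 - alpha)
    return zeta * v * min(q, q_bar) ** alpha


q_bar, v, alpha = 200, 1.0, 0.75          # parameter values of Figure 18
at_cap = smoothed_value(q_bar, q_bar, v, alpha)
halfway = smoothed_value(100, q_bar, v, alpha)
```

Under this assumed form, the smoothed function lies above the linear one for $0<q<\bar{q}_{i}$, meets it at $q=\bar{q}_{i}$, and reduces to the original function when $\alpha=1$.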

We next turn to an analysis of how sensitive our results will be to $\alpha$ . Figure 19 shows the optimal linear and nonlinear price schedules as a function of the smoothness factor $\alpha$ . As can be seen from this figure, the results are robust to the smoothness factor in the range $[0.75,1]$ . Note that even though $\alpha=1$ has not been analyzed in a separate subfigure here, the results from this case have already been examined in the paper. This is because the $\alpha=1$ special case coincides with our original, piece-wise linear, utility function.

Appendix D Robustness to other model specifications

In this section, we examine the sensitivity of the results to some model specifications. The number of alternative specifications one can potentially estimate is prohibitively large. As a result, we focus on a small set of alternatives. In particular, we examine robustness along two dimensions. First, we study how the model estimates and simulated optimal pricing policy would change if we changed the set of behavioral features or incorporated number of employees as another independent variable. Second, we check robustness to a specification in which instead of three distinct size groups, we have five size groups. This latter specification is formally implemented by changing the fixed effects model for sizes. Table 11 summarizes the estimation results across these alternative specifications. Note that the first column is the same as the default specification in the model, and has been presented for ease of comparison.

Figure 20 is complementary to Table 11 in that instead of regression results, it visually compares the estimated joint distribution $f(\cdot,\cdot)$ across model specifications.

Table 11 and Figure 20 suggest that our estimation results are robust across specifications. In order to complete the robustness analysis, however, it would add value to examine the sensitivity of our key counterfactual analysis result (i.e., the shape of the optimal nonlinear price schedule) to the specification. Table 12 accomplishes this.

As this table shows, the shape of the optimal contract remains robust to the model specification.

Appendix E Optimization Method

Though mechanism design theory does provide tractable methods to compute optimal nonlinear tariffs, all of those methods are devised under assumptions that seldom hold empirically. (For instance, the methods provided in Mussa and Rosen (1978), Maskin and Riley (1984), and similar papers all rely on a “single-crossing” condition which makes customer heterogeneity one-dimensional, an assumption that is clearly violated in our context; see, for instance, the left panel of Figure 5.) As a result, we turn to numerical approaches to find the optimal schedule.

Our method: “Grid-Bisection”. Our problem has some features that are crucial in determining what method is used for optimization. First, the number of dimensions (i.e., five) is non-trivial but not too large. Second, we seek to find the global maximum as opposed to a local one. Third, each instance of computing the objective function (i.e., the profit) is costly. Fourth, the objective function is expected to behave non-smoothly, and there are no guarantees on concavity/linearity features that one could leverage.

Under the above circumstances, the optimization literature recommends the use of Bayesian methods (see Frazier (2018a, b) for an overview). We found, however, that a multi-dimensional variant of bisection search outperforms a Bayesian approach in the sense of delivering a higher objective-function value in a shorter amount of time. In this appendix, we briefly describe our method. We also comparatively analyze multiple alternatives (a variant of Bayesian Optimization and multiple versions of gradient descent) and provide a summary on which approach we think is the most appropriate one under different problem settings.

In a nutshell, our method starts with a 5-dimensional grid over the five prices $(p_{1},...,p_{5})$ described in the previous subsection. The grid has $d$ points along each dimension, which implies that we initially evaluate the profit function $\pi$ under $d^{5}$ possible values for vector $P$ . The values for each $p_{k}$ ( $k\in\{1,...,5\}$ ) in this initial grid are chosen to be equi-distant points on the interval $[0,5000]$ . In other words, the lower and upper bounds in iteration 1 of the algorithm for each $p_{k}$ are given by $\underline{p}^{1}_{k}=0$ and $\bar{p}^{1}_{k}=5000$ . Denote the length of the interval on the $k$ -th dimension by $l_{k}^{1}=\bar{p}^{1}_{k}-\underline{p}^{1}_{k}$ . (Note that by our choices of the $\underline{p}^{1}_{k}$ and $\bar{p}^{1}_{k}$ values, all $l^{1}_{k}$ are equal to each other; but this need not be the case generally.)

We then proceed with an iterative process. In each iteration $t$ , we first find the optimal price vector $P^{t*}=(p^{t*}_{1},...,p^{t*}_{5})$ among the $d^{5}$ candidates in our price grid. Next, we form a new grid (for the next iteration) by “zooming in” on $P^{t*}$ . More precisely, the new grid is constructed based on $P^{t*}$ and a zoom factor $z\in(0,1)$ in the following manner:

[TABLE]

In words, the new bounds $\underline{p}^{t+1}_{k}$ and $\bar{p}^{t+1}_{k}$ form an interval of length $z\times l^{t}_{k}$ centered around $p^{t*}_{k}$ , unless this interval itself falls partially outside of the old bounds $\underline{p}^{t}_{k}$ and $\bar{p}^{t}_{k}$ (in which case the new interval is moved until it falls just within the old one). We then iterate and construct smaller and smaller intervals until we reach $t$ such that all $l^{t}_{k}$ are smaller than a pre-determined threshold $\varepsilon$ , at which point the algorithm stops.

Though we are not aware of a systematic analysis of the properties and performance of this grid-bisection approach, we are aware that variants of it have been previously used in different fields. One example is Yin et al. (2020), who also provide a visual illustration of their variation. Following their illustration, Figure 21 schematically describes our grid-bisection optimization procedure.

The algorithm involves $d^{5}$ instances of computing the profit function $\pi$ in each iteration. It approximately takes $\frac{\log(l^{1}_{k})-\log(\varepsilon)}{-\log(z)}$ iterations for the algorithm to stop. We choose $d=5$ and $z=\frac{1}{2}$ for our application.
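The procedure described in this appendix can be sketched as follows. This is a minimal illustration rather than the authors' implementation: the function name `grid_bisection`, the stopping threshold `eps=1.0`, and the two-dimensional quadratic `toy_profit` are assumptions made here, while $d=5$ , $z=\frac{1}{2}$ , and the initial interval $[0,5000]$ follow the text.

```python
import itertools
import math


def grid_bisection(profit, n_dims, lo=0.0, hi=5000.0, d=5, z=0.5, eps=1.0):
    """Repeatedly evaluate a d-point-per-dimension grid and zoom in on the best point."""
    lower, upper = [lo] * n_dims, [hi] * n_dims
    best = None
    # Stop once every interval length l_k is smaller than eps
    while max(u - l for l, u in zip(lower, upper)) >= eps:
        # d equi-distant points per dimension give d ** n_dims candidates
        axes = [[l + j * (u - l) / (d - 1) for j in range(d)]
                for l, u in zip(lower, upper)]
        best = max(itertools.product(*axes), key=profit)
        new_lower, new_upper = [], []
        for k in range(n_dims):
            length = z * (upper[k] - lower[k])
            # Centre the new interval on the best point, then shift it
            # back inside the old bounds if it sticks out
            start = min(max(best[k] - length / 2, lower[k]), upper[k] - length)
            new_lower.append(start)
            new_upper.append(start + length)
        lower, upper = new_lower, new_upper
    return best


def toy_profit(p):
    # Hypothetical smooth objective with its maximum at (1200, 800)
    return -((p[0] - 1200.0) ** 2) - (p[1] - 800.0) ** 2


best_p = grid_bisection(toy_profit, n_dims=2)
# Iteration count implied by the formula in the text for l = 5000, eps = 1, z = 1/2
n_iter = math.ceil((math.log(5000.0) - math.log(1.0)) / -math.log(0.5))
```

On this toy objective the search ends within one price unit of the true maximizer after 13 grid evaluations. In the five-dimensional application, each iteration would require $5^{5}=3125$ profit evaluations, which is why the cost of a single evaluation matters for the choice of $d$ .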

Comparison Against Alternative Methods. With this grid-bisection optimization method in hand, we next turn to a comparison between it and other alternatives we considered. In particular, we examine the following three approaches: 1) Bayesian Optimization a la Frazier (2018a, b), 2) a gradient method with the initial candidate for $P^{*}$ chosen from an initial grid of $5^{d}$ possible $P\in[0,3000]^{d}$ vectors, and 3) a simplex-based direct search method a la Nelder and Mead (1965) with the initial candidate for $P^{*}$ chosen from an initial grid of $10^{d}$ possible $P\in[0,3000]^{d}$ vectors. We do this comparison in three scenarios: (i) solving for the optimal linear price ( $d=1$ ), (ii) solving for the optimal price when we have three marginal costs for small (i.e., $\bar{q}_{i}\in I_{1}\cup I_{2}$ ), medium-size (i.e., $\bar{q}_{i}\in I_{3}$ ), and large (i.e., $\bar{q}_{i}\in I_{4}\cup I_{5}$ ) customers, i.e., $d=3$ , and finally (iii) solving for the optimal price $P^{*}$ in our main 2nd-degree price discrimination setting, which is five-dimensional ( $d=5$ ). Table 13 presents the results, reporting both the objective value (i.e., profit) and the runtime of each algorithm. Note that in addition to the methods whose performances are reported in this table, we examined a number of gradient-method alternatives, such as “conjugate gradient”, “BFGS”, “truncated Newton algorithm”, and “Powell algorithm”. However, due to their weak performance, we do not report them here.

Note that this analysis has been performed on an earlier specification of the model. This is why the optimal schedule is slightly different from what can be found in the main specification presented in the paper.

Discussion and Recommendations. Based on the results presented in table 13, the grid-bisection approach consistently outperforms other candidates with respect to the objective function. It also outperforms the Bayesian method in terms of algorithm run-time. The comparison between Grid-Bisection and the Nelder-Mead simplex approach (Nelder and Mead, 1965) is more nuanced. In the case of $d=5$ , which is the main case we analyze, Grid-Bisection takes less time to run than Nelder-Mead with an initial grid of $10^{5}$ but takes much longer than Nelder-Mead with an initial grid of size $5^{5}$ .

Given the above summary, we recommend that Grid-Bisection be used for similar empirical multi-dimensional-screening problems if (i) precision is of utmost importance, and (ii) the number of dimensions is not too high (e.g., $d=5$ ). For higher numbers of dimensions (e.g., $d=10$ or larger), we conjecture that the Nelder-Mead approach with an initial grid that is not too large will make the best trade-off between accuracy and runtime.

Appendix F Details on data simulation

This section provides details on how the counterfactual datasets for the middle and right-most columns of Figure 12 were produced.

To produce the counterfactual data in the middle column, we alter the original data in a way that increases the deal acceptance rate for mid-size customers (i.e., $\bar{q}_{i}\in I_{3}$ ) while decreasing the acceptance rates for smaller ( $\bar{q}_{i}\in I_{1}\cup I_{2}$ ) and larger ( $\bar{q}_{i}\in I_{4}\cup I_{5}$ ) ones. More specifically, for each row of the data (i.e., each $it$ combination), we alter the observed value $s_{it}$ for deal success status with probability $p$ (i.e., according to a random draw from a Bernoulli distribution with parameter $p$ ). If the Bernoulli draw is 1 (that is, if we are set to alter $s_{it}$ ), then we alter it to $1$ if customer $i$ is mid-size and to 0 otherwise. If the Bernoulli draw is 0, however, we do not alter $s_{it}$ .

The procedure for generating the counterfactual data for the right-most column of the figure is similar, except that the $s_{it}$ values for mid-size deals are altered to 0 and those for small and large deals are altered to 1. Another difference between the generating processes for the middle and right-most columns of Figure 12 is that we set $p=0.7$ for the former and $p=0.6$ for the latter.

The algorithm below formally describes the process.
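A minimal Python sketch of the alteration procedure described in this appendix follows. The function name `alter_success`, the row layout (`size_group` taking values 1 to 5 for $I_{1},...,I_{5}$ , and `s` for $s_{it}$ ), and the toy rows are hypothetical; the probabilities $p=0.7$ and $p=0.6$ are the ones stated in the text.

```python
import random


def alter_success(rows, p, favour_mid, seed=0):
    """Randomly overwrite deal-success flags.

    favour_mid=True  : middle-column scenario (mid-size -> 1, others -> 0).
    favour_mid=False : right-most scenario (mid-size -> 0, others -> 1).
    """
    rng = random.Random(seed)
    out = []
    for row in rows:
        new_row = dict(row)
        if rng.random() < p:                      # Bernoulli(p) draw equals 1
            is_mid = row["size_group"] == 3       # I_3 is the mid-size group
            new_row["s"] = int(is_mid == favour_mid)
        out.append(new_row)                       # otherwise s is left unchanged
    return out


toy_rows = [{"size_group": g, "s": s} for g in range(1, 6) for s in (0, 1)]
middle_column = alter_success(toy_rows, p=0.7, favour_mid=True)
right_column = alter_success(toy_rows, p=0.6, favour_mid=False)
```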

Appendix G Third Degree Price Discrimination

In this appendix, we study third-degree price discrimination. In particular, we consider two kinds: (i) third-degree price discrimination based on customer sizes, and (ii) combining second-degree price discrimination (based on size) with third-degree discrimination (based on other observables).

G.1 Third-degree discrimination based on customer size

In the discussion of incentive compatibility constraints, we introduced the “individually optimized” contract $\tilde{P}$ . We argued this price schedule is “naive” in that it fails to anticipate the ability of each customer of size $\bar{q}_{i}\in I_{k}$ to tune its purchase size in order to take advantage of other, lower, prices $p_{k^{\prime}}$ . In this section, we ask “what if ignoring this ability is indeed correct?” In other words, what if the seller is able to force customers of each size group $k$ to only pay $p_{k}$ per unit no matter how many units $q$ they purchase? This means the firm can third-degree price discriminate based on size $\bar{q}$ and need not worry about incentive compatibility. Under these conditions, $\tilde{P}$ would indeed become the true optimal price schedule. The profit here will be equal to $\Sigma_{k}\pi_{k}(\tilde{p}_{k})$ where “local” profit functions $\pi_{k}(\cdot)$ are defined as in equation 7.

Note that due to the relaxation of the incentive constraints, the third-degree-discrimination profit $\Sigma_{k}\pi_{k}(\tilde{p}_{k})$ is larger than $\pi(\tilde{P})$ and, likely, also than the second-degree-discrimination profit $\pi(P^{*})$ . Table 14 empirically analyzes these profits under the three data scenarios described in Figure 12 (recall that one scenario is the original data and the other two are counterfactual datasets, modifying the demand system).

As the left column of this table shows, size-based third-degree price discrimination delivers little to no extra profitability above second-degree discrimination (only 0.16%). This should not be too surprising given that the main advantage of size-based third-degree discrimination is the relaxation of incentive-compatibility constraints. But as first reported in Figure 12 and repeated in the middle row of Table 14, in the original dataset the profit impact of incentive compatibility constraints was very slim (less than 0.1%). Looking at the alternative data scenarios (middle and right columns in Table 14) confirms our intuition: the benefit of third-degree price discrimination based on size is large (small) if the loss from charging the individually optimized contract instead of the jointly optimized one is large (small). Formally, there seems to be a strong and positive association between $\pi(P^{*})-\pi(\tilde{P})$ on the one hand and $\big(\Sigma_{k}\pi_{k}(\tilde{p}_{k})\big)-\pi(P^{*})$ on the other.
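To make the two quantities being compared explicit, they can be written as follows. The labels $\Delta_{IC}$ and $\Delta_{3}$ are shorthand introduced here for illustration and do not appear in the paper.

```latex
\begin{align*}
\Delta_{IC} &= \pi(P^{*}) - \pi(\tilde{P})
  && \text{(loss from the individually optimized contract)} \\
\Delta_{3}  &= \Big(\Sigma_{k}\pi_{k}(\tilde{p}_{k})\Big) - \pi(P^{*})
  && \text{(gain from size-based third-degree discrimination)} \\
\Delta_{IC} + \Delta_{3} &= \Big(\Sigma_{k}\pi_{k}(\tilde{p}_{k})\Big) - \pi(\tilde{P})
\end{align*}
```

The last line is an accounting identity: the total profit effect of relaxing the incentive constraints on $\tilde{P}$ splits into the two gaps above. The positive association reported across the three data scenarios concerns how these two components move together.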

G.2 Combining second and third degree discrimination

Another approach to third-degree price discrimination would be to combine it with second-degree discrimination. More precisely, the firm can rank customers by their $\beta X_{i}$ from equation 5 in descending order and create equally-sized groups $j\in\{1,...,J\}$. The firm can then offer each group $j$ a separate optimal price schedule $P^{*j}(\cdot)$. Given the specification of $\beta$, this means customers who were established earlier, or those with higher amounts of behavioral feature 1, will be put in higher-ranked bins and face higher prices. (Footnote 30: Of course, many other customer characteristics could be incorporated into this. But as mentioned before, in our specific context, many potentially relevant features turned out to have little empirical impact.) Figure 22 shows the optimal strategy if the firm divides all the customers into $J=2$ groups based on ranked $\beta X_{i}$ values. The original (i.e., one-group) optimal schedule $P^{*}$ has also been plotted for comparison. As can be seen from the figure, the higher-willingness-to-pay group $j=1$ gets charged an additional \$200 or more per workshop (across different sizes) relative to the lower-willingness-to-pay group $j=2$.
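The grouping step can be sketched in Python as follows. This is a minimal illustration that assumes the index $\beta X_{i}$ has already been computed for each customer. The function name `assign_groups` and the rule for handling group sizes when the number of customers is not divisible by $J$ are assumptions, not taken from the paper.

```python
def assign_groups(scores, J):
    """Rank customers by score (descending) and split them into J near-equal groups.

    scores : list of beta * X_i values, one per customer
    J      : number of groups; group 1 holds the highest-scoring customers
    Returns a list of group labels in 1..J, aligned with `scores`.
    """
    n = len(scores)
    # Customer indices ordered from highest to lowest score
    order = sorted(range(n), key=lambda i: scores[i], reverse=True)
    groups = [0] * n
    for rank, i in enumerate(order):
        # Integer arithmetic maps ranks 0..n-1 onto groups 1..J
        groups[i] = rank * J // n + 1
    return groups

# Example: four customers split into J = 2 groups
labels = assign_groups([0.1, 0.9, 0.5, 0.3], J=2)  # [2, 1, 1, 2]
```

Each group's customers would then be passed separately to whatever routine computes the optimal schedule, yielding one $P^{*j}(\cdot)$ per group. With $J=1$ every customer receives label 1, which corresponds to the single-schedule case $P^{*}$.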

We close this section with a discussion of the profitability of third-degree price discrimination. Figure 23 plots the profitability of the optimal pricing as a function of the “extent of third-degree discrimination $J$”. Under purely second-degree discrimination ($J=1$), total profit is \$19.04M/y, whereas under substantial third-degree discrimination ($J=7$), the profitability is around \$20.49M/y. This profit surpasses the second-degree-discrimination profit by only 7.56%. As a result, third-degree discrimination, though non-trivially useful when deployed alone or in conjunction with second-degree discrimination, does not seem to generate substantial extra profitability above second-degree discrimination alone.
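As a quick check of the magnitude, the relative gain can be recomputed from the two rounded profit levels quoted above, on the assumption that both are expressed in millions of dollars per year:

```latex
\[
\frac{20.49 - 19.04}{19.04} \;=\; \frac{1.45}{19.04} \;\approx\; 0.076
\]
```

This is roughly 7.6%, in line with the reported 7.56%. The small difference presumably reflects rounding in the two profit levels.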

We do not find the less-than-stellar performance of third-degree price discrimination in our context surprising. This is because, as mentioned before, in the process of variable selection for the modeling of $v$ (see equation 5) we found that many potentially relevant variables (especially industry fixed effects) have little explanatory power. We do not expect this empirical observation to generalize. In other contexts where observable customer characteristics are highly predictive of purchasing power (see Dubé and Misra (2017) for instance), third-degree price discrimination may even closely approximate the profitability of first-degree discrimination.

Bibliography


Anderson, E. T. and J. D. Dana Jr (2009). When is price discrimination profitable? Management Science 55(6), 980–989.
Armstrong, M. (1996). Multiproduct nonlinear pricing. Econometrica: Journal of the Econometric Society, 51–75.
Aryal, G. and M. F. Gabrielli (2020). An empirical analysis of competitive nonlinear pricing. International Journal of Industrial Organization 68, 102538.
Bodoh-Creed, A. L., B. R. Hickman, J. A. List, I. Muir, and G. K. Sun (2023). Stress testing structural models of unobserved heterogeneity: Robust inference on optimal nonlinear pricing. Technical report, National Bureau of Economic Research.
Carroll, G. (2017). Robustness and separation in multidimensional screening. Econometrica 85(2), 453–488.
Chan, T., V. Kadiyali, and P. Xiao (2009). Structural models of pricing. In Handbook of Pricing Research in Marketing, pp. 108–131. Edward Elgar Publishing.
Chan, T. Y. (2006). Estimating a continuous hedonic-choice model with an application to demand for soft drinks. The RAND Journal of Economics 37(2), 466–482.
Cohen, P., R. Hahn, J. Hall, S. Levitt, and R. Metcalfe (2016). Using big data to estimate consumer surplus: The case of Uber. Technical report, National Bureau of Economic Research.
