Direction Selection in Stochastic Directional Distance Functions
Kevin Layer, Andrew L. Johnson, Robin C. Sickles, Gary D. Ferrier

TL;DR
This paper investigates how the choice of direction in stochastic directional distance functions affects estimation accuracy, demonstrating that data-driven, orthogonal directions improve functional estimates in production and cost modeling.
Contribution
It introduces a data-driven approach for selecting directions when estimating an SDDF and extends it to a shape-constrained nonparametric estimator, improving estimation accuracy over conventional direction choices.
Findings
Orthogonal directions yield better estimates in simulations.
Shape-constrained nonparametric methods outperform parametric ones.
Practitioners should choose directions with non-zero components for all variables.
Abstract
Researchers rely on the distance function to model multi-product production using multiple inputs. A stochastic directional distance function (SDDF) allows for noise in potentially all input and output variables. Yet the direction selected for estimation affects the functional estimates, because deviations from the estimated function are minimized in the specified direction. The set of identified parameters of a parametric SDDF can be narrowed via data-driven approaches to restrict the directions considered. We demonstrate a similar narrowing of the identified parameter set for a shape-constrained nonparametric method, where the shape constraints impose standard features of a cost function such as monotonicity and convexity. Our Monte Carlo simulation studies reveal significant improvements, as measured by out-of-sample radial mean squared error, in functional estimates when…
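The statement that deviations are measured in a specified direction can be made concrete with a minimal sketch. This is an illustration under stated assumptions, not the authors' estimator: the linear function y = a + b·x, the helper names `direction_from_angle` and `directional_distance`, the numeric values, and the convention that the angle is measured from the x-axis are all hypothetical choices made here.

```python
import math

def direction_from_angle(angle_deg):
    """Unit direction (g_x, g_y); the angle is measured from the x-axis (assumed convention)."""
    theta = math.radians(angle_deg)
    return math.cos(theta), math.sin(theta)

def directional_distance(x0, y0, a, b, g):
    """Signed step delta such that (x0 + delta*g_x, y0 + delta*g_y) lies on y = a + b*x."""
    g_x, g_y = g
    denom = g_y - b * g_x
    if abs(denom) < 1e-12:
        raise ValueError("direction is parallel to the line; the distance is undefined")
    return (a + b * x0 - y0) / denom

# Hypothetical line and observation, for illustration only.
a, b = 1.0, 0.5
x0, y0 = 2.0, 3.0

for angle in (0.0, 45.0, 90.0):
    g = direction_from_angle(angle)
    delta = directional_distance(x0, y0, a, b, g)
    print(f"angle={angle:5.1f}  delta={delta:+.4f}")
```

The same observation sits at a different distance from the same function depending on the direction, which is why an estimator that minimizes these deviations returns direction-dependent estimates.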
| Avg MSE: Comparison to the True Function | |||||
| DDF Angle | |||||
| MSE Dir Angle | |||||
| 2.09 | 0.75 | 0.56 | 1.16 | 3.68 | |
| 1.36 | 0.46 | 0.32 | 0.63 | 1.89 | |
| 1.25 | 0.41 | 0.28 | 0.51 | 1.48 | |
| 1.59 | 0.50 | 0.32 | 0.57 | 1.60 | |
| 3.06 | 0.91 | 0.55 | 0.92 | 2.44 | |
| Note: Displayed are measured values multiplied by . | |||||
| Avg MSE: Comparison to Out-of-Sample | |||||
| DDF Angle | |||||
| MSE Dir Angle | |||||
| 28.28 | 29.43 | 31.29 | 34.23 | 40.67 | |
| 18.03 | 17.79 | 18.19 | 19.09 | 21.32 | |
| 16.38 | 15.55 | 15.45 | 15.77 | 16.90 | |
| 20.50 | 18.67 | 18.04 | 17.90 | 18.46 | |
| 38.63 | 33.07 | 30.68 | 29.29 | 28.70 | |
| Note: Displayed are measured values multiplied by . | |||||
| CNLS-d Direction Angle | |||||
| Average MSE across simulations | 13.90 | 4.65 | 3.32 | 4.49 | 13.93 |
| Note: Displayed are measured values multiplied by . | |||||
| CNLS-d Direction Angle | |||||
| Noise Direction Angle | |||||
| 2.69 | 3.03 | 4.49 | 8.86 | 25.47 | |
| 7.49 | 3.44 | 4.00 | 8.07 | 28.83 | |
| 20.28 | 5.79 | 4.30 | 5.80 | 19.06 | |
| 25.58 | 7.80 | 4.18 | 3.51 | 6.84 | |
| 25.90 | 9.09 | 4.73 | 3.10 | 2.57 | |
| Note: Displayed are measured values multiplied by . | |||||
| CNLS-d Direction Angle | |||||
| Noise Direction Angle | |||||
| 0.92 | 0.82 | 0.96 | 1.53 | 5.12 | |
| 1.83 | 1.09 | 1.09 | 1.47 | 5.45 | |
| 3.70 | 1.41 | 1.29 | 1.43 | 3.93 | |
| 5.75 | 1.68 | 1.27 | 1.18 | 1.86 | |
| 4.61 | 1.40 | 0.95 | 0.79 | 0.90 | |
| Note: Displayed are measured values multiplied by . | |||||
| Mean of the | CNLS-d Direction angle | ||||
|---|---|---|---|---|---|
| Normal Distribution () | |||||
| 3.19 | 2.21 | 3.89 | 10.28 | 46.47 | |
| 8.44 | 2.92 | 1.98 | 3.17 | 9.00 | |
| 45.64 | 10.25 | 4.02 | 2.43 | 3.07 | |
| Note: Displayed are measured values multiplied by . | |||||
| 2007 | |||||
|---|---|---|---|---|---|
| (523 observations) | |||||
| Cost ($) | MajDiag | MajTher | MinDiag | MinTher | |
| Mean | 146M | 162 | 4083 | 3499 | 7299 |
| Skewness | 3.51 | 2.89 | 2.63 | 5.19 | 3.28 |
| 25-percentile | 24M | 9 | 277 | 108 | 512 |
| 50-percentile | 72M | 73 | 1688 | 938 | 3108 |
| 75-percentile | 182M | 207 | 5443 | 4082 | 9628 |
| 2008 | |||||
| (511 observations) | |||||
| Cost ($) | MajDiag | MajTher | MinDiag | MinTher | |
| Mean | 163M | 175 | 4433 | 3688 | 7657 |
| Skewness | 4.19 | 3.80 | 2.97 | 4.87 | 2.82 |
| 25-percentile | 28M | 10 | 325 | 120 | 545 |
| 50-percentile | 83M | 76 | 1809 | 1013 | 3350 |
| 75-percentile | 189M | 246 | 5984 | 4569 | 10781 |
| 2009 | |||||
| (458 observations) | |||||
| Cost ($) | MajDiag | MajTher | MinDiag | MinTher | |
| Mean | 175M | 161 | 4471 | 3615 | 7905 |
| Skewness | 3.39 | 3.78 | 2.43 | 4.68 | 2.41 |
| 25-percentile | 31M | 12 | 420 | 148 | 713 |
| 50-percentile | 91M | 69 | 1737 | 1136 | 3458 |
| 75-percentile | 220M | 230 | 6402 | 4694 | 10989 |
| Direction | Year | ||
|---|---|---|---|
| ( | 2007 | 2008 | 2009 |
| (0.45, 0.45, 0.45, 0.45, 0.45) | 2.10 | 1.30 | 1.50 |
| (0.35, 0.35, 0.35, 0.35, 0.71) | 2.15 | 1.65 | 1.29 |
| Median Direction | 1.79 | 1.55 | 1.34 |
| Note: Displayed are the measured values multiplied by | |||
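The two explicit directions in the table above look like unit-length vectors reported to two decimals. The sketch below checks that reading. The raw weights (1, 1, 1, 1, 1) and (1, 1, 1, 1, 2) are an inference from the rounded entries rather than something stated in the source, and `normalize` is a hypothetical helper name.

```python
import math

def normalize(weights):
    """Scale a weight vector to unit Euclidean length."""
    norm = math.sqrt(sum(w * w for w in weights))
    if norm == 0:
        raise ValueError("cannot normalize the zero vector")
    return [w / norm for w in weights]

# Assumed raw weights; only the rounded unit vectors appear in the table.
equal = normalize([1, 1, 1, 1, 1])
tilted = normalize([1, 1, 1, 1, 2])

print([round(v, 2) for v in equal])   # [0.45, 0.45, 0.45, 0.45, 0.45]
print([round(v, 2) for v in tilted])  # [0.35, 0.35, 0.35, 0.35, 0.71]
```

Both directions have non-zero components for every variable, consistent with the recommendation in the Findings.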
| Quadratic | CNLS-d | Lower Bound | |
| Year | Regression | (Median Direction) | Estimator |
| 2007 | 3.43 | 2.44 | 2.35 |
| 2008 | 2.76 | 1.93 | 1.48 |
| 2009 | 2.43 | 1.80 | 1.53 |
| Note: The MSE values displayed are the measured values multiplied by | |||
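Because the scaling factor in the note is missing, relative comparisons are easier to read than the raw entries. The sketch below computes the percentage reduction in MSE of CNLS-d (median direction) relative to quadratic regression, using only the values in the table above. The helper name `pct_reduction` is hypothetical.

```python
# MSE values copied from the table: (quadratic regression, CNLS-d median direction).
mse = {
    2007: (3.43, 2.44),
    2008: (2.76, 1.93),
    2009: (2.43, 1.80),
}

def pct_reduction(baseline, value):
    """Percentage by which `value` falls below `baseline`."""
    return 100.0 * (baseline - value) / baseline

reductions = {year: pct_reduction(q, c) for year, (q, c) in mse.items()}

for year, r in reductions.items():
    print(f"{year}: CNLS-d MSE is {r:.1f}% below quadratic regression")
```

The ratio does not depend on the common scaling factor, so the comparison holds whatever multiplier was applied to the displayed values.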
| Ratio | Quadratic Regression | CNLS-d (median) | CNLS-d (equal) | ||||||
|---|---|---|---|---|---|---|---|---|---|
| MajTher/MinTher | 2007 | 2008 | 2009 | 2007 | 2008 | 2009 | 2007 | 2008 | 2009 |
| 20% | 13 | 379 | 252 | 210 | 61 | 88 | 224 | 137 | 106 |
| 30% | 17 | 861 | 640 | 146 | 66 | 83 | 134 | 129 | 148 |
| 40% | 272 | 377 | 1090 | 107 | 56 | 77 | 127 | 85 | 135 |
| 50% | 870 | 249 | 1552 | 112 | 64 | 85 | 124 | 126 | 134 |
| 60% | 360 | 210 | 276 | 90 | 70 | 120 | 88 | 96 | 142 |
| 70% | 205 | 182 | 187 | 111 | 66 | 184 | 132 | 104 | 104 |
| 80% | 151 | 170 | 150 | 174 | 69 | 286 | 221 | 110 | 111 |
| Note: The values displayed are in $M | |||||||||
| Percentile | Quadratic Regression | CNLS-d (median) | CNLS-d (equal) | |||||||
|---|---|---|---|---|---|---|---|---|---|---|
| MinTher | MajTher | 2007 | 2008 | 2009 | 2007 | 2008 | 2009 | 2007 | 2008 | 2009 |
| 25 | 25 | 8.9 | 6.5 | 13.2 | 0.03 | 0.03 | 0.03 | 0.2 | 0.02 | 0.1 |
| 25 | 50 | 8.9 | 6.5 | 13.2 | 0.05 | 0.1 | 0.1 | 0.04 | 0.1 | 0.04 |
| 25 | 75 | 8.9 | 6.5 | 13.2 | 0.2 | 0.04 | 0.03 | 0.1 | 0.02 | 0.02 |
| 50 | 25 | 8.1 | 6.1 | 12.4 | 6.9 | 5.5 | 7.4 | 5.9 | 6.3 | 7.8 |
| 50 | 50 | 8.1 | 6.1 | 12.4 | 4.3 | 4.9 | 7.8 | 2.1 | 3.7 | 7.4 |
| 50 | 75 | 8.1 | 6.1 | 12.4 | 0.2 | 0.4 | 0.03 | 0.1 | 0.02 | 0.02 |
| 75 | 25 | 6.0 | 5.0 | 10.4 | 9.6 | 13.5 | 14.0 | 9.5 | 10.9 | 14.1 |
| 75 | 50 | 6.0 | 5.0 | 10.4 | 9.6 | 13.5 | 14.3 | 9.6 | 10.9 | 13.8 |
| 75 | 75 | 6.0 | 5.0 | 10.4 | 5.7 | 10.1 | 6.4 | 4.6 | 8.7 | 6.4 |
| Note: The values displayed are in $k | ||||||||||
| Percentile | Quadratic Regression | CNLS-d (median) | CNLS-d (equal) | |||||||
|---|---|---|---|---|---|---|---|---|---|---|
| MinTher | MajTher | 2007 | 2008 | 2009 | 2007 | 2008 | 2009 | 2007 | 2008 | 2009 |
| 25 | 25 | 10.5 | 11.5 | 9.8 | 0.1 | 0.04 | 0.1 | 0.2 | 0.03 | 0.1 |
| 25 | 50 | 11.7 | 13.0 | 10.8 | 11.3 | 11.8 | 15.7 | 10.5 | 10.3 | 14.6 |
| 25 | 75 | 15.1 | 17.2 | 14.5 | 19.8 | 22.1 | 24.6 | 19.8 | 21.8 | 24.0 |
| 50 | 25 | 10.5 | 11.5 | 9.8 | 0.4 | 0.2 | 0.5 | 0.1 | 0.1 | 0.4 |
| 50 | 50 | 11.7 | 13.0 | 10.8 | 3.7 | 7.7 | 1.7 | 6.9 | 7.1 | 3.7 |
| 50 | 75 | 15.1 | 17.2 | 14.5 | 19.8 | 22.0 | 24.6 | 19.8 | 21.8 | 24.0 |
| 75 | 25 | 10.5 | 11.5 | 9.8 | 0.2 | 0.03 | 0.1 | 0.0 | 0.1 | 0.1 |
| 75 | 50 | 11.7 | 13.0 | 10.8 | 0.2 | 0.2 | 0.4 | 0.8 | 0.1 | 0.3 |
| 75 | 75 | 15.1 | 17.2 | 14.5 | 18.3 | 12.4 | 19.8 | 16.2 | 11.0 | 15.2 |
| Note: The values displayed are in $k | ||||||||||
| Average MSE: Estimator compared to the true function | ||||||
| DDF Direction Angle | ||||||
| Noise Dir Angle | MSE Dir Ang | |||||
| 0.55 | 1.59 | 3.49 | 6.35 | 12.06 | ||
| 0.32 | 0.86 | 1.81 | 3.17 | 5.70 | ||
| 0.27 | 0.69 | 1.42 | 2.44 | 4.23 | ||
| 0.32 | 0.77 | 1.54 | 2.58 | 4.36 | ||
| 0.54 | 1.21 | 2.37 | 3.86 | 6.28 | ||
| 3.22 | 1.00 | 2.66 | 7.79 | 22.92 | ||
| 2.16 | 0.59 | 1.39 | 3.80 | 9.98 | ||
| 2.04 | 0.50 | 1.10 | 2.88 | 7.09 | ||
| 2.67 | 0.59 | 1.21 | 3.02 | 7.02 | ||
| 5.40 | 1.03 | 1.88 | 4.45 | 9.68 | ||
| 8.95 | 2.92 | 1.18 | 2.95 | 15.94 | ||
| 6.46 | 1.93 | 0.70 | 1.53 | 7.21 | ||
| 6.49 | 1.81 | 0.61 | 1.20 | 5.24 | ||
| 9.10 | 2.35 | 0.74 | 1.31 | 5.30 | ||
| 20.84 | 4.70 | 1.32 | 2.03 | 7.48 | ||
| 9.65 | 4.44 | 1.90 | 1.13 | 5.70 | ||
| 6.99 | 3.00 | 1.22 | 0.65 | 2.83 | ||
| 7.05 | 2.86 | 1.11 | 0.55 | 2.17 | ||
| 9.92 | 3.76 | 1.40 | 0.64 | 2.30 | ||
| 22.76 | 7.71 | 2.66 | 1.09 | 3.45 | ||
| 6.15 | 3.76 | 2.29 | 1.16 | 0.50 | ||
| 4.25 | 2.50 | 1.49 | 0.73 | 0.29 | ||
| 4.11 | 2.36 | 1.37 | 0.66 | 0.25 | ||
| 5.52 | 3.06 | 1.74 | 0.81 | 0.29 | ||
| 11.62 | 6.10 | 3.33 | 1.50 | 0.49 | ||
| Note: Displayed are the measured values multiplied by | ||||||
| Average MSE: Estimator compared to testing set data | ||||||
| DDF Direction Angle | ||||||
| Noise Dir Angle | MSE Dir Ang | |||||
| 30.02 | 31.22 | 33.23 | 36.21 | 42.08 | ||
| 17.53 | 17.13 | 17.46 | 18.24 | 20.01 | ||
| 14.95 | 13.99 | 13.86 | 14.10 | 14.92 | ||
| 17.51 | 15.70 | 15.15 | 15.03 | 15.42 | ||
| 29.93 | 25.30 | 23.55 | 22.64 | 22.32 | ||
| 49.89 | 52.78 | 58.59 | 68.39 | 91.28 | ||
| 32.41 | 30.88 | 31.71 | 34.14 | 40.37 | ||
| 29.93 | 26.38 | 25.69 | 26.27 | 28.92 | ||
| 38.15 | 31.00 | 28.66 | 27.92 | 28.88 | ||
| 74.19 | 53.30 | 45.83 | 41.93 | 40.19 | ||
| 51.54 | 53.79 | 59.55 | 70.76 | 101.99 | ||
| 36.65 | 34.53 | 35.21 | 38.14 | 47.22 | ||
| 36.39 | 31.60 | 30.32 | 30.83 | 34.75 | ||
| 50.32 | 39.87 | 35.91 | 34.32 | 35.52 | ||
| 112.21 | 76.31 | 62.47 | 54.76 | 50.83 | ||
| 39.37 | 41.09 | 45.01 | 52.54 | 73.64 | ||
| 28.28 | 27.35 | 28.14 | 30.56 | 37.89 | ||
| 28.30 | 25.72 | 25.22 | 26.01 | 29.73 | ||
| 39.47 | 33.40 | 31.11 | 30.42 | 32.19 | ||
| 89.14 | 66.84 | 57.41 | 51.96 | 49.51 | ||
| 22.47 | 22.94 | 23.97 | 25.85 | 30.66 | ||
| 15.44 | 15.16 | 15.36 | 15.99 | 17.91 | ||
| 14.89 | 14.17 | 14.01 | 14.21 | 15.27 | ||
| 19.88 | 18.27 | 17.59 | 17.35 | 17.88 | ||
| 41.52 | 36.04 | 33.31 | 31.51 | 30.54 | ||
| Note: Displayed are the measured values multiplied by | ||||||
| CNLS-d Direction Angle | |||||
| Noise Direction Angle | |||||
| 8.15 | 15.62 | 37.66 | 82.16 | 183.39 | |
| 50.60 | 11.59 | 20.68 | 67.88 | 206.46 | |
| 145.21 | 29.40 | 11.89 | 33.89 | 149.24 | |
| 220.24 | 69.87 | 22.28 | 11.66 | 53.66 | |
| 165.84 | 72.13 | 33.27 | 14.25 | 7.41 | |
| Note: Displayed are measured values multiplied by | |||||
| CNLS-d Direction Angle | |||||
| Distribution | |||||
| 8.45 | 3.04 | 1.96 | 3.01 | 8.60 | |
| 29.34 | 6.92 | 3.27 | 2.54 | 3.39 | |
| 6.62 | 9.69 | 19.19 | 72.55 | 598.97 | |
| Note: Displayed are measured values multiplied by . | |||||
| CNLS-d Direction | |
|---|---|
| Average of radial MSE | |
| 9.07 | |
| 5.23 | |
| 5.04 | |
| 5.53 | |
| 9.62 | |
| 4.24 | |
| 4.29 | |
| 4.35 | |
| 5.12 | |
| 5.44 | |
| 4.21 | |
| 4.15 | |
| 4.18 | |
| 4.89 | |
| 4.91 | |
| 4.23 | |
| 5.20 | |
| 5.18 | |
| 8.58 | |
| Note: Displayed are measured values multiplied by . | |
| Percentile | Quadratic Regression | CNLS-d (median) | CNLS-d (equal) | LL Kernel | |||||||||||
| MinDiag | MinTher | MajDiag | MajTher | 2007 | 2008 | 2009 | 2007 | 2008 | 2009 | 2007 | 2008 | 2009 | 2007 | 2008 | 2009 |
| 25 | 25 | 25 | 25 | 529 | 234 | 823 | 105 | 58 | 87 | 109 | 93 | 103 | 228 | 301 | 885 |
| 25 | 25 | 25 | 50 | 118 | 118 | 122 | 406 | 81 | 369 | 412 | 91 | 112 | 220 | 143 | 216 |
| 25 | 25 | 25 | 75 | 102 | 110 | 93 | 1214 | 82 | 895 | 1220 | 92 | 104 | 209 | 120 | 189 |
| 25 | 25 | 50 | 25 | 79 | 693 | 560 | 177 | 94 | 166 | 131 | 93 | 140 | 226 | 136 | 162 |
| 25 | 25 | 50 | 50 | 104 | 139 | 141 | 98 | 60 | 94 | 95 | 79 | 84 | 233 | 217 | 210 |
| 25 | 25 | 50 | 75 | 105 | 114 | 103 | 165 | 80 | 340 | 179 | 90 | 114 | 219 | 139 | 208 |
| 25 | 25 | 75 | 25 | 56 | 414 | 335 | 179 | 108 | 554 | 124 | 96 | 384 | 158 | 133 | 27 |
| 25 | 25 | 75 | 50 | 77 | 245 | 176 | 149 | 93 | 194 | 126 | 91 | 132 | 292 | 117 | 185 |
| 25 | 25 | 75 | 75 | 94 | 133 | 115 | 93 | 61 | 107 | 105 | 71 | 85 | 226 | 197 | 197 |
| 25 | 50 | 25 | 25 | 15 | 42 | 63 | 330 | 53 | 78 | 327 | 123 | 127 | 400 | 271 | 1062 |
| 25 | 50 | 25 | 50 | 1074 | 234 | 1333 | 119 | 55 | 92 | 108 | 91 | 89 | 1027 | 306 | 215 |
| 25 | 50 | 25 | 75 | 137 | 131 | 138 | 209 | 78 | 331 | 264 | 98 | 110 | 222 | 153 | 206 |
| 25 | 50 | 50 | 25 | 248 | 330 | 381 | 80 | 57 | 83 | 93 | 74 | 84 | 373 | 304 | 336 |
| 25 | 50 | 50 | 50 | 332 | 273 | 1349 | 70 | 64 | 86 | 127 | 78 | 80 | 903 | 251 | 1152 |
| 25 | 50 | 50 | 75 | 133 | 134 | 141 | 177 | 76 | 304 | 182 | 95 | 109 | 233 | 261 | 231 |
| 25 | 50 | 75 | 25 | 108 | 492 | 718 | 126 | 91 | 144 | 143 | 89 | 129 | 204 | 187 | 79 |
| 25 | 50 | 75 | 50 | 122 | 694 | 1068 | 128 | 87 | 137 | 144 | 95 | 112 | 331 | 159 | 239 |
| 25 | 50 | 75 | 75 | 118 | 154 | 152 | 91 | 59 | 104 | 110 | 77 | 93 | 239 | 232 | 229 |
| 25 | 75 | 25 | 25 | 11 | 13 | 13 | 915 | 53 | 80 | 1015 | 125 | 130 | 246 | 98 | 921 |
| 25 | 75 | 25 | 50 | 11 | 231 | 149 | 192 | 52 | 78 | 197 | 130 | 136 | 537 | 433 | 1129 |
| 25 | 75 | 25 | 75 | 1139 | 223 | 1542 | 112 | 55 | 75 | 101 | 91 | 115 | 1015 | 287 | 215 |
| 25 | 75 | 50 | 25 | 18 | 16 | 5 | 133 | 52 | 79 | 181 | 114 | 118 | 293 | 111 | 887 |
| 25 | 75 | 50 | 50 | 13 | 311 | 217 | 135 | 51 | 77 | 181 | 111 | 125 | 528 | 466 | 1091 |
| 25 | 75 | 50 | 75 | 1155 | 230 | 1563 | 109 | 61 | 75 | 99 | 90 | 114 | 1062 | 272 | 214 |
| 25 | 75 | 75 | 25 | 64 | 220 | 275 | 81 | 57 | 85 | 94 | 82 | 84 | 300 | 199 | 274 |
| 25 | 75 | 75 | 50 | 304 | 483 | 484 | 79 | 56 | 93 | 85 | 73 | 83 | 478 | 400 | 437 |
| 25 | 75 | 75 | 75 | 333 | 265 | 1532 | 77 | 64 | 96 | 115 | 78 | 79 | 963 | 249 | 153 |
| 50 | 25 | 25 | 25 | 143 | 189 | 126 | 165 | 115 | 149 | 173 | 157 | 183 | 132 | 139 | 123 |
| 50 | 25 | 25 | 50 | 119 | 124 | 105 | 126 | 68 | 143 | 110 | 88 | 98 | 287 | 116 | 197 |
| 50 | 25 | 25 | 75 | 103 | 111 | 90 | 289 | 82 | 424 | 265 | 92 | 104 | 218 | 116 | 185 |
| 50 | 25 | 50 | 25 | 84 | 740 | 157 | 136 | 72 | 258 | 140 | 91 | 277 | 128 | 209 | 93 |
| 50 | 25 | 50 | 50 | 106 | 146 | 124 | 96 | 59 | 100 | 101 | 85 | 113 | 245 | 244 | 202 |
| 50 | 25 | 50 | 75 | 106 | 114 | 100 | 173 | 80 | 292 | 212 | 90 | 113 | 229 | 135 | 211 |
| 50 | 25 | 75 | 25 | 58 | 431 | 217 | 205 | 97 | 452 | 119 | 95 | 440 | 161 | 128 | 7 |
| 50 | 25 | 75 | 50 | 79 | 247 | 160 | 140 | 82 | 192 | 114 | 91 | 154 | 150 | 114 | 150 |
| 50 | 25 | 75 | 75 | 95 | 133 | 111 | 93 | 61 | 106 | 104 | 79 | 106 | 233 | 207 | 197 |
| 50 | 50 | 25 | 25 | 10 | 142 | 207 | 99 | 51 | 75 | 107 | 111 | 112 | 462 | 363 | 1031 |
| 50 | 50 | 25 | 50 | 1156 | 232 | 1319 | 109 | 61 | 81 | 114 | 89 | 80 | 1134 | 367 | 264 |
| 50 | 50 | 25 | 75 | 138 | 131 | 135 | 208 | 78 | 240 | 267 | 98 | 110 | 233 | 150 | 212 |
| 50 | 50 | 50 | 25 | 357 | 387 | 450 | 87 | 56 | 90 | 91 | 80 | 90 | 419 | 324 | 194 |
| 50 | 50 | 50 | 50 | 307 | 272 | 1329 | 76 | 63 | 84 | 88 | 77 | 78 | 218 | 269 | 652 |
| 50 | 50 | 50 | 75 | 134 | 135 | 137 | 185 | 76 | 258 | 183 | 95 | 108 | 236 | 170 | 232 |
| 50 | 50 | 75 | 25 | 110 | 508 | 702 | 125 | 90 | 143 | 124 | 89 | 139 | 209 | 178 | 30 |
| 50 | 50 | 75 | 50 | 123 | 646 | 1044 | 128 | 77 | 147 | 119 | 94 | 132 | 333 | 340 | 178 |
| 50 | 50 | 75 | 75 | 119 | 155 | 149 | 91 | 59 | 103 | 110 | 77 | 103 | 240 | 236 | 236 |
| 50 | 75 | 25 | 25 | 18 | 15 | 6 | 274 | 53 | 80 | 282 | 124 | 130 | 291 | 117 | 933 |
| 50 | 75 | 25 | 50 | 14 | 245 | 142 | 191 | 52 | 77 | 188 | 129 | 126 | 566 | 456 | 1134 |
| 50 | 75 | 25 | 75 | 1155 | 224 | 1523 | 111 | 55 | 75 | 101 | 91 | 115 | 1050 | 348 | 247 |
| 50 | 75 | 50 | 25 | 18 | 13 | 10 | 132 | 52 | 79 | 172 | 114 | 118 | 316 | 140 | 894 |
| 50 | 75 | 50 | 50 | 17 | 325 | 209 | 135 | 51 | 76 | 164 | 111 | 124 | 537 | 502 | 680 |
| 50 | 75 | 50 | 75 | 1170 | 230 | 1544 | 109 | 61 | 83 | 106 | 90 | 114 | 1106 | 308 | 252 |
| 50 | 75 | 75 | 25 | 85 | 232 | 264 | 81 | 57 | 84 | 94 | 82 | 84 | 321 | 205 | 245 |
| 50 | 75 | 75 | 50 | 323 | 493 | 471 | 79 | 56 | 92 | 85 | 81 | 82 | 499 | 406 | 306 |
| 50 | 75 | 75 | 75 | 335 | 266 | 1514 | 77 | 64 | 95 | 115 | 78 | 79 | 966 | 252 | 1192 |
| 75 | 25 | 25 | 25 | 75 | 101 | 29 | 548 | 309 | 213 | 620 | 177 | 136 | 20 | 27 | 34 |
| 75 | 25 | 25 | 50 | 100 | 118 | 50 | 129 | 74 | 176 | 142 | 137 | 128 | 100 | 46 | 73 |
| 75 | 25 | 25 | 75 | 102 | 112 | 81 | 101 | 78 | 133 | 104 | 79 | 99 | 242 | 73 | 160 |
| 75 | 25 | 50 | 25 | 74 | 142 | 39 | 244 | 95 | 190 | 322 | 92 | 285 | 49 | 57 | 42 |
| 75 | 25 | 50 | 50 | 95 | 140 | 59 | 112 | 120 | 154 | 189 | 112 | 386 | 123 | 58 | 81 |
| 75 | 25 | 50 | 75 | 106 | 116 | 83 | 107 | 76 | 131 | 110 | 78 | 99 | 259 | 79 | 179 |
| 75 | 25 | 75 | 25 | 60 | 534 | 65 | 163 | 75 | 260 | 178 | 84 | 355 | 79 | 115 | 17 |
| 75 | 25 | 75 | 50 | 80 | 213 | 82 | 139 | 72 | 237 | 179 | 81 | 280 | 129 | 147 | 135 |
| 75 | 25 | 75 | 75 | 96 | 137 | 96 | 91 | 69 | 111 | 99 | 84 | 114 | 242 | 117 | 188 |
| 75 | 50 | 25 | 25 | 233 | 593 | 136 | 232 | 128 | 157 | 229 | 254 | 130 | 109 | 542 | 677 |
| 75 | 50 | 25 | 50 | 185 | 196 | 138 | 171 | 93 | 156 | 145 | 145 | 121 | 154 | 146 | 135 |
| 75 | 50 | 25 | 75 | 137 | 132 | 115 | 107 | 75 | 118 | 101 | 76 | 106 | 243 | 111 | 182 |
| 75 | 50 | 50 | 25 | 175 | 670 | 149 | 133 | 85 | 132 | 133 | 98 | 118 | 135 | 278 | 412 |
| 75 | 50 | 50 | 50 | 169 | 223 | 149 | 120 | 98 | 141 | 121 | 108 | 136 | 179 | 171 | 139 |
| 75 | 50 | 50 | 75 | 133 | 137 | 118 | 104 | 74 | 117 | 107 | 75 | 95 | 300 | 128 | 211 |
| 75 | 50 | 75 | 25 | 101 | 607 | 182 | 106 | 71 | 258 | 156 | 80 | 351 | 139 | 161 | 59 |
| 75 | 50 | 75 | 50 | 117 | 359 | 177 | 102 | 69 | 236 | 150 | 77 | 279 | 171 | 291 | 160 |
| 75 | 50 | 75 | 75 | 120 | 159 | 131 | 97 | 67 | 108 | 97 | 90 | 111 | 274 | 253 | 210 |
| 75 | 75 | 25 | 25 | 11 | 57 | 144 | 92 | 52 | 85 | 101 | 92 | 83 | 380 | 220 | 872 |
| 75 | 75 | 25 | 50 | 12 | 371 | 377 | 88 | 51 | 83 | 105 | 90 | 90 | 642 | 551 | 1051 |
| 75 | 75 | 25 | 75 | 740 | 219 | 452 | 101 | 53 | 81 | 86 | 88 | 88 | 253 | 359 | 237 |
| 75 | 75 | 50 | 25 | 13 | 136 | 202 | 88 | 51 | 84 | 105 | 90 | 92 | 404 | 284 | 866 |
| 75 | 75 | 50 | 50 | 19 | 439 | 428 | 93 | 50 | 82 | 109 | 89 | 90 | 631 | 630 | 1048 |
| 75 | 75 | 50 | 75 | 549 | 225 | 459 | 106 | 59 | 89 | 91 | 88 | 96 | 263 | 349 | 286 |
| 75 | 75 | 75 | 25 | 296 | 328 | 415 | 79 | 48 | 89 | 85 | 86 | 89 | 363 | 233 | 191 |
| 75 | 75 | 75 | 50 | 507 | 564 | 593 | 85 | 48 | 88 | 83 | 77 | 97 | 385 | 423 | 191 |
| 75 | 75 | 75 | 75 | 285 | 261 | 472 | 82 | 56 | 92 | 88 | 76 | 101 | 236 | 264 | 713 |
| Note: The values displayed are in $M | |||||||||||||||
| Percentile | Quadratic Regression | CNLS-d (median) | CNLS-d (equal) | LL Kernel | |||||||||||
| MinDiag | MinTher | MajDiag | MajTher | 2007 | 2008 | 2009 | 2007 | 2008 | 2009 | 2007 | 2008 | 2009 | 2007 | 2008 | 2009 |
| 25 | 25 | 25 | 25 | 8.9 | 6.5 | 13.2 | 1.3 | 1.5 | 2.5 | 0.3 | 0.3 | 1.5 | 6.3 | 10.4 | 4.7 |
| 25 | 25 | 25 | 50 | 8.9 | 6.5 | 13.2 | 0.1 | 0.1 | 0.0 | 0.3 | 0.1 | 0.1 | 6.1 | 9.9 | 3.8 |
| 25 | 25 | 25 | 75 | 8.9 | 6.5 | 13.2 | 0.1 | 0.0 | 0.0 | 0.1 | 0.0 | 0.0 | 5.1 | 7.8 | 4.2 |
| 25 | 25 | 50 | 25 | 8.9 | 6.5 | 13.2 | 0.0 | 0.1 | 0.4 | 0.0 | 0.0 | 0.6 | 6.5 | 10.7 | 5.9 |
| 25 | 25 | 50 | 50 | 8.9 | 6.5 | 13.2 | 0.1 | 0.1 | 0.2 | 0.0 | 0.1 | 0.5 | 6.4 | 10.2 | 5.1 |
| 25 | 25 | 50 | 75 | 8.9 | 6.5 | 13.2 | 0.2 | 0.0 | 0.0 | 0.1 | 0.0 | 0.0 | 5.5 | 8.0 | 4.6 |
| 25 | 25 | 75 | 25 | 8.9 | 6.5 | 13.2 | 0.0 | 0.1 | 0.1 | 0.1 | 0.0 | 0.6 | 6.8 | 10.0 | 8.2 |
| 25 | 25 | 75 | 50 | 8.9 | 6.5 | 13.2 | 0.0 | 0.1 | 0.0 | 0.0 | 0.0 | 0.2 | 6.8 | 9.6 | 7.6 |
| 25 | 25 | 75 | 75 | 8.9 | 6.5 | 13.2 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.1 | 5.9 | 7.8 | 6.4 |
| 25 | 50 | 25 | 25 | 8.1 | 6.1 | 12.4 | 7.3 | 8.7 | 10.3 | 8.0 | 8.1 | 9.6 | 5.0 | 10.7 | 6.2 |
| 25 | 50 | 25 | 50 | 8.1 | 6.1 | 12.4 | 2.8 | 7.1 | 8.3 | 4.9 | 4.5 | 8.0 | 4.8 | 9.5 | 4.8 |
| 25 | 50 | 25 | 75 | 8.1 | 6.1 | 12.4 | 1.4 | 0.2 | 0.0 | 0.1 | 0.0 | 0.0 | 4.3 | 7.0 | 3.6 |
| 25 | 50 | 50 | 25 | 8.1 | 6.1 | 12.4 | 6.9 | 5.8 | 7.7 | 5.9 | 5.9 | 6.0 | 5.3 | 10.5 | 7.8 |
| 25 | 50 | 50 | 50 | 8.1 | 6.1 | 12.4 | 4.1 | 5.5 | 7.2 | 2.3 | 3.4 | 6.5 | 5.2 | 9.8 | 6.3 |
| 25 | 50 | 50 | 75 | 8.1 | 6.1 | 12.4 | 0.2 | 0.0 | 0.0 | 0.1 | 0.0 | 0.0 | 4.7 | 6.9 | 4.1 |
| 25 | 50 | 75 | 25 | 8.1 | 6.1 | 12.4 | 0.4 | 1.6 | 1.2 | 1.4 | 0.2 | 1.7 | 6.0 | 9.6 | 10.6 |
| 25 | 50 | 75 | 50 | 8.1 | 6.1 | 12.4 | 0.5 | 1.8 | 0.7 | 1.4 | 0.3 | 0.9 | 5.9 | 9.0 | 9.2 |
| 25 | 50 | 75 | 75 | 8.1 | 6.1 | 12.4 | 0.0 | 0.0 | 0.1 | 0.0 | 0.0 | 0.1 | 5.0 | 6.7 | 6.7 |
| 25 | 75 | 25 | 25 | 6.0 | 5.0 | 10.4 | 9.6 | 13.5 | 14.0 | 9.5 | 11.0 | 14.2 | 4.7 | 8.0 | 16.0 |
| 25 | 75 | 25 | 50 | 6.0 | 5.0 | 10.4 | 9.6 | 13.5 | 14.1 | 9.6 | 11.0 | 14.2 | 3.8 | 7.6 | 14.9 |
| 25 | 75 | 25 | 75 | 6.0 | 5.0 | 10.4 | 5.7 | 10.1 | 5.7 | 4.6 | 8.6 | 6.9 | 3.7 | 6.3 | 9.5 |
| 25 | 75 | 50 | 25 | 6.0 | 5.0 | 10.4 | 9.6 | 13.5 | 14.1 | 9.5 | 10.9 | 13.8 | 4.5 | 7.1 | 16.5 |
| 25 | 75 | 50 | 50 | 6.0 | 5.0 | 10.4 | 9.6 | 13.5 | 14.3 | 9.6 | 10.9 | 13.8 | 4.0 | 6.9 | 15.4 |
| 25 | 75 | 50 | 75 | 6.0 | 5.0 | 10.4 | 5.7 | 9.6 | 5.7 | 4.6 | 8.1 | 6.4 | 3.5 | 5.8 | 9.7 |
| 25 | 75 | 75 | 25 | 6.0 | 5.0 | 10.4 | 8.8 | 12.5 | 13.1 | 8.1 | 10.4 | 12.2 | 4.6 | 7.2 | 18.4 |
| 25 | 75 | 75 | 50 | 6.0 | 5.0 | 10.4 | 8.8 | 12.5 | 13.1 | 7.8 | 10.4 | 12.2 | 4.3 | 6.1 | 17.9 |
| 25 | 75 | 75 | 75 | 6.0 | 5.0 | 10.4 | 4.3 | 8.9 | 4.3 | 2.7 | 5.8 | 4.3 | 3.6 | 3.6 | 13.2 |
| 50 | 25 | 25 | 25 | 8.9 | 6.5 | 13.2 | 0.0 | 0.4 | 0.1 | 0.1 | 0.3 | 0.2 | 6.6 | 10.0 | 4.9 |
| 50 | 25 | 25 | 50 | 8.9 | 6.5 | 13.2 | 0.1 | 0.0 | 0.1 | 0.1 | 0.1 | 0.1 | 6.4 | 9.6 | 4.0 |
| 50 | 25 | 25 | 75 | 8.9 | 6.5 | 13.2 | 0.1 | 0.0 | 0.0 | 0.1 | 0.0 | 0.0 | 5.3 | 7.9 | 4.4 |
| 50 | 25 | 50 | 25 | 8.9 | 6.5 | 13.2 | 0.0 | 0.0 | 0.0 | 0.2 | 0.0 | 0.1 | 6.8 | 10.4 | 6.1 |
| 50 | 25 | 50 | 50 | 8.9 | 6.5 | 13.2 | 0.0 | 0.1 | 0.1 | 0.0 | 0.1 | 0.0 | 6.7 | 10.0 | 5.4 |
| 50 | 25 | 50 | 75 | 8.9 | 6.5 | 13.2 | 0.2 | 0.0 | 0.0 | 0.1 | 0.0 | 0.0 | 5.8 | 7.9 | 5.0 |
| 50 | 25 | 75 | 25 | 8.9 | 6.5 | 13.2 | 0.0 | 0.1 | 0.0 | 0.2 | 0.0 | 0.1 | 7.0 | 9.8 | 8.6 |
| 50 | 25 | 75 | 50 | 8.9 | 6.5 | 13.2 | 0.0 | 0.1 | 0.0 | 0.1 | 0.0 | 0.1 | 7.1 | 9.5 | 7.8 |
| 50 | 25 | 75 | 75 | 8.9 | 6.5 | 13.2 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 6.0 | 8.2 | 6.7 |
| 50 | 50 | 25 | 25 | 8.1 | 6.1 | 12.4 | 8.0 | 8.6 | 9.7 | 7.6 | 6.8 | 9.9 | 5.3 | 10.3 | 6.6 |
| 50 | 50 | 25 | 50 | 8.1 | 6.1 | 12.4 | 3.9 | 7.1 | 7.2 | 4.9 | 4.3 | 7.8 | 5.1 | 9.5 | 5.2 |
| 50 | 50 | 25 | 75 | 8.1 | 6.1 | 12.4 | 1.4 | 0.4 | 0.0 | 0.1 | 0.0 | 0.0 | 4.6 | 7.2 | 4.0 |
| 50 | 50 | 50 | 25 | 8.1 | 6.1 | 12.4 | 6.9 | 5.5 | 7.4 | 5.9 | 6.3 | 7.8 | 5.6 | 10.4 | 8.0 |
| 50 | 50 | 50 | 50 | 8.1 | 6.1 | 12.4 | 4.3 | 4.9 | 7.8 | 2.1 | 3.7 | 7.4 | 5.5 | 9.8 | 6.6 |
| 50 | 50 | 50 | 75 | 8.1 | 6.1 | 12.4 | 0.2 | 0.4 | 0.0 | 0.1 | 0.0 | 0.0 | 4.8 | 7.2 | 4.2 |
| 50 | 50 | 75 | 25 | 8.1 | 6.1 | 12.4 | 0.5 | 1.6 | 0.8 | 0.7 | 0.1 | 1.0 | 6.4 | 9.6 | 10.2 |
| 50 | 50 | 75 | 50 | 8.1 | 6.1 | 12.4 | 0.5 | 1.8 | 0.7 | 0.6 | 0.3 | 0.7 | 6.3 | 9.1 | 9.1 |
| 50 | 50 | 75 | 75 | 8.1 | 6.1 | 12.4 | 0.1 | 0.0 | 0.1 | 0.0 | 0.0 | 0.1 | 5.3 | 7.1 | 7.2 |
| 50 | 75 | 25 | 25 | 6.0 | 5.0 | 10.4 | 9.6 | 13.5 | 14.0 | 9.5 | 11.0 | 14.2 | 4.7 | 7.9 | 15.9 |
| 50 | 75 | 25 | 50 | 6.0 | 5.0 | 10.4 | 9.6 | 13.5 | 14.1 | 9.6 | 11.0 | 14.2 | 3.9 | 7.6 | 13.5 |
| 50 | 75 | 25 | 75 | 6.0 | 5.0 | 10.4 | 5.7 | 10.1 | 6.4 | 4.6 | 8.7 | 7.6 | 3.4 | 6.4 | 9.1 |
| 50 | 75 | 50 | 25 | 6.0 | 5.0 | 10.4 | 9.6 | 13.5 | 14.0 | 9.5 | 10.9 | 14.1 | 4.6 | 7.7 | 16.7 |
| 50 | 75 | 50 | 50 | 6.0 | 5.0 | 10.4 | 9.6 | 13.5 | 14.3 | 9.6 | 10.9 | 13.8 | 4.1 | 6.9 | 15.7 |
| 50 | 75 | 50 | 75 | 6.0 | 5.0 | 10.4 | 5.7 | 10.1 | 6.4 | 4.6 | 8.7 | 6.4 | 3.6 | 6.0 | 9.2 |
| 50 | 75 | 75 | 25 | 6.0 | 5.0 | 10.4 | 8.8 | 12.5 | 13.1 | 8.1 | 10.1 | 12.2 | 4.8 | 7.5 | 18.4 |
| 50 | 75 | 75 | 50 | 6.0 | 5.0 | 10.4 | 8.8 | 12.5 | 13.1 | 8.2 | 10.1 | 12.2 | 4.3 | 6.3 | 17.6 |
| 50 | 75 | 75 | 75 | 6.0 | 5.0 | 10.4 | 4.3 | 8.9 | 4.3 | 2.9 | 5.8 | 4.3 | 3.4 | 4.4 | 13.2 |
| 75 | 25 | 25 | 25 | 8.9 | 6.5 | 13.2 | 0.0 | 0.0 | 0.3 | 0.1 | 0.0 | 0.3 | 6.9 | 9.1 | 6.9 |
| 75 | 25 | 25 | 50 | 8.9 | 6.5 | 13.2 | 0.2 | 0.2 | 0.0 | 0.5 | 0.1 | 0.1 | 6.6 | 9.0 | 6.6 |
| 75 | 25 | 25 | 75 | 8.9 | 6.5 | 13.2 | 0.1 | 0.1 | 0.4 | 0.0 | 0.0 | 0.0 | 6.0 | 7.9 | 5.7 |
| 75 | 25 | 50 | 25 | 8.9 | 6.5 | 13.2 | 0.0 | 0.0 | 0.3 | 0.1 | 0.1 | 0.1 | 7.1 | 9.3 | 7.8 |
| 75 | 25 | 50 | 50 | 8.9 | 6.5 | 13.2 | 0.2 | 0.1 | 0.3 | 0.3 | 0.1 | 0.0 | 7.0 | 9.0 | 7.5 |
| 75 | 25 | 50 | 75 | 8.9 | 6.5 | 13.2 | 0.1 | 0.1 | 0.1 | 0.0 | 0.0 | 0.0 | 6.2 | 8.0 | 5.8 |
| 75 | 25 | 75 | 25 | 8.9 | 6.5 | 13.2 | 0.1 | 0.2 | 0.3 | 0.1 | 0.1 | 0.2 | 7.3 | 8.6 | 9.5 |
| 75 | 25 | 75 | 50 | 8.9 | 6.5 | 13.2 | 0.1 | 0.2 | 0.3 | 0.2 | 0.1 | 0.2 | 7.1 | 8.6 | 8.8 |
| 75 | 25 | 75 | 75 | 8.9 | 6.5 | 13.2 | 0.0 | 0.1 | 0.2 | 0.0 | 0.0 | 0.2 | 6.3 | 8.1 | 8.1 |
| 75 | 50 | 25 | 25 | 8.1 | 6.1 | 12.4 | 3.1 | 2.3 | 2.9 | 2.6 | 1.2 | 4.0 | 6.0 | 9.6 | 8.4 |
| 75 | 50 | 25 | 50 | 8.1 | 6.1 | 12.4 | 3.0 | 0.5 | 3.3 | 1.7 | 0.9 | 1.8 | 5.9 | 9.5 | 7.4 |
| 75 | 50 | 25 | 75 | 8.1 | 6.1 | 12.4 | 0.1 | 0.1 | 0.8 | 0.0 | 0.2 | 0.0 | 5.3 | 7.9 | 5.6 |
| 75 | 50 | 50 | 25 | 8.1 | 6.1 | 12.4 | 2.6 | 2.6 | 0.4 | 1.5 | 2.4 | 0.5 | 6.2 | 9.9 | 9.2 |
| 75 | 50 | 50 | 50 | 8.1 | 6.1 | 12.4 | 2.1 | 0.1 | 0.3 | 0.8 | 0.1 | 0.5 | 6.2 | 9.5 | 8.6 |
| 75 | 50 | 50 | 75 | 8.1 | 6.1 | 12.4 | 0.1 | 0.1 | 0.7 | 0.0 | 0.2 | 0.0 | 5.5 | 7.7 | 6.4 |
| 75 | 50 | 75 | 25 | 8.1 | 6.1 | 12.4 | 0.4 | 0.2 | 0.5 | 0.2 | 0.1 | 0.8 | 6.8 | 8.9 | 10.8 |
| 75 | 50 | 75 | 50 | 8.1 | 6.1 | 12.4 | 0.4 | 0.2 | 0.4 | 0.2 | 0.1 | 0.8 | 6.7 | 8.8 | 10.0 |
| 75 | 50 | 75 | 75 | 8.1 | 6.1 | 12.4 | 0.1 | 0.1 | 0.3 | 0.0 | 0.0 | 0.3 | 5.7 | 7.8 | 7.7 |
| 75 | 75 | 25 | 25 | 6.0 | 5.0 | 10.4 | 9.6 | 13.1 | 14.4 | 9.6 | 11.0 | 12.6 | 5.5 | 8.6 | 14.8 |
| 75 | 75 | 25 | 50 | 6.0 | 5.0 | 10.4 | 9.6 | 13.0 | 14.4 | 9.6 | 11.0 | 12.6 | 4.8 | 8.3 | 14.2 |
| 75 | 75 | 25 | 75 | 6.0 | 5.0 | 10.4 | 4.1 | 9.0 | 7.4 | 3.6 | 5.6 | 6.6 | 3.9 | 6.9 | 8.4 |
| 75 | 75 | 50 | 25 | 6.0 | 5.0 | 10.4 | 9.6 | 13.1 | 14.4 | 9.6 | 11.1 | 12.5 | 5.6 | 8.5 | 15.5 |
| 75 | 75 | 50 | 50 | 6.0 | 5.0 | 10.4 | 9.6 | 13.0 | 14.1 | 9.6 | 11.1 | 12.5 | 4.9 | 8.1 | 15.4 |
| 75 | 75 | 50 | 75 | 6.0 | 5.0 | 10.4 | 4.1 | 7.6 | 7.5 | 3.6 | 5.6 | 6.9 | 3.7 | 6.6 | 9.4 |
| 75 | 75 | 75 | 25 | 6.0 | 5.0 | 10.4 | 7.1 | 8.2 | 9.5 | 7.9 | 6.8 | 10.7 | 6.5 | 8.3 | 18.1 |
| 75 | 75 | 75 | 50 | 6.0 | 5.0 | 10.4 | 7.1 | 8.2 | 9.5 | 7.9 | 6.8 | 10.7 | 5.6 | 7.7 | 17.8 |
| 75 | 75 | 75 | 75 | 6.0 | 5.0 | 10.4 | 4.5 | 7.7 | 7.5 | 3.1 | 5.3 | 6.4 | 4.0 | 5.3 | 12.8 |
| Note: The values displayed are in $k | |||||||||||||||
| Percentile | Quadratic Regression | CNLS-d (median) | CNLS-d (equal) | LL Kernel | |||||||||||
| MinDiag | MinTher | MajDiag | MajTher | 2007 | 2008 | 2009 | 2007 | 2008 | 2009 | 2007 | 2008 | 2009 | 2007 | 2008 | 2009 |
| 25 | 25 | 25 | 25 | 10.5 | 11.5 | 9.8 | 1.7 | 2.8 | 4.4 | 3.3 | 1.8 | 4.9 | 18.4 | 14.3 | 22.9 |
| 25 | 25 | 25 | 50 | 11.7 | 13.0 | 10.8 | 17.3 | 17.0 | 20.0 | 16.8 | 15.2 | 18.5 | 17.5 | 11.2 | 18.5 |
| 25 | 25 | 25 | 75 | 15.1 | 17.2 | 14.5 | 19.4 | 22.3 | 24.6 | 19.8 | 21.8 | 24.0 | 15.2 | 10.2 | 12.7 |
| 25 | 25 | 50 | 25 | 10.5 | 11.5 | 9.8 | 0.0 | 0.0 | 0.2 | 0.1 | 0.0 | 0.1 | 17.4 | 14.6 | 21.7 |
| 25 | 25 | 50 | 50 | 11.7 | 13.0 | 10.8 | 9.6 | 10.6 | 13.5 | 11.2 | 10.3 | 13.4 | 16.8 | 12.2 | 18.1 |
| 25 | 25 | 50 | 75 | 15.1 | 17.2 | 14.5 | 19.8 | 22.2 | 24.6 | 19.8 | 21.8 | 24.0 | 15.8 | 10.5 | 13.9 |
| 25 | 25 | 75 | 25 | 10.5 | 11.5 | 9.8 | 0.1 | 0.9 | 0.1 | 0.1 | 0.0 | 0.2 | 17.4 | 14.9 | 17.2 |
| 25 | 25 | 75 | 50 | 11.7 | 13.0 | 10.8 | 1.3 | 1.7 | 1.3 | 2.9 | 0.1 | 5.1 | 17.3 | 14.0 | 17.2 |
| 25 | 25 | 75 | 75 | 15.1 | 17.2 | 14.5 | 16.1 | 18.0 | 23.5 | 16.8 | 16.5 | 23.8 | 16.9 | 11.8 | 14.2 |
| 25 | 50 | 25 | 25 | 10.5 | 11.5 | 9.8 | 0.1 | 0.1 | 0.2 | 0.1 | 0.1 | 0.5 | 17.9 | 12.6 | 20.0 |
| 25 | 50 | 25 | 50 | 11.7 | 13.0 | 10.8 | 12.9 | 7.9 | 8.2 | 10.0 | 7.0 | 6.2 | 17.1 | 9.7 | 16.8 |
| 25 | 50 | 25 | 75 | 15.1 | 17.2 | 14.5 | 19.8 | 22.3 | 24.6 | 19.8 | 21.8 | 24.0 | 15.0 | 8.7 | 12.1 |
| 25 | 50 | 50 | 25 | 10.5 | 11.5 | 9.8 | 0.4 | 0.2 | 0.4 | 0.1 | 0.5 | 0.3 | 17.3 | 13.3 | 18.9 |
| 25 | 50 | 50 | 50 | 11.7 | 13.0 | 10.8 | 5.2 | 5.2 | 1.4 | 10.5 | 8.1 | 5.8 | 16.6 | 10.8 | 16.5 |
| 25 | 50 | 50 | 75 | 15.1 | 17.2 | 14.5 | 19.8 | 22.2 | 24.6 | 19.8 | 21.8 | 24.0 | 15.4 | 8.5 | 12.6 |
| 25 | 50 | 75 | 25 | 10.5 | 11.5 | 9.8 | 0.1 | 0.3 | 0.1 | 0.2 | 0.1 | 0.2 | 17.3 | 14.1 | 15.7 |
| 25 | 50 | 75 | 50 | 11.7 | 13.0 | 10.8 | 0.1 | 0.5 | 0.9 | 0.2 | 0.2 | 4.6 | 16.9 | 13.0 | 16.3 |
| 25 | 50 | 75 | 75 | 15.1 | 17.2 | 14.5 | 16.1 | 18.0 | 22.8 | 16.8 | 16.5 | 23.8 | 15.9 | 10.2 | 14.3 |
| 25 | 75 | 25 | 25 | 10.5 | 11.5 | 9.8 | 0.0 | 0.0 | 0.1 | 0.2 | 0.1 | 0.1 | 17.1 | 9.3 | 9.9 |
| 25 | 75 | 25 | 50 | 11.7 | 13.0 | 10.8 | 1.6 | 0.0 | 0.3 | 0.7 | 0.1 | 0.1 | 15.4 | 7.0 | 9.8 |
| 25 | 75 | 25 | 75 | 15.1 | 17.2 | 14.5 | 18.3 | 12.4 | 20.9 | 16.2 | 10.9 | 14.3 | 15.3 | 6.2 | 6.7 |
| 25 | 75 | 50 | 25 | 10.5 | 11.5 | 9.8 | 0.2 | 0.0 | 0.2 | 0.0 | 0.1 | 0.2 | 16.3 | 9.2 | 9.4 |
| 25 | 75 | 50 | 50 | 11.7 | 13.0 | 10.8 | 0.2 | 0.3 | 0.4 | 0.8 | 0.1 | 0.3 | 15.6 | 6.6 | 8.4 |
| 25 | 75 | 50 | 75 | 15.1 | 17.2 | 14.5 | 18.3 | 12.8 | 20.9 | 16.2 | 11.3 | 15.2 | 15.2 | 6.2 | 5.8 |
| 25 | 75 | 75 | 25 | 10.5 | 11.5 | 9.8 | 0.1 | 0.1 | 0.1 | 0.2 | 0.1 | 0.1 | 15.7 | 11.1 | 9.8 |
| 25 | 75 | 75 | 50 | 11.7 | 13.0 | 10.8 | 0.1 | 0.1 | 0.1 | 0.6 | 0.1 | 0.1 | 15.6 | 8.4 | 9.7 |
| 25 | 75 | 75 | 75 | 15.1 | 17.2 | 14.5 | 15.5 | 10.4 | 19.7 | 17.2 | 10.8 | 16.7 | 14.7 | 6.9 | 8.0 |
| 50 | 25 | 25 | 25 | 10.5 | 11.5 | 9.8 | 0.3 | 0.1 | 0.0 | 0.2 | 0.3 | 2.7 | 18.6 | 14.1 | 21.4 |
| 50 | 25 | 25 | 50 | 11.7 | 13.0 | 10.8 | 17.8 | 17.7 | 17.0 | 16.3 | 15.4 | 19.3 | 18.0 | 11.4 | 17.2 |
| 50 | 25 | 25 | 75 | 15.1 | 17.2 | 14.5 | 19.2 | 22.3 | 24.6 | 19.8 | 21.8 | 24.0 | 15.1 | 11.1 | 13.1 |
| 50 | 25 | 50 | 25 | 10.5 | 11.5 | 9.8 | 0.1 | 0.0 | 0.1 | 0.2 | 0.0 | 0.1 | 17.6 | 14.7 | 20.4 |
| 50 | 25 | 50 | 50 | 11.7 | 13.0 | 10.8 | 11.3 | 11.8 | 15.7 | 10.5 | 10.3 | 14.6 | 17.2 | 12.5 | 17.2 |
| 50 | 25 | 50 | 75 | 15.1 | 17.2 | 14.5 | 19.8 | 22.1 | 24.6 | 19.8 | 21.8 | 24.0 | 15.9 | 10.7 | 13.5 |
| 50 | 25 | 75 | 25 | 10.5 | 11.5 | 9.8 | 0.1 | 1.0 | 0.3 | 0.1 | 0.0 | 1.3 | 17.3 | 15.1 | 16.8 |
| 50 | 25 | 75 | 50 | 11.7 | 13.0 | 10.8 | 0.9 | 1.5 | 2.1 | 0.5 | 0.2 | 1.3 | 17.2 | 14.5 | 16.9 |
| 50 | 25 | 75 | 75 | 15.1 | 17.2 | 14.5 | 16.1 | 18.0 | 23.5 | 16.8 | 16.5 | 23.6 | 16.7 | 13.0 | 14.3 |
| 50 | 50 | 25 | 25 | 10.5 | 11.5 | 9.8 | 0.2 | 0.1 | 0.2 | 0.1 | 0.4 | 0.5 | 18.2 | 12.8 | 18.6 |
| 50 | 50 | 25 | 50 | 11.7 | 13.0 | 10.8 | 11.1 | 7.9 | 10.0 | 9.4 | 7.7 | 5.5 | 17.6 | 10.3 | 15.8 |
| 50 | 50 | 25 | 75 | 15.1 | 17.2 | 14.5 | 19.8 | 22.2 | 24.6 | 19.8 | 21.8 | 24.0 | 14.9 | 9.4 | 12.1 |
| 50 | 50 | 50 | 25 | 10.5 | 11.5 | 9.8 | 0.4 | 0.2 | 0.5 | 0.1 | 0.1 | 0.4 | 17.5 | 13.6 | 17.6 |
| 50 | 50 | 50 | 50 | 11.7 | 13.0 | 10.8 | 3.7 | 7.7 | 1.7 | 6.9 | 7.1 | 3.7 | 16.9 | 11.5 | 15.6 |
| 50 | 50 | 50 | 75 | 15.1 | 17.2 | 14.5 | 19.8 | 22.0 | 24.6 | 19.8 | 21.8 | 24.0 | 15.2 | 9.5 | 12.8 |
| 50 | 50 | 75 | 25 | 10.5 | 11.5 | 9.8 | 0.1 | 0.3 | 0.2 | 0.0 | 0.0 | 0.4 | 17.4 | 14.6 | 14.8 |
| 50 | 50 | 75 | 50 | 11.7 | 13.0 | 10.8 | 0.1 | 0.5 | 0.3 | 0.1 | 0.2 | 1.0 | 17.0 | 13.6 | 15.3 |
| 50 | 50 | 75 | 75 | 15.1 | 17.2 | 14.5 | 17.4 | 18.0 | 22.8 | 16.8 | 16.5 | 23.8 | 16.0 | 11.5 | 14.8 |
| 50 | 75 | 25 | 25 | 10.5 | 11.5 | 9.8 | 0.0 | 0.0 | 0.1 | 0.2 | 0.1 | 0.0 | 17.0 | 9.3 | 9.5 |
| 50 | 75 | 25 | 50 | 11.7 | 13.0 | 10.8 | 1.6 | 0.0 | 0.3 | 0.7 | 0.1 | 0.1 | 15.7 | 7.6 | 6.3 |
| 50 | 75 | 25 | 75 | 15.1 | 17.2 | 14.5 | 18.3 | 12.4 | 19.8 | 16.2 | 11.0 | 13.4 | 14.4 | 6.6 | 6.5 |
| 50 | 75 | 50 | 25 | 10.5 | 11.5 | 9.8 | 0.2 | 0.0 | 0.1 | 0.0 | 0.1 | 0.1 | 16.6 | 10.0 | 8.9 |
| 50 | 75 | 50 | 50 | 11.7 | 13.0 | 10.8 | 0.2 | 0.2 | 0.4 | 0.8 | 0.1 | 0.3 | 15.8 | 8.2 | 8.1 |
| 50 | 75 | 50 | 75 | 15.1 | 17.2 | 14.5 | 18.3 | 12.4 | 19.8 | 16.2 | 11.0 | 15.2 | 14.8 | 6.7 | 5.3 |
| 50 | 75 | 75 | 25 | 10.5 | 11.5 | 9.8 | 0.1 | 0.1 | 0.1 | 0.2 | 0.1 | 0.1 | 16.1 | 11.6 | 9.6 |
| 50 | 75 | 75 | 50 | 11.7 | 13.0 | 10.8 | 0.1 | 0.1 | 0.1 | 0.3 | 0.1 | 0.1 | 15.5 | 9.2 | 9.1 |
| 50 | 75 | 75 | 75 | 15.1 | 17.2 | 14.5 | 15.5 | 10.4 | 19.7 | 16.3 | 10.8 | 16.7 | 14.6 | 8.3 | 7.2 |
| 75 | 25 | 25 | 25 | 10.5 | 11.5 | 9.8 | 0.3 | 0.0 | 0.3 | 0.3 | 0.0 | 1.5 | 18.9 | 14.7 | 15.0 |
| 75 | 25 | 25 | 50 | 11.7 | 13.0 | 10.8 | 2.4 | 9.7 | 4.0 | 6.7 | 6.2 | 4.3 | 18.0 | 13.9 | 13.9 |
| 75 | 25 | 25 | 75 | 15.1 | 17.2 | 14.5 | 19.6 | 19.5 | 24.7 | 19.3 | 18.3 | 24.4 | 15.7 | 13.2 | 11.0 |
| 75 | 25 | 50 | 25 | 10.5 | 11.5 | 9.8 | 0.1 | 0.0 | 0.2 | 0.3 | 0.0 | 0.2 | 18.0 | 15.4 | 14.8 |
| 75 | 25 | 50 | 50 | 11.7 | 13.0 | 10.8 | 3.9 | 5.5 | 0.8 | 4.5 | 2.8 | 3.3 | 17.6 | 14.7 | 13.8 |
| 75 | 25 | 50 | 75 | 15.1 | 17.2 | 14.5 | 19.6 | 19.5 | 24.7 | 19.3 | 18.9 | 24.4 | 16.4 | 13.4 | 11.9 |
| 75 | 25 | 75 | 25 | 10.5 | 11.5 | 9.8 | 0.1 | 0.1 | 0.3 | 0.0 | 0.1 | 0.4 | 17.1 | 16.1 | 14.0 |
| 75 | 25 | 75 | 50 | 11.7 | 13.0 | 10.8 | 0.1 | 0.1 | 0.3 | 0.2 | 0.1 | 0.4 | 17.1 | 16.5 | 14.3 |
| 75 | 25 | 75 | 75 | 15.1 | 17.2 | 14.5 | 19.5 | 11.6 | 23.4 | 19.1 | 18.5 | 20.8 | 17.5 | 15.4 | 16.5 |
| 75 | 50 | 25 | 25 | 10.5 | 11.5 | 9.8 | 0.3 | 0.1 | 0.5 | 1.7 | 0.1 | 0.7 | 18.6 | 13.9 | 13.2 |
| 75 | 50 | 25 | 50 | 11.7 | 13.0 | 10.8 | 0.9 | 7.4 | 1.4 | 3.1 | 4.5 | 3.5 | 17.8 | 13.3 | 12.5 |
| 75 | 50 | 25 | 75 | 15.1 | 17.2 | 14.5 | 19.6 | 19.5 | 24.7 | 19.3 | 19.2 | 24.4 | 15.0 | 12.4 | 10.9 |
| 75 | 50 | 50 | 25 | 10.5 | 11.5 | 9.8 | 0.5 | 0.1 | 0.2 | 0.7 | 0.1 | 0.1 | 17.8 | 15.0 | 12.8 |
| 75 | 50 | 50 | 50 | 11.7 | 13.0 | 10.8 | 0.7 | 5.5 | 0.8 | 2.5 | 2.8 | 3.3 | 17.3 | 14.2 | 12.2 |
| 75 | 50 | 50 | 75 | 15.1 | 17.2 | 14.5 | 19.6 | 19.5 | 24.7 | 19.3 | 19.8 | 24.4 | 15.7 | 12.3 | 12.3 |
| 75 | 50 | 75 | 25 | 10.5 | 11.5 | 9.8 | 0.1 | 0.1 | 0.2 | 0.2 | 0.1 | 0.2 | 17.3 | 16.0 | 12.1 |
| 75 | 50 | 75 | 50 | 11.7 | 13.0 | 10.8 | 0.1 | 0.1 | 0.3 | 0.2 | 0.1 | 0.2 | 17.0 | 16.1 | 12.9 |
| 75 | 50 | 75 | 75 | 15.1 | 17.2 | 14.5 | 19.2 | 11.6 | 24.2 | 19.1 | 18.5 | 20.0 | 16.5 | 15.0 | 15.5 |
| 75 | 75 | 25 | 25 | 10.5 | 11.5 | 9.8 | 0.1 | 0.2 | 0.1 | 0.1 | 0.2 | 0.3 | 16.8 | 11.6 | 6.9 |
| 75 | 75 | 25 | 50 | 11.7 | 13.0 | 10.8 | 0.1 | 0.4 | 0.1 | 0.1 | 0.2 | 0.3 | 16.2 | 10.4 | 6.5 |
| 75 | 75 | 25 | 75 | 15.1 | 17.2 | 14.5 | 18.6 | 12.6 | 15.4 | 15.9 | 15.0 | 14.1 | 14.6 | 10.1 | 4.5 |
| 75 | 75 | 50 | 25 | 10.5 | 11.5 | 9.8 | 0.1 | 0.2 | 0.1 | 0.1 | 0.1 | 0.1 | 16.5 | 12.4 | 7.2 |
| 75 | 75 | 50 | 50 | 11.7 | 13.0 | 10.8 | 0.1 | 0.4 | 0.1 | 0.1 | 0.1 | 0.1 | 15.8 | 11.5 | 7.2 |
| 75 | 75 | 50 | 75 | 15.1 | 17.2 | 14.5 | 18.6 | 13.4 | 15.9 | 15.9 | 15.0 | 13.6 | 14.4 | 10.6 | 5.3 |
| 75 | 75 | 75 | 25 | 10.5 | 11.5 | 9.8 | 0.1 | 0.1 | 0.1 | 0.1 | 0.1 | 0.2 | 15.7 | 14.1 | 8.5 |
| 75 | 75 | 75 | 50 | 11.7 | 13.0 | 10.8 | 0.1 | 0.2 | 0.1 | 0.1 | 0.1 | 0.2 | 15.3 | 13.1 | 7.4 |
| 75 | 75 | 75 | 75 | 15.1 | 17.2 | 14.5 | 13.5 | 7.2 | 14.2 | 12.2 | 11.7 | 12.1 | 14.6 | 12.7 | 7.9 |
| Note: The values displayed are in $k | |||||||||||||||
Direction Selection in Stochastic Directional Distance Functions
Kevin Layer
Department of Industrial and Systems Engineering, Texas A&M University, College Station, TX, USA.
Andrew L. Johnson
School of Information Science and Technology, Osaka University, Suita, Japan.
Robin C. Sickles
Department of Economics, Rice University, Houston, TX, USA.
Gary D. Ferrier
Department of Economics, University of Arkansas, Fayetteville, AR, USA.
Abstract
Researchers rely on the distance function to model multiple product production using multiple inputs. A stochastic directional distance function (SDDF) allows for noise in potentially all input and output variables. Yet, when estimated, the direction selected will affect the functional estimates because deviations from the estimated function are minimized in the specified direction. Specifically, the parameters of the parametric SDDF are point identified when the direction is specified; we show that the parameters of the parametric SDDF are set identified when multiple directions are considered. Further, the set of identified parameters can be narrowed via data-driven approaches to restrict the directions considered. We demonstrate a similar narrowing of the identified parameter set for a shape constrained nonparametric method, where the shape constraints impose standard features of a cost function such as monotonicity and convexity.
Our Monte Carlo simulation studies reveal significant improvements, as measured by out-of-sample radial mean squared error, in functional estimates when we use a directional distance function with an appropriately selected direction and the errors are uncorrelated across variables. We show that these benefits increase as the correlation in error terms across variables increases. This correlation is a type of endogeneity that is common in production settings. From our Monte Carlo simulations we conclude that selecting a direction that is approximately orthogonal to the estimated function in the central region of the data gives significantly better estimates relative to the directions commonly used in the literature. For practitioners, our results imply that selecting a direction vector that has non-zero components for all variables that may have measurement error provides a significant improvement in the estimator’s performance. We illustrate these results using cost and production data from samples of approximately 500 US hospitals per year operating in 2007, 2008, and 2009, respectively, and find that the shape constrained nonparametric methods provide a significant increase in flexibility over second order local approximation parametric methods.
Keywords: Nonparametric regression, Shape Constraints, Data Envelopment Analysis, Hospital production.
Journal: European Journal of Operational Research
1 Introduction
The focus of this paper is direction selection in stochastic directional distance functions (SDDF). [Footnote 1: Here we use the term stochastic in reference to a model with a noise term.] While the DDF is typically used to measure efficiency, in this paper we use a nonparametric shape constrained SDDF to model the conditional mean behavior of production. The stochastic distance function (SDF) was introduced by Lovell et al. (1994) and was used in a series of early empirical studies by Coelli and Perelman (1999, 2000) and Sickles et al. (2002). The parameters of a parametric distance function are point identified; however, if the direction in the DDF is not specified, then the parameters of a parametric DDF are set identified. [Footnote 2: Let $\phi$ be what is known (e.g., via assumptions and restrictions) about the data generating process (DGP). Let $\theta$ represent the parameters to be identified, let $\Theta$ denote all possible values of $\theta$, and let $\theta_0$ be the true but unknown value of $\theta$. Then the vector of unknown parameters $\theta$ is point identified if it is uniquely determined from $\phi$. However, $\theta$ is set identified if some of the possible values of $\theta$ are observationally equivalent to $\theta_0$ (Lewbel (forthcoming)).] A set of axiomatic properties related to production and cost functions, such as monotonicity and convexity in the case of a cost function, are well established in the production literature (Shephard (1970), Chambers (1988)). Although the stochastic distance function literature acknowledges the axiomatic properties necessary for duality, it does not impose them globally. Instead, authors typically impose them only on a particular point in the data (e.g., Atkinson et al. (2003)). Recognizing these issues, we provide an axiomatic nonparametric estimator of the SDDF and a method to restrict the pool of directions to choose from for the SDDF, thereby reducing the size of the identified parameter set.
Most empirical studies that use establishment or hospital level data to estimate production or cost functions either assume a specific parametric form or ignore noise, or both (Hollingsworth (2003)). In contrast, we use an axiomatic nonparametric SDDF estimator and the proposed method to determine a set of acceptable directions to estimate a cost function that maintains global axiomatic properties for the US hospital industry. Furthermore, we demonstrate the importance of global axiomatic properties for the estimation of most productive scale size and marginal costs.
A few papers have attempted to implement the directional distance function in a stochastic setting (see, for example, Färe et al. (2005), Färe et al. (2010), and Färe and Vardanyan (2016)). The latter two papers discuss the challenges of selecting a parametric functional form that does not violate the axioms typically assumed in production economics. Based on their observations, Färe and Vardanyan (2016) use a quadratic functional specification. [Footnote 3: As Kuosmanen and Johnson (2017) note, the translog function used for multi-output production cannot satisfy the standard assumptions for the production technology globally for any parameter values. The quadratic functional form does not have this shortcoming.] Yet several papers show a loss of flexibility in parametric functional forms, such as the translog or the quadratic functional form, when shape constraints are imposed (e.g., Diewert and Wales (1987)). Also important to implementation, the selection of the direction vector in the SDDF has been discussed in Färe et al. (2017) and Atkinson and Tsionas (2016), among others. These papers focus on selecting the direction corresponding to a particular interpretation of the inefficiency measure, based on the distance to the economically efficient point. In contrast, we consider Kuosmanen and Johnson (2017)’s multi-step efficiency analysis and focus on the first step, estimating a conditional mean function. Our goal is to select the direction that best recovers the underlying technology while acknowledging that the data are likely to contain noise in potentially all variables. [Footnote 4: For researchers interested in productivity measurement and productivity variation (e.g., Syverson (2011)), the results from this paper can be used directly. For authors interested in efficiency analysis, the insights from this paper could be used to improve the estimates from the first stage of Kuosmanen and Johnson (2017)’s three-step procedure where efficiency is estimated in the third step.]
To model multi-product production, Kuosmanen and Johnson (2017) have proposed the use of axiomatic nonparametric methods to estimate the SDDF, which they name Directional Convex Nonparametric Least Squares (CNLS-d), a type of sieve estimator. Their methods not only relax standard functional form assumptions for production, cost, or distance functions, but also improve the interpretability and finite sample efficiency over nonparametric methods such as kernel regression (Yagi et al. (2018)). A variety of models can be interpreted as special cases of Kuosmanen and Johnson (2017); among these is a set of models that specify the direction (e.g., Johnson and Kuosmanen (2011), Kuosmanen and Kortelainen (2012)). All CNLS models are sieve estimators and fall into the category of partially identified or set identified estimators discussed in Manski (2003) and Tamer (2010). The guidance our paper provides in selecting a direction will reduce the size of the set identified for CNLS-d and other DDF estimators with flexible direction specifications.
Much of the production function literature concerns endogeneity issues; see, for example, Olley and Pakes (1996), Levinsohn and Petrin (2003), and Ackerberg et al. (2015). These methods are often referred to as proxy variable approaches. The argument for endogeneity is typically that decisions regarding variable inputs such as labor are made with some knowledge of the factors included in the unobserved residuals. Recently, these methods have been reinterpreted as instrumental variable approaches (Wooldridge (2009)), or control function approaches (Ackerberg et al. (2015)). Unfortunately, the assumptions on the particular timing of input decisions are not innocuous. Indeed, every firm must adjust its inputs in exactly the same way; otherwise, the moment restrictions needed for point identification are violated. For an alternative in the stochastic frontier setting, see Kutlu (2018).
Kuosmanen and Johnson (2017) have shown that a production function estimated using a stochastic distance function under a constant returns-to-scale assumption is robust to endogeneity issues because the normalization by one of the inputs or outputs causes the errors-in-variables to cancel each other. In this paper we consider the more general case of a convex technology that does not necessarily satisfy constant returns-to-scale, and show that when errors across variables are highly correlated, a specific type of endogeneity, the SDDF improves estimation performance significantly over the typical alternative of ignoring the endogeneity.
When considering alternative directions in the DDF, we show that the direction that performs the best is often related to the particular performance measure used. We use an out-of-sample mean squared error (MSE) that is measured radially to address this issue. This measure is motivated by the results of our Monte Carlo simulations and is natural for a function that satisfies monotonicity and convexity, assuring the true function and the estimated function are close in the areas where most data are observed.
We analyze US hospital data and characterize the most productive scale size and marginal costs for the US hospital sector. We demonstrate that out-of-sample MSE is reduced significantly by relaxing parametric functional form restrictions. We also observe the advantage of imposing axioms that allow the estimated function to still be interpretable. Concerning the direction selection, we find, for this data set, that the exact direction selected is not very critical in terms of MSE performance, but some commonly used directions should be avoided.
The remainder of this paper is organized as follows. Section 2 introduces the statistical model and the production model. Section 3 describes the estimators used for the analysis. Section 4 outlines our reasons for the MSE measure we propose. Section 5 highlights the importance of the direction selection through Monte Carlo experiments. Section 6 describes our direction selection method. Section 7 demonstrates the benefits of using non-parametric shape-constrained estimators with an appropriately selected direction for US hospital data. Section 8 concludes.
2 Models
2.1 Statistical Model
We consider a statistical model that allows for measurement error in potentially all of the input and output variables. Let , be a vector of random input variables of length and , , be a vector of random output variables of length , where indexes observations. Let , , be a vector of random error variables of length and , , be a vector of random error variables of length . One way of modeling the errors-in-variable (EIV) is:
[TABLE]
Equation (1) is only identified when multiple measurements exist for the same vector of regressors or when a subsample of observations exists in which the regressors are measured exactly (Carroll et al. (2006)). Carroll et al. (2006) discussed a standard regression setting, not a multi-input/multi-output production process. Thus, repeated measurement requires all but one of the netputs to be identical across at least two observations. [Footnote 5: Here we use the term netputs to describe the union of the input and output vectors.] Neither of these conditions is likely to hold for typical production data sets; therefore, we develop an alternative approach to identification.
As our starting point, we use the alternative, but equivalent, representation of the EIV model proposed by Kuosmanen and Johnson (2017):
[TABLE]
Clearly, the representations of Carroll et al. (2006) and Kuosmanen and Johnson (2017) are equivalent if:
[TABLE]
We define the following normalization:
[TABLE]
which implies:
[TABLE]
We refer to as the true noise direction and in the most general case we allow the direction to be observation specific. [Footnote 6: When the noise direction is observation specific and random, all inputs and outputs potentially contain noise and therefore are endogenous variables. If some components of the vector are zero, this implies the associated variables are exogenous and measured with certainty. See Kuosmanen and Johnson (2017) for more details.] The estimation methods that account for noise in potentially all inputs will depend on our assumptions about the production technology, which are discussed in the following subsection.
2.2 Production Model
Researchers use production function models, cost function models, or distance function models to characterize production technologies. Considering a general production process with multiple inputs used to produce multiple outputs, we define the production possibility set as:
[TABLE]
Following Shephard (1970), we adopt the following standard assumptions to assure that represents a production technology:
1. T is closed;
2. T is convex;
3. Free disposability of inputs and outputs; i.e., if and , then .
For an alternative representation, see, for example, Frisch (1964).
Developing methods to estimate characteristics of the production technology while imposing these standard axioms was a popular and fruitful topic from the early 1950s until the early 1980s, generating such classic papers as Koopmans (1951), Shephard (1953, 1970), Afriat (1972), Charnes et al. (1978) [Footnote 7: Data Envelopment Analysis is perhaps one of the largest success stories and has become an extremely popular method in the OR toolbox for studying efficiency.], and Varian (1984). Unfortunately, these methods are deterministic in the sense that they rely on a strong assumption that the data do not contain any measurement errors, omitted variables, or other sources of random noise. Furthermore, for some research communities linear programs were seen as harder to implement than parametric regression, which could be calculated via normal equations. Thus, most econometricians and applied economists have chosen to use parametric models, sacrificing flexibility for ease of estimation and the inclusion of noise in the model.
Here we focus our attention on the distance function because it allows the joint production of multiple outputs using multiple inputs. The production function and cost functions can be seen as special cases of the distance function in which there is either a single output or a single input (cost), respectively. Further, motivated by our discussion of EIV models above, we consider a directional distance function which allows for measurement error in potentially all variables. We relax both the parametric and deterministic assumptions common in earlier approaches to modeling multi-output/multi-input technologies. We do this by building on an emerging literature that revisits the axiomatic nonparametric approach incorporating standard statistical structures including noise (Kuosmanen (2008); Kuosmanen and Johnson (2010)).
2.2.1 The Deterministic Directional Distance Function (DDF)
Luenberger (1992) and Chambers et al. (1996, 1998) introduced the directional distance function, defined for a technology T as:
[TABLE]
where and are the observed input and output vectors, such that and are assumed to be observed without noise and fully describe the resources used in production and the goods or services generated from production. is the direction vector in the input space, is the direction vector in the output space, and defines the direction from the point in which the distance function is measured. [Footnote 8: We assume ; i.e., at least one of the components of either or is non-zero.] is commonly interpreted as a measure of inefficiency by quantifying the number of bundles of size needed to move the observed point to the boundary of the technology in a deterministic setting.
Chambers et al. (1998) explained how the directional distance function characterizes the technology T for a given direction vector ; specifically:
[TABLE]
If T satisfies the assumptions stated in Section 2.2, then the directional distance function has the following properties (see Chambers et al. (1998)):
(a) is upper semicontinuous in and (jointly);
(b) ;
(c) ;
(d) ;
(e) If T is convex, then is concave in .
An additional property of the DDF is the translation invariance:
(f) .
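For reference, the definition and the properties listed above are commonly stated in the DDF literature (e.g., Chambers et al. (1998)) in the following form. The symbols used here, $x$ and $y$ for the input and output vectors, $(g^x, g^y)$ for the direction, and $\vec{D}_T$ for the distance function, are assumed notation chosen for this sketch and may differ from the authors' own.

```latex
% Definition of the directional distance function for a technology T
\vec{D}_T(x, y; g^x, g^y) = \max\{\beta \in \mathbb{R} : (x - \beta g^x,\; y + \beta g^y) \in T\}

% Characterization of T (under free disposability)
\vec{D}_T(x, y; g^x, g^y) \ge 0 \iff (x, y) \in T

% (b) homogeneity of degree -1 in the direction
\vec{D}_T(x, y; \lambda g^x, \lambda g^y) = \tfrac{1}{\lambda}\,\vec{D}_T(x, y; g^x, g^y), \quad \lambda > 0

% (c) non-increasing in outputs
y' \ge y \;\Rightarrow\; \vec{D}_T(x, y'; g^x, g^y) \le \vec{D}_T(x, y; g^x, g^y)

% (d) non-decreasing in inputs
x' \ge x \;\Rightarrow\; \vec{D}_T(x', y; g^x, g^y) \ge \vec{D}_T(x, y; g^x, g^y)

% (f) translation invariance
\vec{D}_T(x - \alpha g^x,\; y + \alpha g^y; g^x, g^y) = \vec{D}_T(x, y; g^x, g^y) - \alpha
```

Read this way, property (f) says that moving an observation by $\alpha$ steps of the direction vector lowers its measured distance by exactly $\alpha$, which is what allows a residual to be interpreted as a deviation along the chosen direction.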
Several theoretical contributions have been made to extend the deterministic DDF, see for example Färe and Grosskopf (2010), Aparicio et al. (2017), Kapelko and Oude Lansink (2017), and Roshdi et al. (2018). The deterministic DDF has been used in several recent applications, including Baležentis and De Witte (2015), Adler and Volta (2016), and Fukuyama and Matousek (2018).
2.2.2 The Stochastic Directional Distance Function
The properties of the deterministic DDF also apply for the stochastic DDF (Färe et al. (2017)). Here we focus on estimating a stochastic DDF considering a residual which is mean zero. [Footnote 9: Two models are possible: 1) a mean zero residual, indicating that the residual contains only noise, used to pursue a productivity analysis, or 2) a composed residual with both inefficiency and noise. Our direction selection analysis is used in the first step of Kuosmanen and Johnson’s three-step procedure in which a conditional mean is estimated.] This is represented in Figure 1.
Using the statistical model in Section 2.1 and the functional representation of technology in Section 2.2, we restate Proposition 2 in Kuosmanen and Johnson (2017) as:
Proposition 1.
If the observed data are generated according to the statistical model described in Section 2.1, then the value of the DDF in the observed data point is equal to the realization of the random variable with mean zero; specifically
[TABLE]
In the stochastic distance function literature, the translation property, (f) above, is commonly invoked to move an arbitrarily chosen netput variable out of the distance function to the left-hand side of the equation, yielding an equation that looks like a standard regression model; see, for example, Lovell et al. (1994) and Kuosmanen and Johnson (2017). Instead, we write the SDDF with all of the outputs on one side to emphasize that all netputs are treated symmetrically.
Under the assumption of constant returns to scale, normalizing by one of the netputs causes the noise terms to cancel for the regressors, thus eliminating the issue of endogeneity (e.g., Coelli (2000), Kuosmanen and Johnson (2017)). However, since we relax the constant returns to scale assumption, endogeneity can still be an issue. [Footnote 10: If the endogeneity is caused by correlations in the errors across variables, it can be addressed by selecting an appropriate direction for the directional distance function. This is the direction we explore in the Monte Carlo simulation below in Section 4.1.]
Färe et al. (2017), among others, have recognized that the selection of the direction vector affects the parameter estimates of the production function. In Appendix A.1, for the linear parametric DDF defined below, we prove that alternative directions lead to distinct parameter estimates.
3 Estimation
We now describe the estimation of the DDF under a specific parametric functional form and under nonparametric shape constrained methods.
3.1 Parametric Estimation and the DDF
Consider data composed of observations where the inputs are defined by and the outputs by . The estimator minimizes the squared residuals for a DDF with an arbitrary prespecified direction . For a linear production function, we formulate the estimator as:
[TABLE]
where is the intercept, and are the vectors of the marginal effects of the inputs and outputs, respectively, and the are the residuals.
Equation (9b) enforces the translation property described in Chambers et al. (1998); i.e., scaling the netput vector by in the direction causes the distance function to decrease by . The combination of Equation (9a) and Equation (9b) ensures that the residual is computed along the direction . Intuitively this is because the and are rescaled proportionally to the direction in Equation (9b). For a formal proof, see Kuosmanen and Johnson (2017), Proposition 2.
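A minimal sketch of a linear estimator of this kind, written in Python with NumPy rather than the MATLAB solver reported later in the paper. The function name `fit_linear_ddf`, the stacked netput matrix `Z`, and the sign convention (fitted hyperplane written as a + b'z = 0, residual eps_i = a + b'z_i) are assumptions made for illustration and may differ from Equations (9a) and (9b). The normalization b'g = 1 stands in for the translation constraint, so each residual counts the steps of size g between an observation and the fitted function.

```python
import numpy as np

def fit_linear_ddf(Z, g):
    """Least squares fit of a + b.z = 0 with residuals measured along g.

    Z : (n, k) array of netputs, e.g. columns = (output, cost).
    g : (k,) prespecified direction vector.
    Returns the intercept a, the slope vector b, and the residuals eps.
    """
    Z = np.asarray(Z, dtype=float)
    g = np.asarray(g, dtype=float)
    n, k = Z.shape
    A = np.hstack([np.ones((n, 1)), Z])   # eps = A @ theta, theta = (a, b)
    c = np.concatenate([[0.0], g])        # normalization: c @ theta = b.g = 1

    # KKT system of: minimize ||A theta||^2 subject to c @ theta = 1
    kkt = np.zeros((k + 2, k + 2))
    kkt[:k + 1, :k + 1] = 2.0 * A.T @ A
    kkt[:k + 1, -1] = c
    kkt[-1, :k + 1] = c
    rhs = np.zeros(k + 2)
    rhs[-1] = 1.0

    theta = np.linalg.solve(kkt, rhs)[:k + 1]
    # Because b.g = 1, the point z_i - eps_i * g lies on the fitted hyperplane.
    return theta[0], theta[1:], A @ theta
```

With g = (0, 1) on (output, cost) data, this fit coincides with ordinary least squares of cost on output; any other direction spreads the deviation across both variables.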
3.2 The CNLS-d Estimator
Convex Nonparametric Least Squares (CNLS) is a non-parametric estimator that imposes the axiomatic properties, such as monotonicity and concavity, on the production technology. The estimator CNLS-d is the directional distance function generalization of CNLS (Hildreth (1954), Kuosmanen (2008)). While CNLS allows for just a single output, CNLS-d permits multiple outputs. In CNLS the direction along which residuals are computed is specified a priori and is typically measured in terms of the unique output, . This corresponds to the assumption that noise is only present in and that all other variables, , do not contain noise. CNLS-d allows the residual to be measured in an arbitrary prespecified direction. If all components of the direction vector are non-zero, this corresponds to an assumption that noise is present in all inputs.
Using the same input-output data defined in Section 2.1, the CNLS-d estimator is given by:
[TABLE]
where is the vector of the intercept terms, and are the matrices of the marginal effects of the inputs and the outputs, respectively, and is the vector of the residuals (Kuosmanen and Johnson, 2017).
Equation (10a) is similar to (9a), with the notable difference that are indexed by , indicating that each observation has its own hyperplane defined by the triplet . Equation (10b), which corresponds to the Afriat inequalities, imposes concavity. Given Equation (10b), Equation (10c) imposes the monotonicity of the estimated frontier relative to the inputs. Equation (10d) enforces the translation property described in Chambers et al. (1998) and has the same interpretation as Equation (9b). Similar to Equation (10c), the combination of Equation (10b) and Equation (10e) imposes the monotonicity of the DDF relative to the outputs. In Equation (10), we specify the CNLS-d estimator with a single common direction, . [Footnote 11: Alternatively, some researchers may be interested in using observation specific directions or perhaps group specific directions (Daraio and Simar (2016)). In Appendix A.3, we derive the conditions under which multiple directions can be used in CNLS-d while still maintaining the axiomatic property of global convexity of the production technology. Consider two groups, each with its own direction used in the directional distance function. Essentially, the convexity constraint holds as long as the noise is orthogonal to the difference of the two directions used in the estimation. A simple example of this situation is when all the noise is in one dimension and the difference between the two directions for this dimension is zero. However, this condition is restrictive when noise is potentially present in all variables. Thus, specifying multiple directions in CNLS-d while maintaining the axiomatic properties of the estimator, specifically, the convexity of the production possibility set, is still an open research question.]
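As a point of reference, a CNLS-d problem matching the verbal description of Equations (10a) to (10e) can be sketched as follows, in the form commonly attributed to Kuosmanen and Johnson (2017). The notation ($\alpha_i$, $\beta_i$, $\gamma_i$, $\varepsilon_i$, and the direction $(g^x, g^y)$) and the sign conventions are assumptions of this sketch and may not match the authors' formulation exactly.

```latex
\begin{align}
\min_{\alpha, \beta, \gamma, \varepsilon} \quad & \sum_{i=1}^{n} \varepsilon_i^2 \notag \\
\text{s.t.} \quad
& \gamma_i^\top y_i = \alpha_i + \beta_i^\top x_i - \varepsilon_i, && \forall i \tag{10a} \\
& \alpha_i + \beta_i^\top x_i - \gamma_i^\top y_i \le \alpha_h + \beta_h^\top x_i - \gamma_h^\top y_i, && \forall i, h \tag{10b} \\
& \beta_i \ge 0, && \forall i \tag{10c} \\
& \beta_i^\top g^x + \gamma_i^\top g^y = 1, && \forall i \tag{10d} \\
& \gamma_i \ge 0, && \forall i \tag{10e}
\end{align}
```

The objective is quadratic and all constraints are linear, so the problem is a quadratic program; the number of Afriat inequalities in (10b) grows with the square of the number of observations.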
4 Measuring MSE under Alternative Directions
4.1 Illustrative Example
Data Generation Process
For our illustrative example, we use a simple linear cost function and a directional distance linear parametric estimator. We consider two noise generation processes: a random noise direction and a fixed noise direction. Here we discuss the random noise direction case, and direct the reader to Appendix B for a discussion of the fixed noise direction case.
For our example we consider a single output cost function where the observations , are created by the Data Generation Process (DGP) outlined in Algorithm 1:
Algorithm 1
1. Output, , is drawn from the continuous uniform distribution .
2. Cost is calculated as , where .
3. The noise terms, , are constructed as follows:
   (a) is calculated as:
       (11)
       where and are the means of the output and cost without noise, respectively.
   (b) The scalar length of the noise is rescaled by the vector, , in each dimension. These scaling factors are calculated as where are drawn from a continuous uniform distribution .
   (c) , where is a scalar length drawn from the normal distribution, , where is a prespecified initial value for the standard deviation and is a normalized direction vector.
4. The observations with noise are obtained by appending the noise terms to the generated data:
   (12)
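The overall shape of Algorithm 1 can be sketched in Python as below. This is an illustration only: the uniform bounds, the cost coefficients `beta0` and `beta1`, and the way the random direction is drawn are hypothetical placeholders, and the mean-based rescaling of step 3(a) is omitted because its formula is not reproduced here.

```python
import numpy as np

def generate_data(n, sigma, rng, y_low=0.0, y_high=1.0, beta0=0.0, beta1=1.0):
    """Hypothetical stand-in for Algorithm 1 with random noise directions.

    All default parameter values are placeholders, not the paper's settings.
    """
    # Step 1: noise-free output drawn from a continuous uniform distribution
    y_true = rng.uniform(y_low, y_high, n)
    # Step 2: noise-free cost from a linear cost function
    c_true = beta0 + beta1 * y_true
    # Step 3: one random direction per observation, normalized to unit length,
    # multiplied by a scalar noise length drawn from N(0, sigma)
    v = rng.uniform(-1.0, 1.0, size=(n, 2))
    v /= np.linalg.norm(v, axis=1, keepdims=True)
    length = rng.normal(0.0, sigma, n)
    noise = length[:, None] * v
    # Step 4: observed data = noise-free data + noise in both variables
    y_obs = y_true + noise[:, 0]
    c_obs = c_true + noise[:, 1]
    return (y_true, c_true), (y_obs, c_obs)
```

Calling `generate_data(100, 0.1, np.random.default_rng(0))` returns the noise-free pair, which plays the role of a testing set for comparison with the true function, and the noisy pair, which plays the role of training data.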
Figure 3 illustrates the results for two cases of the data generating process; in the first case the direction of the noise is random, while in the second case the direction of the noise is fixed.
Evaluating the Parametric Estimator’s Performance
We use two criteria to assess the performance of the parametric estimator: 1) Mean Squared Error (MSE) comparing the true function to the estimated function, and 2) MSE comparing the estimated function to a testing data set. While we can calculate both metrics for our Monte Carlo simulations, only the second metric can be used with our application data below.
To calculate deviations, we use the MSE direction . For any particular point of the testing set, , we determine the estimates, , defined as the intersection of the estimated function characterized by the coefficients and the line passing through , and direction vector . We evaluate the value of the MSE as:
[TABLE]
To compare the true function to the estimated function, we use the Linear Function Data Generation Process, Algorithm 1, steps 1 and 2, to construct our testing data set . To evaluate the estimated function without knowing the true function the testing set is built using the full Linear Function Data Generation Process.
Figure 4 shows the MSE computations.
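For a linear estimated function, the intersection described above has a closed form, which the following Python sketch uses. The function name `directional_mse`, the hyperplane convention a + b'z = 0, and the choice to average squared Euclidean distances between each testing point and its intersection point are assumptions of this sketch rather than the paper's exact definition.

```python
import numpy as np

def directional_mse(a, b, Z_test, d):
    """Mean squared distance from test points to the hyperplane a + b.z = 0,
    measured along the MSE direction d."""
    Z_test = np.asarray(Z_test, dtype=float)
    b = np.asarray(b, dtype=float)
    d = np.asarray(d, dtype=float)
    denom = b @ d
    if np.isclose(denom, 0.0):
        raise ValueError("direction d is parallel to the estimated function")
    # z + t*d lies on the hyperplane when a + b.(z + t*d) = 0
    t = -(a + Z_test @ b) / denom
    # squared Euclidean length of the step t*d for each test point
    sq_dist = (t ** 2) * (d @ d)
    return sq_dist.mean()
```

For the line c = 2 + 3y (that is, a = -2, b = (-3, 1)) and the test points (0, 3) and (1, 4), measuring along the cost axis d = (0, 1) gives an MSE of 1, while measuring along the output axis d = (1, 0) gives 1/9, which illustrates how strongly the value depends on the evaluation direction.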
Additional Information Describing the Simulations
We apply the DGP described above to generate a training set, , and a testing set , in which noise is introduced to the observations in random directions. We set the noise scaling coefficient to and the number of observations to . We run repetitions of the simulation for each experiment on a computer with an Intel Core i7 CPU 860 2.80 GHz processor and 8 GB of RAM. We use the quadratic solver in MATLAB 2017a.
For the estimator, we define the direction vector used in the parametric DDF as a function of an angular variable , which allows us to investigate alternative directions. Specifically, the direction vector used in the DDF is . We examine the set of directions corresponding to the angles .
Results: Random Noise Directions
Table 1 and Table 2 show results corresponding to the two performance criteria introduced above and shown in Figure 4, the MSE relative to the true function and the MSE relative to a testing data set, respectively. Table 1 shows that the direction corresponding to the angle , , produces the smallest values of MSE (shown in bold in the table) regardless of the direction used for the MSE computation. However, the estimator’s quality diminishes if we select the extreme directions corresponding to the angles and . Table 2 reports performance via a testing set; here the direction corresponding to the smallest MSE value (shown in bold) is always the one matching the direction used in the MSE computation. In applications, using a testing set is necessary because the true function is unknown. Table 2 shows that the benefits of matching the MSE evaluation direction outweigh the benefits of selecting a direction based on the properties of the function being estimated.
For the out-of-sample testing set, the direction that provides the smallest MSE value is the direction used for the MSE computation. Because the functional estimate is optimized for the direction specified in the SDDF, it is perhaps expected that using the same direction that will be used in the MSE evaluation would produce a relatively low MSE compared to other directions. However, when the functional estimate is compared to the true function, the MSE values are around ten times smaller than the out-of-sample testing case. In out-of-sample testing the presence of noise in the observations causes a deviation regardless of the quality of the estimator or the number of observations. The DDF direction corresponding to the smallest MSE is the direction orthogonal to the true function (i.e., for our DGP). This direction provides the shortest distance from the observations to the true function. We conclude that, in this experiment, it is preferable to select a direction orthogonal to the true function (see Section 5 for further experiments).
From the fixed noise direction experiments (see B.1), we observe that using a direction for the estimator that matches the direction used for the noise generation significantly reduces the MSE values compared to the true function. From this, we infer that when endogeneity is severe, using a direction that matches the characteristics of this endogeneity significantly improves the fit of the estimator; i.e., the MSE is smaller for the matching direction than for the second best direction in of the cases (see Section 5 for the details).
Finally, we need a way to evaluate alternative directions in the application data, where the true function is unknown. Below, we describe our proposed alternative measure of fit.
4.2 Radial MSE Measure
MSE is typically measured by the average of squared errors in the dimension of a single variable, such as cost or output. As explained in Section 4.1, when we compare out-of-sample performance, we find that the best direction to use in estimating an SDDF is the direction used for MSE evaluation, regardless of the direction of noise in the DGP or any other characteristics of the DGP. To avoid this relationship between the direction of estimation and the direction of evaluation, we propose a radial MSE measure.
We begin by normalizing the data to a unit cube and consider a case of outputs and observations, where the original observations are:
[TABLE]
The normalized observations are:
[TABLE]
Our radial MSE measure is the distance from the testing set observation to the estimated function measured along a ray from the testing set observations to the center . Having normalized the data, the center for the radial measure is
The radial MSE measure is the average of the distance from each testing set observation to the estimated function measured radially. Figure 5 illustrates this measure. For a convex function, a radial measure reduces the bias in the measure for extreme values in the domain.
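The radial measure can be illustrated with a minimal sketch. It is not the authors' implementation: the `inside` predicate, the bisection search, the bracket `t_max`, and the choice to average squared radial distances are all assumptions made for illustration, and the center is passed in as a parameter rather than fixed.

```python
import math


def radial_distance(point, center, inside, t_max=4.0, tol=1e-10):
    """Distance from `point` to the estimated function along the ray
    that starts at `center` and passes through `point`.

    `inside(p)` is a hypothetical predicate: True when p lies on the
    same side of the estimated function as the center.
    """
    direction = [p - c for p, c in zip(point, center)]
    norm = math.sqrt(sum(d * d for d in direction))
    if norm == 0.0:
        raise ValueError("point coincides with the center")

    def at(t):
        # Location on the ray; t = 0 is the center, t = 1 is the point.
        return [c + t * d for c, d in zip(center, direction)]

    lo, hi = 0.0, t_max  # assumes the function is crossed once in (0, t_max)
    if not inside(at(lo)) or inside(at(hi)):
        raise ValueError("estimated function is not bracketed on this ray")
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if inside(at(mid)):
            lo = mid
        else:
            hi = mid
    t_star = 0.5 * (lo + hi)  # the ray meets the function at t_star
    return abs(1.0 - t_star) * norm


def radial_mse(points, center, inside):
    """Average of squared radial distances over a testing set."""
    dists = [radial_distance(p, center, inside) for p in points]
    return sum(d * d for d in dists) / len(dists)
```

Bisection is used here only because it needs nothing beyond a membership test; any one-dimensional root finder would serve equally well.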
5 Monte Carlo Simulations
We next examine how different DGPs affect the optimal direction for the DDF estimator based on a set of Monte Carlo simulations. We consider both random noise directions for each observation and a fixed noise direction representing a high endogeneity case. We consider the effects of the different variance levels for the noise and changes in the underlying distribution of the production data. Using the simplest case of two outputs and a fixed cost level for all observed units allows us to separate the effects of the data and of the function.
5.1 CNLS-d Formulation for Cost Isoquant Estimation
Before describing our experiments, we first outline the CNLS-d for estimating the iso-cost level set. It is based on the following optimization problem:
[TABLE]
Note all observations, , have a common cost level. This allows us to focus on a 2-dimensional estimation problem. For results related to 3-dimensional estimation problems see B.2, Experiment 6.
We can recover the fitted values, , and the coefficient, , using:
[TABLE]
5.2 Experiments
We conducted several experiments to investigate the optimal direction for the DDF estimator. Four experiments’ results are shown in the main text of the paper with two additional experiments described in the appendix.
Experiment 1 - Base case: A two output circular isoquant with uniformly distributed angle parameters and random noise direction
For the base case, we consider a fixed cost level and approximate a two output isoquant; i.e., . Indexing the outputs by and observations by , we generate the output variables as:
[TABLE]
where is the observation on the isoquant and is the noise. We generate the output levels as:
[TABLE]
where , is drawn randomly from a continuous uniform distribution, . The noise terms, , have the following expressions:
[TABLE]
where the length is drawn from the normal distribution , the angle is observation specific and characterizes the noise direction for each observation, and is drawn from a continuous uniform distribution . The values considered for the directions in CNLS-d estimator are . The standard deviation of the normal distribution is . We perform the experiment times for each parameter setting.
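The base-case DGP can be sketched as follows. All numeric defaults (`radius`, `sigma`, and both angle ranges) are placeholder assumptions, not the values used in the experiment, and the function name is hypothetical.

```python
import math
import random


def simulate_base_case(n, radius=1.0, sigma=0.1,
                       theta_range=(0.0, math.pi / 2),
                       noise_angle_range=(0.0, 2 * math.pi),
                       seed=0):
    """Draw n points on a circular two-output isoquant and add noise
    with a random length and an observation-specific direction.

    Every default value here is an illustrative assumption.
    """
    rng = random.Random(seed)
    clean, noisy = [], []
    for _ in range(n):
        theta = rng.uniform(*theta_range)        # position on the isoquant
        y1 = radius * math.cos(theta)
        y2 = radius * math.sin(theta)
        length = rng.gauss(0.0, sigma)           # noise length
        phi = rng.uniform(*noise_angle_range)    # noise direction angle
        clean.append((y1, y2))
        noisy.append((y1 + length * math.cos(phi),
                      y2 + length * math.sin(phi)))
    return clean, noisy
```

Returning the noise-free points alongside the noisy ones makes it straightforward to build a testing set that lies on the true function.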
Table 3 reports the radial MSE values from a testing set of observations lying on the true function.
As shown in Table 3, the angle corresponding to the smallest MSE (shown in bold) is the one that gives a direction orthogonal to the center of the true function, . The MSE values differ significantly across directions, increasing at similar rates as the direction angle deviates from in either direction.
Experiment 2 - The base case with fixed noise directions
In this experiment, , which characterizes the noise direction for each observation, is constant for all observations, . The values used for and the directions in the CNLS-d estimator are the same, . The standard deviation of the normal distribution is again . We perform the experiment times for each parameter setting. Table 4 reports the results.
Each row in Table 4 corresponds to a different noise direction in the DGP. The bold numbers identify the directions in the CNLS-d estimator that obtain the smallest MSE for each noise direction. We confirm our previous insight, from the parametric estimator and fixed noise direction case described in B.1: the bold values appear on the diagonal (from the upper-left to the lower-right of Table 4), where the direction used in CNLS-d matches the noise direction. This result indicates that selecting the direction in the SDDF that matches the underlying noise direction in the DGP results in improved functional estimates.
Experiment 3. Base case with fixed noise direction and different noise levels
In Experiment 3, we vary the noise term by changing the coefficient. Table 5 reports the results for .
In Table 5 (Experiment 3, with ), we do not observe the same diagonal pattern observed in Experiment 2, and the best direction for the CNLS-d estimator does not match the direction selected for the noise. This leads us to hypothesize that when the noise level is small, data characteristics, such as the distribution of the regressors or the shape of the function, affect the estimation, whereas when the noise level is large, regressors’ relative variability becomes a more dominant factor in determining the best direction for the CNLS-d estimator.
However, with the results of Experiment 3 are consistent with those from Experiment 2; i.e., the best direction always coincides with the noise direction selected. The results of Experiment 3 with are reported in B, Table 15 (Experiment 3 with ).
Experiment 4: Base case with different distributions for the initial observations on the true function
In Experiment 4, we seek to understand how changing the DGP for the angle, , affects the optimal direction. We consider the three normal distributions with different parameters: , and . We truncate the tails of the distribution so that the generated angles fall in the range . Noise is specified as in Experiment 1. Table 6 reports the results of this experiment.
In Table 6, we observe that selecting a direction in the SDDF to match , the mean of the distribution for the angle variable used in the DGP, corresponds to the smallest MSE value. This result suggests that the estimator’s performance improves when we select a direction that points to the “center” of the data.
B.2 presents additional experiments, varying the distribution of the observations and considering three outputs with a fixed cost level. These experiments lend further support to the strategy of selecting a direction pointed to the “center” of the data.
6 Proposed Approach to Direction Selection
Based on Monte Carlo simulations, we found that the optimal direction depends on the shape of the function and the distribution of the observed data. This in itself is not surprising. However, by assuming a unimodal distribution for the data generation process, a direction that aims towards the “center” of the data and is perpendicular to the true function at that point tends to outperform other directions. To apply this finding for a data set with outputs and observations, , we suggest selecting the direction for the DDF as follows:
Normalize the data:
(24)
(25)
Select the direction:
(26)
This provides a method for direction selection that can be used in applications when the true direction is unknown. (Footnote 12: A cost function is convex with respect to the point . Therefore, to have a ray that points from the point to the median of the data, the directional vector is needed.) We test the proposed method by estimating a cost function for a US hospital data set.
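The two steps above (normalize, then select) can be sketched in a few lines. This is only an illustration under stated assumptions: min-max scaling stands in for the normalization in Equations (24) and (25), the component-wise median of the normalized data stands in for Equation (26), and the sign convention needed for the cost component is not reproduced. Both function names are hypothetical.

```python
import statistics


def normalize_columns(rows):
    """Scale each variable (column) to [0, 1].

    Min-max scaling is an assumed normalization, chosen for illustration.
    """
    cols = list(zip(*rows))
    lows = [min(c) for c in cols]
    spans = [max(c) - min(c) for c in cols]
    if any(s == 0 for s in spans):
        raise ValueError("a constant column cannot be scaled")
    return [[(v - lo) / s for v, lo, s in zip(row, lows, spans)]
            for row in rows]


def median_direction(rows):
    """Component-wise median of the normalized data, used as a direction
    that points toward the 'center' of the observations."""
    normalized = normalize_columns(rows)
    return [statistics.median(col) for col in zip(*normalized)]
```

For three observations `[[0, 10], [5, 20], [10, 40]]`, the normalized columns are `0, 0.5, 1` and `0, 1/3, 1`, so the sketch returns a direction with components 0.5 and 1/3.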
7 Cost Function Estimation of the US Hospital Sector
We analyze the cost variation across US hospitals using a conditional mean estimate of the cost function. We estimate a multi-output cost function for the US hospital sector by implementing our data-driven method for selecting the direction vector for the DDF. We report most productive scale size and marginal cost estimates.
7.1 Description of the Data Set
We obtain cost data from the American Hospital Association’s (AHA) Annual Survey Databases from 2007 to 2009. The costs reported include payroll, employee benefits, depreciation, interest, supply expenses and other expenses. We estimate a cost function which can be interpreted as a distance function with a single input when hospitals face the same input prices. (Footnote 13: Unfortunately we do not observe input prices. We chose to estimate a cost function and make the assumption of common input prices rather than impose an arbitrary division of the cost.) We obtain hospital output data from the Healthcare Cost and Utilization Project (HCUP) National Inpatient Sample (NIS) core file that captures data annually for all discharges for a 20% sample of US community hospitals. The hospital sample changes every year. For each patient discharged, all procedures received are recorded as International Classification of Diseases, Ninth Revision, Clinical Modification (ICD9-CM) codes. The typical hospital in the US relies on these detailed codes to quantify the medical services it provides (Zuckerman et al. (1994)). We map the codes to four categories of procedures: “Minor Diagnostic,” “Minor Therapeutic,” “Major Diagnostic,” and “Major Therapeutic,” which are standard output categories in the literature (Pope and Johnson (2013)). The number of procedures in each category is summed for each hospital by year to construct the output variables. The total number of hospitals sampled is around 1,000 per year from 2007 to 2009. (Footnote 14: The NIS survey is a stratified systematic random sample. The strata criteria are urban or rural location, teaching status, ownership, and bed size. This stratification ensures a more representative sample of discharges than a simple random sample would yield. For details see https://www.hcup-us.ahrq.gov/tech_assist/sampledesign/508_compliance/508course.htm#{463754B8-A305-47E3-B7EE-A43953AA9478}.)
However, mapping between the two databases is only possible for approximately 50% of the hospitals in the HCUP data, resulting in approximately 450 to 525 observations available each year.
7.2 Pre-Analysis of the Data Set
7.2.1 Testing the Relevance of the Regressors
We begin by testing the statistical significance of our four output variables, , for predicting cost. While the variables selected have been used in previous studies, we use these tests to evaluate whether this variable specification can be rejected for the current data set of U.S. hospitals from 2007-2009.
The null hypothesis stated for the th output is:
[TABLE]
against (Footnote 15: the notation implies the vector excluding the th component):
[TABLE]
We implement the test with a Local Constant Least Squares (LCLS) estimator described in Henderson and Parmeter (2015), calculating bandwidths using least-squares cross-validation. We use 399 wild bootstraps. We found that all output variables were highly statistically significant for all years.
7.3 Results
CNLS-d and Different Directions
We analyze each year of data as a separate cross-section because, as noted above, the HCUP does not track the same set of hospitals across years. To illuminate the direction’s effect on the functional estimates, we graph “Cost” as a function of “Major Diagnostic Procedures” and “Major Therapeutic Procedures” holding “Minor Diagnostic Procedures” and “Minor Therapeutic Procedures” constant at their median values. Figure 6 illustrates the estimates for three different directions: one with only a cost component, one with only a component in Major Therapeutic Procedures, and one that comes from our median approach. Visual inspection indicates that different directions produce significantly different estimates, highlighting the importance of considering the question of direction selection.
We compare the estimator’s performance when using different directions. Table 8 reports the MSE for three sample directions in each year. We define our direction vector as . (Footnote 16: We focus on types of directions found to be competitive in our Monte Carlo simulations.)
We pick two directions, one with equal components in all dimensions, and a second direction that has a cost component that is double the value of the output components. The median vector is , which is very close to the cost-only direction. The MSE varies by 15-30% over the different directions. We observe that there is no clear dominant direction; however, the median direction performs reasonably well in all cases. We conclude that as long as a direction with non-zero components for all variables that could contain noise is selected, then the precise direction selected is not critical to obtaining improved estimation results.
Comparison with other estimators
We compare three methods to estimate a cost function: 1) a quadratic functional form (without the cross-product terms), Färe et al. (2010); 2) CNLS-d with the direction selection method proposed in Section 6; and 3) a lower bound estimate calculated using a local linear kernel regression with a Gaussian kernel and leave-one-out cross-validation for bandwidth selection, Li and Racine (2007). (Footnote 17: For CNLS-d, we select a value for an upper bound through a tuning process, , and impose the upper bound on the slope coefficients estimated (Lim, 2014).) We select these estimators because a quadratic functional form to model production has been used in recent productivity and efficiency analysis of healthcare. See, for example, Ferrier et al. (2018). The local linear kernel is selected because it is an extremely flexible nonparametric estimator and provides a lower bound for the performance of a functional estimate. However, note that the local linear kernel does not satisfy standard properties of a cost function; i.e., cost is monotonic in output and marginal costs are increasing as output increases.
We use the criterion of K-fold average MSE with to compare the approaches. This means we split the data equally into 5 parts. We use 4 of the 5 parts for estimation (training) and evaluate the performance of the estimator on the 5th part (testing). We do this for all 5 parts and average the results. The values presented in Table 9 correspond to the average across folds.
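The K-fold procedure described above can be sketched generically. The `fit` and `predict` callables are hypothetical placeholders for any of the three estimators, and the interleaved fold assignment is an assumption; in practice the observations would typically be shuffled before splitting.

```python
def k_fold_mse(xs, ys, fit, predict, k=5):
    """Average out-of-sample MSE over k folds.

    fit(train_x, train_y) returns a model object;
    predict(model, x) returns the fitted value at x.
    Fold membership is assigned by index modulo k (an assumption).
    """
    n = len(xs)
    if n < k:
        raise ValueError("need at least k observations")
    fold_mses = []
    for fold in range(k):
        test_idx = [i for i in range(n) if i % k == fold]
        train_idx = [i for i in range(n) if i % k != fold]
        model = fit([xs[i] for i in train_idx],
                    [ys[i] for i in train_idx])
        sq_errors = [(ys[i] - predict(model, xs[i])) ** 2
                     for i in test_idx]
        fold_mses.append(sum(sq_errors) / len(sq_errors))
    return sum(fold_mses) / k  # average across the k folds
```

As a usage example, a mean-only predictor is obtained with `fit = lambda tx, ty: sum(ty) / len(ty)` and `predict = lambda model, x: model`.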
While the average MSEs for all years are lowest for the lower bound estimator, CNLS-d performs relatively well as it is close to the lower bound in terms of fitting performance while imposing standard axioms of a cost function. As is true of most production data, the hospital data are very noisy. The shape restrictions imposed in CNLS-d improve the interpretability. The CNLS-d estimator outperforms the parametric approach, indicating the general benefits of nonparametric estimators.
Description of Functional Estimates - MPSS and Marginal Costs
We report the most productive scale size (MPSS) and the marginal costs for a quadratic parametric estimator, the CNLS-d estimator with our proposed direction selection method, and an alternative. (Footnote 18: Here most productive scale size is measured on each ray from the origin (fixing the output ratios) and is defined as the cost level that maximizes the ratio of aggregate output to cost. Marginal cost is measured on each ray from the origin (fixing the output ratios) and is defined as the cost to increase aggregate output by one unit.) These metrics are determined on the averaged K-fold estimations for each estimation method. For the MPSS, we present the cost levels obtained for different ratios of Minor Therapeutic procedures (MinTher) and Major Therapeutic procedures (MajTher), with the minor and major diagnostics held constant at their median levels.
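The ray-based definitions of MPSS and marginal cost can be illustrated with a small sketch. It is an assumption-laden illustration rather than the procedure used for the reported tables: aggregate output is measured by the scale factor along the ray, the maximum is found by grid search, the derivative is a central finite difference, and `cost_fn` is a hypothetical fitted cost function.

```python
def mpss_on_ray(cost_fn, ray, scales):
    """Return (scale, cost) maximizing aggregate output per unit of cost
    along a ray from the origin with fixed output ratios.

    Aggregate output is measured by the scale factor s (an assumption).
    """
    best = None
    for s in scales:
        cost = cost_fn([s * r for r in ray])
        if cost <= 0:
            continue  # the ratio is undefined for non-positive cost
        ratio = s / cost
        if best is None or ratio > best[0]:
            best = (ratio, s, cost)
    if best is None:
        raise ValueError("no scale with positive cost was supplied")
    return best[1], best[2]


def marginal_cost_on_ray(cost_fn, ray, scale, step=1e-4):
    """Central-difference estimate of the cost of one more unit of
    aggregate output along the ray, evaluated at `scale`."""
    lower = cost_fn([(scale - step) * r for r in ray])
    upper = cost_fn([(scale + step) * r for r in ray])
    return (upper - lower) / (2 * step)
```

With the toy cost function `lambda y: 1 + (y[0] + y[1]) ** 2` on the ray `(0.5, 0.5)`, output per unit of cost is `s / (1 + s**2)`, which peaks at `s = 1` where the cost is 2 and the marginal cost is 2.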
MPSS results are presented in Table 10 and the values for CNLS-d (Median Direction) are illustrated in Figure 7. We observe small variations across both years and estimators. The differences across years are in part due to the sample changing across years. Most hospitals are small and operate close to the MPSS. However, there are several large hospitals that are operating significantly above MPSS. Hospitals might choose to operate at larger scales and provide a large array of services allowing consumers to fulfill multiple healthcare needs.
For marginal costs, we present the values for different percentiles of the MinTher and MajTher, with the minor and major diagnostics held constant at their median levels. A more exhaustive comparison across all outputs is presented in C. Marginal cost information can be used by hospital decision makers to select the types of improvements that are likely to result in higher productivity with minimal cost increase. For example, consider a hospital that is in the percentile of the data set for all four outputs in 2008 and the hospital manager has the option to expand operations for either minor or major therapeutic procedures. Results reported in Tables 11 and 12 indicate that an increase of one minor therapeutic procedure would result in a $4.9k increase in cost, whereas an increase of one major therapeutic procedure would result in a $7.7k increase in cost. A decision maker would want to consider the revenue generated by the different procedures; however, these estimates provide insights regarding the incremental cost of additional major and minor therapeutic procedures.
CNLS-d is the most flexible of the estimators and allows MPSS values to fluctuate significantly across percentiles. CNLS-d does not smooth variation; rather, it minimizes the distance from each observation to the shape constrained estimator. In C, results for the local linear kernel estimator are also presented. Even though the local linear kernel bandwidths are selected via cross-validation, relatively large values are selected due to the relatively noisy data and the highly skewed distribution of output. These large bandwidths and the parametric nature of the quadratic function make these two estimators relatively less flexible compared to CNLS-d. A feature of performance that is captured only by CNLS-d is that hospitals specializing in either minor or major therapeutics maximize productivity at larger scales of operation, as illustrated in Figure 7.
The marginal cost results for Minor Therapeutic procedures are presented in Table 11 and Figure 8 (left) and the marginal cost results for Major Therapeutic procedures are reported in Table 12 and Figure 8 (right). As was the case for MPSS (see Table 10), CNLS-d is more flexible and its marginal cost estimates vary significantly across percentiles. The CNLS-d with different directions provides very similar marginal cost estimates. However, the CNLS-d estimates differ significantly from the marginal cost estimates obtained with the parametric estimator. For CNLS-d, the marginal cost results are in line with the theory that marginal costs are increasing with scale. This property can also be violated when using a non-parametric estimator without any shape constraints imposed. For example, a violation can be seen in the marginal costs of minor therapeutic procedures for the parametric (quadratic) regression estimator (Figure 8).
Our data set, which combines AHA cost data with AHRQ output data for a broad sample of hospitals from across the US, is unique to the best of our knowledge. However, the marginal cost estimates are broadly in line with marginal cost estimates for US hospitals for similar time periods. Gowrisankaran et al. (2015) studied a considerably smaller set of Northern Virginia hospitals observed in 2006 that, on average, were larger than hospitals in our data set. Due to the differences in the measures of output, the marginal cost levels are not directly comparable. However, conditional on the size variation, the variation in marginal costs is similar to the variation we observe for the parametric (quadratic) regression specification applied to our data. Boussemart et al. (2015) analyzed data on nearly 150 hospitals located in Florida observed in 2005. The authors use a different output specification and a translog model; however, their distribution of hospital size is similar to our data set and we observe similar variances in marginal costs with the parametric (quadratic) regression specification applied to our data.
8 Conclusions
This paper investigated the improvement in functional estimates when specifying a particular direction in CNLS-d. Based on Monte Carlo experiments, two primary findings emerged from our analysis. First, directions close to the average orthogonal direction to the true function performed well. Second, when the data are noisy, selecting a direction that matches the noise direction of the DGP improves estimator performance. Our simulations indicate that CNLS-d with a direction orthogonal to the data is preferable if the noise level is not too large and that a direction that matches the noise direction of the DGP is preferred if the noise level is large. Thus, if users know the shape of the data or the characteristics of the noise, they can use CNLS-d with a direction orthogonal to the data if the noise coefficient is small. Alternatively, if the noise coefficient is large, users can select a direction close to the true noise direction, with non-zero components in all variables that potentially have noise. Our application to US hospital data shows that CNLS-d performs similarly across different directions that all include non-zero components of the direction vector for variables that potentially have noise in their measurement.
In future research, we propose developing an alternative estimator that incorporates multiple directions in CNLS-d while maintaining the concavity axiom. This would permit treating subgroups within the data, allowing different assumptions to be made across subgroups (e.g., for-profit vs. not-for-profit hospitals).
Appendix A Properties of Directional Distance Functions and CNLS-d
A.1 Direction Selection in Directional Distance Functions
In this appendix we prove that the direction vector affects the functional estimates. Let , then we can state the following theorem:
Theorem 1.
Suppose that two direction vectors exist, and , such that . Then the directional distance function estimates using these two different directions are not equal, .
Proof.
Rewrite Problem (10) from Section 3.2 as
[TABLE]
Observe that all decision variables appear in the objective function and that the objective function is a quadratic function while the constraints define a convex solution space; i.e., this optimization problem has a unique solution (Bertsekas (1999)). If we solve Problem (27) with , then the resulting solution vector is . Changing the direction vector from to the normalization constraint no longer holds for and . However, the previous argument holds for the uniqueness of . Thus, .
∎
A.2 Details of CNLS-d
An alternative expression for CNLS-d (cf. equations (16)-(16c) from Section 5.1) is given by:
[TABLE]
It is possible to recover , and the final estimates using the following relations:
[TABLE]
A.3 Different Directions for Different Groups in CNLS-d
Consider the case where all observations have the same input level and produce two outputs, and we estimate the isoquant. Define two groups of observations and such that and . (Footnote 19: The notation corresponds to the cardinality of the set.) Using the notation in A.1, the direction vector for the first group of observations is and for the second group of observations it is .
For either a fixed input vector, , or a fixed cost level, , formulate the iso-cost estimator for and with different direction vectors as:
[TABLE]
Note that using more than one direction for CNLS-d can lead to violations of convexity. Only under very limiting conditions can we allow for multiple directions in CNLS-d and guarantee that the resulting estimated function will maintain convexity. The following theorem formalizes the conditions.
Theorem 2.
If a CNLS-d estimator is calculated using two groups of observations with different direction vectors as shown in Equation (32) and the following condition holds regarding the direction vectors and the noise direction:
[TABLE]
where
[TABLE]
then the resulting CNLS-d estimate is a concave function.
Proof.
Consider the Afriat inequalities in the context of cost isoquant estimation. One of the conditions of Equation (16) is:
[TABLE]
Knowing that means that .
Substituting and in the inequalities (34) obtains:
[TABLE]
Next, consider the case where both observations have the same direction. Then the expression is:
[TABLE]
If Equation (36) is satisfied, we know that the CNLS-d constraints hold. By comparison observe that the condition listed below is a sufficient condition for Equation (36) being satisfied when Equation (35) holds:
[TABLE]
[TABLE]
[TABLE]
which, after simplifying, becomes:
[TABLE]
∎
Thus Theorem 2 is proved and a sufficient condition is found that, if verified, ensures the concavity property of the estimator even when multiple directions are used in the estimation of the directional distance function.
The following corollary, concerning the convex case, is directly inferred from Theorem 2:
Corollary 1.
If a CNLS-d estimator is calculated using two groups of observations with different direction vectors as shown in Equation (32), and the following condition holds regarding the direction vectors and the noise direction:
[TABLE]
where
[TABLE]
then the resulting CNLS-d estimate is a convex function.
Proof.
Reverse the inequality sign in Equation (34):
[TABLE]
and follow the logic of the proof of Theorem 2 to obtain Corollary 1 and Equation (38).
∎
Theorem 2 clarifies that if the directions for each respective group are orthogonal to each other, then condition (33) is satisfied. This means that if the direction for group 1 has a single nonzero component in the output 1 dimension and group 2 has a single nonzero component in the output 2 dimension, then we will not observe violations of the convexity property.
We state a second Corollary that follows from Theorem 2, which is useful when there are more than two groups each with their own estimation direction in CNLS-d.
Corollary 2.
Let the total number of observations. Let the number of outputs considered. Let the set of observed outputs. Let a partition of of cardinality . Let the set of directions used for each respective group of the partition. If a CNLS-d estimator is calculated using the directions from based on partition , and the following condition holds regarding the direction vectors and the noise direction:
[TABLE]
where corresponds to the indicator of the part of the partition , in which belongs. Then the resulting CNLS-d estimate is a concave function.
Proof.
We can follow the proof of Theorem 2, as the condition does not change. The condition still concerns pairwise observations, the only difference is that now the partition of observations corresponds to more than two groups. This does not affect the proof of the condition.
∎
Corollary 2 extends the statement of Theorem 2 to provide sufficient conditions to avoid violations of the shape constraints in a scenario where there are more than two groups each with their own estimation direction in CNLS-d estimation.
Simulations to investigate the frequency with which multiple directions lead to violations
We run simulations to investigate the effects of using multiple directions. We use the same DGP as stated in Section 5, Experiment 1. However, we define two groups and assign a different direction to each of them:
[TABLE]
and,
[TABLE]
where and .
We run a total of simulations. For comparison, for each simulation, we also record the estimates when using only the direction based on and only for all observations. We identify violations of the monotonicity and concavity by sorting the estimates by . We identify all adjacent pairs and triplets, which means 99 pairs and 98 triplets given that we consider 100 observations for each simulation.
As expected, there are no violations when we use a single direction for the estimation. However, when we use two directions violations are observed. For monotonicity, we observe no violations for pairs of observations that are part of the same group. However, for pairs with one member from each group we observe violations of monotonicity for 6% of the pairs. We use the triplets to analyze concavity. When the members of the triplet are from the same group, we observe violations of concavity for 2% of the triplets. When one member of the triplet is from a different group, the violations of concavity increase to 45%. These results indicate that for one instance when the conditions of Theorem 2 do not hold, we see a significant number of violations of the maintained assumptions.
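The pair and triplet checks can be sketched as below. This is an illustration, not the authors' code: it assumes the estimates are sorted by the first output, that monotonicity means the second output does not rise as the first increases, and that concavity means the middle point of each adjacent triplet lies on or above the chord joining its neighbors. The tolerance `tol` is also an assumption.

```python
def count_shape_violations(points, tol=1e-9):
    """Count monotonicity violations over adjacent pairs and concavity
    violations over adjacent triplets of (y1, y2) estimates."""
    pts = sorted(points)  # sort by the first output (assumed ordering)

    pair_violations = 0
    for (x0, y0), (x1, y1) in zip(pts, pts[1:]):
        if y1 > y0 + tol:  # second output rises as the first increases
            pair_violations += 1

    triplet_violations = 0
    for (x0, y0), (x1, y1), (x2, y2) in zip(pts, pts[1:], pts[2:]):
        if x2 == x0:
            continue  # degenerate triplet, the chord is undefined
        chord = y0 + (y2 - y0) * (x1 - x0) / (x2 - x0)
        if y1 < chord - tol:  # middle point falls below the chord
            triplet_violations += 1

    return {
        "pairs": max(len(pts) - 1, 0),
        "pair_violations": pair_violations,
        "triplets": max(len(pts) - 2, 0),
        "triplet_violations": triplet_violations,
    }
```

With 100 points the function inspects 99 adjacent pairs and 98 adjacent triplets, matching the counts given above.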
Appendix B Additional Experiments
B.1 Experiments Related to Section 4.1 - with the Linear Estimator.
Measuring MSE Example, Section 4.1 - Noise Generated in a Common and Prespecified Direction
This section describes the simulations and the results for the fixed noise direction case referenced in Section 4.1.
The Data Generation Process (DGP) for observations , is as follows:
The output, , is drawn from the continuous uniform distribution
The cost is calculated as , where .
In the case of fixed direction, the noise term is determined as:
(a) is the scalar length that is drawn from a normal distribution, , is prespecified and an initial value for the standard deviation, , is calculated as in Equation (11) in Section 4.1:
(44)
where and are the mean of the output and the mean of the cost without noise, respectively.
(b) is the fixed noise direction that is inferred from the prespecified angle .
(c) The observations with noise are obtained by appending the noise term:
(45)
Apply the DGP described above to generate a training set, , and a testing set . Consider repetitions of the simulation and set the number of observations in each group to . Set the scaling coefficient for the noise to . Consider different DGPs, since data are generated for the following values of the noise direction angle, .
We test the set of directions corresponding to the angle . If the direction of the noise, , matches the direction used in the DDF, , then the smallest MSE results for all cases.
Results: Fixed Noise Direction
Table 13 reports the MSE computed by comparing the estimated function to the true function and Table 14 reports the MSE computed by comparing the estimated function to the testing set.
In Table 13, the direction for the DDF corresponding to the smallest MSE always matches the noise direction in the DGP. Further for more than of the cases tested there is more than a decrease in MSE by using the correctly specified direction compared to the next best direction tested, which was not as large in the random direction case in Table 1 of Section 4.1. In other words, when endogeneity is severe, the benefits of using a DDF with a well-selected direction are potentially large.
Table 14 is consistent with the results observed in the random noise case, in Table 2 of Section 4.1. The DDF directions corresponding to the smallest MSE values are those matching the directions used for the MSE computation. Thus, the proposed radial MSE measure addresses the challenge of measuring performance in applications with a testing dataset.
Monte Carlo Simulations - Experiments, Section 5.2 - Experiment 3. Base case with fixed noise direction and different noise levels
This section summarizes the results of Experiment 3 with .
B.2 Experiments related to Section 5.2 - with CNLS-d
Here we complete Section 5.2 with additional experiments, following the experiment numbering established there.
Experiment 5: Base case with different distributions for the initial observations on the true function
In Experiment 5, we extend the analysis performed in Experiment 4. We consider additional distributions for the angle in the DGP and examine how they affect the optimal direction. Unlike in Experiment 4, we do not restrict attention to normal distributions; we consider a normal distribution, , and two gamma distributions, and . For the gamma distributions, the first parameter corresponds to the shape coefficient and the second to the scale coefficient. The distributions are referred to below as , and , respectively. We truncate the tails of the distributions so that the generated angles fall within the range . Noise is specified as in Experiment 1. Figure 10 illustrates the distributions of the angles and highlights their median values. Table 16 reports the results of this experiment.
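As a rough sketch of how truncated angle distributions and their medians could be generated, the example below uses rejection sampling. The distribution parameters, the angle range, the sample size, and the names `truncated_draws`, `gamma_a`, and `gamma_b` are placeholders chosen for illustration; they are not the values used in the experiment.

```python
import numpy as np

def truncated_draws(sampler, low, high, n, rng):
    """Rejection sampling: keep only draws strictly inside (low, high)."""
    kept = np.empty(0)
    while kept.size < n:
        draws = sampler(rng, 2 * n)
        kept = np.concatenate([kept, draws[(draws > low) & (draws < high)]])
    return kept[:n]

rng = np.random.default_rng(1)
low, high = 0.0, np.pi / 2  # assumed admissible range for the angles

# Hypothetical parameters; for the gammas the order is (shape, scale).
samplers = {
    "normal": lambda r, m: r.normal(np.pi / 4, 0.3, m),
    "gamma_a": lambda r, m: r.gamma(2.0, 0.2, m),
    "gamma_b": lambda r, m: r.gamma(4.0, 0.1, m),
}

angles = {name: truncated_draws(s, low, high, 500, rng) for name, s in samplers.items()}
# The median angle of each sample is the candidate direction highlighted in the text.
medians = {name: float(np.median(a)) for name, a in angles.items()}
```

Rejection sampling is used here only because it is short and easy to check; an inverse-CDF approach would serve equally well.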
Two main conclusions can be drawn from the results in Table 16. First, the smaller the variance of the data distribution, the greater the importance of direction selection. Looking at the differences between the two gamma distributions, has a larger tail than , which means the observations for have a smaller variance. Table 16 indicates that the MSE increases rapidly with deviations from the optimal direction when the variance of the observations is smaller, as with compared to . Second, among the directions tested, , MSE is minimized for the direction closest to the direction corresponding to the median of the distribution. This second point supports the selection approach proposed in Section 6.
Experiment 6: Adaptation of the Base Case to a 3-Dimensional Case
We adapt the DGP from Experiment 1, the base case. We consider a fixed input level and approximate a three-output isoquant, . Indexing the outputs by and observations by , we define the outputs,
[TABLE]
where is the observation on the isoquant and is the noise. The output levels are generated:
[TABLE]
where , are drawn randomly from a continuous uniform distribution, .
The noise term is adapted to the 3-dimensional isoquant:
[TABLE]
where the length is drawn from the normal distribution , and for which are drawn from a continuous uniform distribution .
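The 3-dimensional DGP can be illustrated as follows. This is a sketch under stated assumptions: the isoquant is taken to be the positive eighth of the unit sphere, points and noise directions are both built from standard spherical coordinates with angles drawn uniformly on an assumed range of 0 to π/2, and the sample size and noise standard deviation are hypothetical. The helper `unit_vectors` is introduced only for this example.

```python
import numpy as np

def unit_vectors(theta, phi):
    """Spherical-coordinate unit vectors; angles in [0, pi/2] give nonnegative components."""
    return np.column_stack([
        np.sin(phi) * np.cos(theta),
        np.sin(phi) * np.sin(theta),
        np.cos(phi),
    ])

rng = np.random.default_rng(2)
n = 200       # hypothetical number of observations
sigma = 0.1   # hypothetical standard deviation of the noise length

# Observations on the isoquant: two angles per point, drawn uniformly (assumed range).
y_true = unit_vectors(rng.uniform(0, np.pi / 2, n), rng.uniform(0, np.pi / 2, n))

# Noise: a normally distributed scalar length times a unit direction
# built from two further uniformly drawn angles.
length = rng.normal(0.0, sigma, n)
noise_dir = unit_vectors(rng.uniform(0, np.pi / 2, n), rng.uniform(0, np.pi / 2, n))
y_obs = y_true + length[:, None] * noise_dir
```

Unlike the fixed noise direction case, each observation here receives its own noise direction, so the displacements are not collinear across observations.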
In Experiment 6, 19 directions are considered for the CNLS-d estimators. The directions are determined using the following steps:
1. enumerate all 3-component vectors, corresponding to with elements from the set and excluding ;
2. normalize the direction vectors by dividing them by their respective Euclidean norms;
3. eliminate duplicates.
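The three steps can be sketched as follows. The component set `(0, 1, 2)` is an assumption made for this example; it is one set for which the procedure yields exactly 19 distinct directions.

```python
import itertools
import numpy as np

levels = (0, 1, 2)  # assumed component set, chosen for illustration

# Step 1: enumerate all 3-component vectors over the set, excluding the zero vector.
candidates = [v for v in itertools.product(levels, repeat=3) if any(v)]

# Step 2: normalize each vector by its Euclidean norm.
normalized = [np.array(v, dtype=float) / np.linalg.norm(v) for v in candidates]

# Step 3: eliminate duplicates (rounding guards against floating-point noise).
directions = sorted({tuple(np.round(d, 12)) for d in normalized})

# Group the surviving directions by their number of zero components.
n_zero = [sum(c == 0 for c in d) for d in directions]
counts = {k: n_zero.count(k) for k in (2, 1, 0)}
```

Under this assumed set, the 26 candidate vectors collapse to 19 unit directions: 3 with a single non-zero component, 9 with exactly one zero component, and 7 with all components positive.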
The 19 directions are represented by the markers in Figure 11 and create a balanced grid on the eighth of a unit sphere, our isoquant. The median direction is . The standard deviation of the normal distribution is . We perform this experiment times for each direction. We report the averaged radial MSE values on a testing set of observations lying on the true function in Table 17. The MSE results are also illustrated in Figure 11, where marker size increases affinely with the MSE value and marker color ranges from yellow to red, with larger MSE values shown by redder markers.
We can establish three categories of directions that correspond to certain ranges of MSE values. The first category corresponds to the worst MSE values, which are almost twice the smallest values. These are the directions that have only one non-zero component, shown with red markers on the corners of the surface in Figure 11. The second category covers the MSE values that are above but less than . These directions are labeled with the orange markers in Figure 11 that are on the edges of the surface but not the corners. One of their directional components, , is zero but all others are not. The third category of directions, which has the smallest MSEs, corresponds to the yellow markers in Figure 11. These directions have only positive components. Thus, we observe a trend: directions with positive components in all variables yield the best MSE values. The median value direction, , is among the yellow markers. These results support the selection approach proposed in Section 6 and confirm the results obtained on the US hospitals data set.
Appendix C U.S. Hospital Dataset Application
We describe the functional estimates provided by four estimators: quadratic regression, CNLS-d using a direction with equal components in all dimensions, CNLS-d using the median direction, and the local linear kernel. Table 18 provides most productive scale size (MPSS) measurements in cost in $M. Tables 19 and 20 provide the marginal cost of Minor Therapeutic procedures and the marginal cost of Major Therapeutic procedures, respectively. The units for Tables 19 and 20 are cost in $k per Minor and Major Therapeutic procedure, respectively.
Our conclusions are the same as stated in the body of the paper: CNLS-d provides the advantage of being more flexible than the parametric estimator (quadratic regression) while having shape constraints that maintain the interpretability of the results.
