An Approximate Bayesian Approach to Model-assisted Survey Estimation with Many Auxiliary Variables
Shonosuke Sugasawa, Jae Kwang Kim

TL;DR
This paper introduces a Bayesian regularized regression method for model-assisted survey estimation with many auxiliary variables, providing efficient estimates and credible intervals, demonstrated through simulation studies.
Contribution
It proposes a novel Bayesian approach using shrinkage priors for variable selection in survey estimation with numerous auxiliary variables.
Findings
The method yields efficient point estimates.
It provides credible intervals with good coverage.
Simulation results compare favorably with frequentist methods.
Abstract
Model-assisted estimation with complex survey data is an important practical problem in survey sampling. When there are many auxiliary variables, selecting significant variables associated with the study variable would be necessary to achieve efficient estimation of population parameters of interest. In this paper, we formulate a regularized regression estimator in the framework of Bayesian inference using the penalty function as the shrinkage prior for model selection. The proposed Bayesian approach enables us to get not only efficient point estimates but also reasonable credible intervals. Results from two limited simulation studies are presented to facilitate comparison with existing frequentist methods.
| (A) | (B) | ||||||||||
|---|---|---|---|---|---|---|---|---|---|---|---|
| Method | 20 | 30 | 40 | 50 | 20 | 30 | 40 | 50 | |||
| GREG | 11.7 | 11.8 | 12.0 | 12.3 | 11.4 | 11.8 | 12.1 | 12.3 | |||
| GREG-L | 11.7 | 11.7 | 11.7 | 11.8 | 11.1 | 11.1 | 11.1 | 11.1 | |||
| GREG-R | 11.8 | 11.9 | 12.1 | 12.4 | 11.4 | 11.6 | 11.8 | 12.0 | |||
| GREG-V | 11.6 | 11.7 | 11.8 | 12.0 | 11.3 | 11.5 | 11.8 | 12.0 | |||
| MSE | GREG-M | 11.7 | 11.8 | 12.0 | 12.3 | 11.4 | 11.8 | 12.1 | 12.3 | ||
| AB | 11.7 | 11.9 | 12.1 | 12.4 | 11.6 | 11.9 | 12.2 | 12.5 | |||
| ABL | 11.7 | 11.8 | 11.9 | 12.2 | 11.4 | 11.7 | 11.8 | 12.0 | |||
| ABH | 11.6 | 11.6 | 11.6 | 11.8 | 11.2 | 11.3 | 11.3 | 11.4 | |||
| HT | 17.5 | 17.5 | 17.5 | 17.5 | 14.8 | 14.8 | 14.8 | 14.8 | |||
| GREG | 0.21 | 0.12 | 0.13 | 0.23 | 0.54 | 1.24 | 1.87 | 2.41 | |||
| GREG-L | 0.19 | 0.16 | 0.18 | 0.19 | 0.00 | 0.11 | 0.20 | 0.26 | |||
| GREG-R | 0.22 | 0.16 | 0.18 | 0.31 | 0.56 | 1.21 | 1.79 | 2.32 | |||
| GREG-V | 0.16 | 0.05 | 0.08 | 0.17 | 0.29 | 0.80 | 1.26 | 1.64 | |||
| Bias | GREG-M | 0.21 | 0.12 | 0.13 | 0.23 | 0.54 | 1.24 | 1.87 | 2.41 | ||
| AB | 0.19 | 0.10 | 0.11 | 0.22 | 0.60 | 1.28 | 1.92 | 2.44 | |||
| ABL | 0.19 | 0.11 | 0.11 | 0.21 | 0.49 | 1.06 | 1.55 | 1.95 | |||
| ABH | 0.16 | 0.12 | 0.11 | 0.17 | 0.06 | 0.29 | 0.51 | 0.71 | |||
| HT | 0.78 | 0.78 | 0.78 | 0.78 | -1.08 | -1.08 | -1.08 | -1.08 | |||
| (A) | (B) | ||||||||||
|---|---|---|---|---|---|---|---|---|---|---|---|
| Method | 20 | 30 | 40 | 50 | 20 | 30 | 40 | 50 | |||
| GREG | 92.8 | 92.8 | 92.7 | 89.9 | 94.2 | 92.1 | 92.1 | 90.1 | |||
| GREG-L | 93.5 | 93.4 | 93.2 | 93.3 | 94.5 | 94.8 | 94.4 | 94.8 | |||
| GREG-R | 93.0 | 92.4 | 91.8 | 90.0 | 93.3 | 92.4 | 91.9 | 90.4 | |||
| GREG-V | 93.6 | 93.7 | 93.3 | 91.4 | 94.1 | 93.8 | 92.5 | 91.2 | |||
| CP | GREG-M | 93.9 | 93.9 | 93.9 | 92.9 | 94.5 | 93.7 | 93.8 | 92.9 | ||
| AB | 95.3 | 94.8 | 94.9 | 94.2 | 95.1 | 94.8 | 94.9 | 95.2 | |||
| ABL | 95.2 | 94.6 | 94.8 | 94.5 | 95.3 | 95.3 | 95.1 | 94.9 | |||
| ABH | 94.8 | 95.0 | 95.0 | 94.7 | 95.4 | 95.9 | 95.1 | 95.5 | |||
| HT | 94.5 | 94.5 | 94.5 | 94.5 | 95.2 | 95.2 | 95.2 | 95.2 | |||
| GREG | 43.1 | 42.3 | 41.5 | 40.7 | 43.1 | 42.3 | 41.5 | 40.7 | |||
| GREG-L | 43.8 | 43.7 | 43.6 | 43.5 | 43.3 | 43.1 | 42.9 | 42.8 | |||
| GREG-R | 43.2 | 42.5 | 41.9 | 41.4 | 42.8 | 42.0 | 41.3 | 40.7 | |||
| GREG-V | 43.4 | 42.8 | 42.2 | 41.6 | 43.4 | 42.9 | 42.3 | 41.8 | |||
| AL | GRREG-M | 44.2 | 44.2 | 44.3 | 44.4 | 44.3 | 44.4 | 44.6 | 44.8 | ||
| AB | 45.8 | 46.3 | 46.8 | 47.3 | 46.2 | 47.0 | 47.8 | 48.7 | |||
| ABL | 45.6 | 45.9 | 46.1 | 46.3 | 45.8 | 46.4 | 46.8 | 47.3 | |||
| ABH | 45.1 | 45.2 | 45.2 | 45.1 | 45.2 | 45.4 | 45.4 | 45.6 | |||
| HT | 66.4 | 66.4 | 66.4 | 66.4 | 59.1 | 59.1 | 59.1 | 59.1 | |||
| (A) | (B) | ||||||||||
|---|---|---|---|---|---|---|---|---|---|---|---|
| Method | 20 | 30 | 40 | 50 | 20 | 30 | 40 | 50 | |||
| GR | 2.24 | 2.29 | 2.32 | 2.36 | 2.32 | 2.39 | 2.50 | 2.57 | |||
| GRL | 2.17 | 2.18 | 2.19 | 2.20 | 2.27 | 2.29 | 2.31 | 2.30 | |||
| GRR | 2.22 | 2.26 | 2.29 | 2.31 | 2.32 | 2.38 | 2.44 | 2.49 | |||
| RMSE | AB | 2.23 | 2.26 | 2.28 | 2.30 | 2.31 | 2.37 | 2.45 | 2.50 | ||
| ABL | 2.21 | 2.23 | 2.24 | 2.25 | 2.27 | 2.28 | 2.26 | 2.23 | |||
| ABH | 2.18 | 2.20 | 2.23 | 2.26 | 2.26 | 2.27 | 2.28 | 2.32 | |||
| HT | 2.80 | 2.80 | 2.80 | 2.80 | 2.83 | 2.83 | 2.83 | 2.83 | |||
| GR | -0.10 | -0.12 | -0.12 | -0.11 | 0.10 | 0.18 | 0.31 | 0.43 | |||
| GRL | -0.11 | -0.11 | -0.10 | -0.11 | 0.03 | 0.05 | 0.07 | 0.08 | |||
| GRR | -0.11 | -0.12 | -0.12 | -0.12 | 0.07 | 0.13 | 0.20 | 0.27 | |||
| Bias | AB | -0.11 | -0.13 | -0.13 | -0.13 | 0.09 | 0.17 | 0.27 | 0.38 | ||
| ABL | -0.10 | -0.10 | -0.07 | -0.02 | 0.07 | 0.13 | 0.19 | 0.22 | |||
| ABH | -0.10 | -0.11 | -0.10 | -0.11 | 0.01 | 0.03 | 0.04 | 0.03 | |||
| HT | -0.15 | -0.15 | -0.15 | -0.15 | 0.07 | 0.07 | 0.07 | 0.07 | |||
| (A) | (B) | ||||||||||
|---|---|---|---|---|---|---|---|---|---|---|---|
| Method | 20 | 30 | 40 | 50 | 20 | 30 | 40 | 50 | |||
| GR | 92.3 | 90.8 | 88.8 | 86.4 | 91.9 | 90.3 | 87.3 | 84.6 | |||
| GRL | 94.1 | 94.1 | 93.9 | 93.2 | 93.2 | 93.0 | 92.6 | 92.9 | |||
| GRR | 92.8 | 92.1 | 91.0 | 90.6 | 92.0 | 90.8 | 89.6 | 89.0 | |||
| CP | AB | 94.8 | 95.5 | 95.4 | 96.1 | 94.6 | 94.1 | 94.5 | 95.1 | ||
| ABL | 95.1 | 95.7 | 95.9 | 96.5 | 94.6 | 95.2 | 96.6 | 97.2 | |||
| ABH | 95.1 | 96.0 | 96.0 | 96.2 | 95.1 | 95.2 | 95.9 | 96.2 | |||
| HT | 95.3 | 95.3 | 95.3 | 95.3 | 94.5 | 94.5 | 94.5 | 94.5 | |||
| GR | 8.02 | 7.80 | 7.56 | 7.30 | 8.20 | 7.95 | 7.69 | 7.39 | |||
| GRL | 8.21 | 8.17 | 8.14 | 8.11 | 8.42 | 8.37 | 8.33 | 8.30 | |||
| GRR | 8.15 | 7.99 | 7.88 | 7.79 | 8.34 | 8.17 | 8.04 | 7.94 | |||
| AL | AB | 8.74 | 8.90 | 9.10 | 9.42 | 9.05 | 9.27 | 9.59 | 10.10 | ||
| ABL | 8.79 | 8.99 | 9.24 | 9.55 | 9.07 | 9.31 | 9.61 | 9.99 | |||
| ABH | 8.76 | 8.96 | 9.18 | 9.45 | 9.02 | 9.22 | 9.46 | 9.75 | |||
| HT | 11.14 | 11.14 | 11.14 | 11.14 | 11.00 | 11.00 | 11.00 | 11.00 | |||
| (A) | (B) | ||||||||||
|---|---|---|---|---|---|---|---|---|---|---|---|
| Method | 20 | 30 | 40 | 50 | 20 | 30 | 40 | 50 | |||
| GREG | 10.3 | 10.4 | 10.6 | 10.9 | 9.9 | 10.1 | 10.3 | 10.5 | |||
| GREG-L | 10.2 | 10.2 | 10.2 | 10.3 | 9.6 | 9.6 | 9.6 | 9.6 | |||
| GREG-R | 10.3 | 10.5 | 10.7 | 10.9 | 9.8 | 10.0 | 10.2 | 10.3 | |||
| GREG-V | 10.2 | 10.4 | 10.5 | 10.7 | 9.8 | 9.9 | 10.1 | 10.2 | |||
| RMSE | GREG-M | 10.3 | 10.4 | 10.6 | 10.9 | 9.9 | 10.1 | 10.3 | 10.5 | ||
| AB | 10.3 | 10.5 | 10.7 | 11.0 | 10.0 | 10.2 | 10.5 | 10.6 | |||
| ABL | 10.3 | 10.4 | 10.6 | 10.8 | 9.9 | 10.0 | 10.2 | 10.3 | |||
| ABH | 10.2 | 10.2 | 10.3 | 10.3 | 9.7 | 9.7 | 9.8 | 9.8 | |||
| HT | 14.8 | 14.8 | 14.8 | 14.8 | 12.6 | 12.6 | 12.6 | 12.6 | |||
| GREG | -0.15 | -0.13 | -0.18 | -0.22 | 0.43 | 0.93 | 1.39 | 1.86 | |||
| GREG-L | -0.22 | -0.22 | -0.25 | -0.24 | 0.12 | 0.18 | 0.25 | 0.34 | |||
| GREG-R | -0.18 | -0.18 | -0.22 | -0.26 | 0.44 | 0.94 | 1.39 | 1.87 | |||
| GREG-V | -0.21 | -0.22 | -0.24 | -0.23 | 0.27 | 0.62 | 0.95 | 1.30 | |||
| Bias | GREG-M | -0.15 | -0.13 | -0.18 | -0.22 | 0.43 | 0.93 | 1.39 | 1.86 | ||
| AB | -0.17 | -0.16 | -0.20 | -0.24 | 0.43 | 0.93 | 1.37 | 1.85 | |||
| ABL | -0.19 | -0.18 | -0.22 | -0.25 | 0.38 | 0.80 | 1.17 | 1.56 | |||
| ABH | -0.20 | -0.20 | -0.21 | -0.25 | 0.14 | 0.30 | 0.46 | 0.63 | |||
| HT | -0.29 | -0.29 | -0.29 | -0.29 | -0.39 | -0.39 | -0.39 | -0.39 | |||
| (A) | (B) | ||||||||||
|---|---|---|---|---|---|---|---|---|---|---|---|
| Method | 20 | 30 | 40 | 50 | 20 | 30 | 40 | 50 | |||
| GREG | 93.8 | 92.4 | 91.1 | 89.4 | 94.0 | 92.6 | 91.5 | 91.4 | |||
| GREG-L | 94.2 | 94.0 | 94.1 | 93.9 | 94.9 | 94.4 | 94.8 | 94.5 | |||
| GREG-R | 93.3 | 92.2 | 91.8 | 90.1 | 94.4 | 93.0 | 91.7 | 91.7 | |||
| GREG-V | 94.1 | 93.3 | 92.8 | 90.9 | 94.5 | 94.0 | 92.8 | 92.4 | |||
| CP | GREG-M | 94.0 | 93.0 | 92.8 | 91.6 | 94.9 | 93.7 | 93.5 | 93.5 | ||
| AB | 94.2 | 94.2 | 94.5 | 94.4 | 95.6 | 95.5 | 94.3 | 94.7 | |||
| ABL | 94.2 | 94.3 | 94.9 | 94.9 | 95.6 | 95.0 | 94.5 | 94.6 | |||
| ABH | 94.6 | 94.8 | 94.6 | 94.4 | 95.2 | 95.3 | 95.5 | 95.5 | |||
| HT | 94.2 | 94.2 | 94.2 | 94.2 | 94.8 | 94.8 | 94.8 | 94.8 | |||
| GREG | 37.4 | 36.9 | 36.4 | 35.9 | 37.6 | 37.1 | 36.6 | 36.1 | |||
| GREG-L | 37.9 | 37.8 | 37.7 | 37.7 | 37.7 | 37.6 | 37.5 | 37.4 | |||
| GREG-R | 37.5 | 37.0 | 36.6 | 36.2 | 37.4 | 36.9 | 36.4 | 35.9 | |||
| GREG-V | 37.6 | 37.2 | 36.8 | 36.4 | 37.8 | 37.5 | 37.2 | 36.8 | |||
| AL | GREG-M | 38.1 | 38.1 | 38.2 | 38.2 | 38.4 | 38.5 | 38.6 | 38.7 | ||
| AB | 39.2 | 39.5 | 39.9 | 40.2 | 39.6 | 40.2 | 40.8 | 41.3 | |||
| ABL | 39.0 | 39.2 | 39.5 | 39.6 | 39.4 | 39.8 | 40.2 | 40.5 | |||
| ABH | 38.8 | 38.8 | 38.8 | 38.9 | 39.0 | 39.2 | 39.3 | 39.3 | |||
| HT | 57.5 | 57.5 | 57.5 | 57.5 | 51.1 | 51.1 | 51.1 | 51.1 | |||
| (A) | (B) | ||||||||||
|---|---|---|---|---|---|---|---|---|---|---|---|
| Method | 20 | 30 | 40 | 50 | 20 | 30 | 40 | 50 | |||
| GREG | 1.90 | 1.91 | 1.93 | 1.97 | 1.94 | 1.98 | 2.00 | 2.06 | |||
| GREG-L | 1.86 | 1.87 | 1.87 | 1.87 | 1.90 | 1.91 | 1.91 | 1.92 | |||
| GREG-R | 1.88 | 1.89 | 1.91 | 1.93 | 1.93 | 1.97 | 1.98 | 2.02 | |||
| RMSE | AB | 1.89 | 1.89 | 1.91 | 1.93 | 1.93 | 1.96 | 1.98 | 2.01 | ||
| ABL | 1.88 | 1.88 | 1.89 | 1.90 | 1.91 | 1.91 | 1.89 | 1.88 | |||
| ABH | 1.87 | 1.87 | 1.88 | 1.89 | 1.88 | 1.87 | 1.86 | 1.87 | |||
| HT | 2.36 | 2.36 | 2.36 | 2.36 | 2.39 | 2.39 | 2.39 | 2.39 | |||
| GREG | -0.03 | -0.03 | -0.03 | -0.05 | -0.05 | 0.01 | 0.08 | 0.19 | |||
| GREG-L | -0.02 | -0.01 | -0.01 | -0.02 | -0.12 | -0.11 | -0.10 | -0.09 | |||
| GREG-R | -0.03 | -0.02 | -0.02 | -0.04 | -0.07 | -0.02 | 0.02 | 0.09 | |||
| Bias | AB | -0.04 | -0.03 | -0.03 | -0.06 | -0.06 | 0.00 | 0.06 | 0.15 | ||
| ABL | -0.03 | -0.02 | -0.01 | 0.00 | -0.07 | -0.02 | 0.02 | 0.08 | |||
| ABH | -0.02 | -0.02 | -0.02 | -0.02 | -0.11 | -0.11 | -0.11 | -0.11 | |||
| HT | 0.01 | 0.01 | 0.01 | 0.01 | -0.12 | -0.12 | -0.12 | -0.12 | |||
| (A) | (B) | ||||||||||
|---|---|---|---|---|---|---|---|---|---|---|---|
| Method | 20 | 30 | 40 | 50 | 20 | 30 | 40 | 50 | |||
| GREG | 92.8 | 91.2 | 90.6 | 89.5 | 92.9 | 91.8 | 91.5 | 89.6 | |||
| GREG-L | 93.0 | 92.9 | 92.9 | 93.0 | 94.3 | 94.3 | 94.5 | 94.0 | |||
| GREG-R | 93.2 | 91.9 | 91.0 | 91.0 | 93.3 | 92.6 | 91.7 | 91.9 | |||
| CP | AB | 94.4 | 94.6 | 94.6 | 95.0 | 95.2 | 95.5 | 95.8 | 96.0 | ||
| ABL | 94.5 | 94.4 | 95.3 | 95.3 | 95.2 | 95.8 | 96.7 | 97.4 | |||
| ABH | 94.9 | 94.6 | 94.7 | 95.9 | 96.0 | 96.4 | 96.7 | 97.3 | |||
| HT | 95.9 | 95.9 | 95.9 | 95.9 | 95.5 | 95.5 | 95.5 | 95.5 | |||
| GREG | 7.01 | 6.87 | 6.73 | 6.58 | 7.27 | 7.12 | 6.96 | 6.79 | |||
| GREG-L | 7.12 | 7.09 | 7.07 | 7.05 | 7.40 | 7.37 | 7.35 | 7.32 | |||
| GREG-R | 7.09 | 6.98 | 6.88 | 6.81 | 7.36 | 7.24 | 7.14 | 7.05 | |||
| AL | AB | 7.47 | 7.57 | 7.67 | 7.78 | 7.80 | 7.92 | 8.06 | 8.24 | ||
| ABL | 7.49 | 7.60 | 7.72 | 7.86 | 7.80 | 7.93 | 8.06 | 8.22 | |||
| ABH | 7.45 | 7.54 | 7.65 | 7.78 | 7.75 | 7.83 | 7.94 | 8.06 | |||
| HT | 9.60 | 9.60 | 9.60 | 9.60 | 9.53 | 9.53 | 9.53 | 9.53 | |||
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
Taxonomy
TopicsStatistical Methods and Bayesian Inference · Bayesian Methods and Mixture Models · Census and Population Estimation
An Approximate Bayesian Approach to Model-assisted Survey Estimation with Many Auxiliary Variables
Shonosuke Sugasawa Center for Spatial Information Science, The University of Tokyo
Jae Kwang Kim Department of Statistics, Iowa State University
Abstract
Model-assisted estimation with complex survey data is an important practical problem in survey sampling. When there are many auxiliary variables, selecting significant variables associated with the study variable would be necessary to achieve efficient estimation of population parameters of interest. In this paper, we formulate a regularized regression estimator in the framework of Bayesian inference using the penalty function as the shrinkage prior for model selection. The proposed Bayesian approach enables us to get not only efficient point estimates but also reasonable credible intervals. Results from two limited simulation studies are presented to facilitate comparison with existing frequentist methods.
Keywords: Generalized regression estimation; Regularization; Shrinkage prior; Survey Sampling
Introduction
Probability sampling is a scientific tool for obtaining a representative sample from the target population. In order to estimate a finite population total from a target population, Horvitz-Thompson (HT) estimator obtained from a probability sample satisfies design-consistency and the resulting inference is justified from the randomization perspective (Horvitz and Thompson, 1952). However, the HT estimator uses the first-order inclusion probability only and does not fully incorporate all available information in the finite population. To improve its efficiency, regression estimation is often used by incorporating auxiliary information in the finite population. Deville and Särndal (1992), Fuller (2002), Kim and Park (2010), and Breidt and Opsomer (2017) present comprehensive overviews of variants of regression estimation in survey sampling. There are also other directions of improvement on the HT estimator based on prediction using augmented models (e.g. Zeng and Little, 2003, 2005; Zanganeh and Little, 2015).
The regression estimation approaches in survey sampling assume a model for the finite population, i.e., the superpopulation model, as
[TABLE]
where is a response variable, {\mbox{\boldmathx}}_{i} and are vectors of auxiliary variables and regression coefficients, respectively, and is an error term satisfying and . The superpopulation model does not necessarily hold in the sample as the sampling design can be informative (e.g. Pfeffermann and Sverchkov, 1999; Little, 2004). Under the regression superpopulation model in (1), Isaki & Fuller (1982) show that the asymptotic variance of the regression estimator achieves the lower bound of Godambe and Joshi (1965). Thus, the regression estimator is asymptotically efficient in the sense of achieving the minimum anticipated variance under the joint distribution of the sampling design and the superpopulation model in (1).
On the other hand, the dimension of the auxiliary variables {\mbox{\boldmathx}}_{i} could be large in practice. Even when the number of observed covariates is not necessarily large, the dimension of {\mbox{\boldmathx}}_{i} could be very large once we include polynomial or interaction terms to achieve flexible modeling, as considered in Section 7. However, in this case, the optimality of the regression estimator is untenable. When there are many auxiliary variables, the asymptotic bias of the regression estimator using all the auxiliary variables is no longer negligible and the resulting inference can be problematic. Simply put, including irrelevant auxiliary variables can introduce substantial variability in point estimation, but its uncertainty is not fully accounted for by the standard linearization variance estimation, resulting in misleading inference.
To overcome the problem, variable selection techniques for regression estimation have been considered in literatures (e.g. Silva and Skinner, 1997; Särndal and Lundström, 2005). The classical selection approach is based on a step-wise method. However, the step-wise methods will not necessarily produce the best model (e.g. Dempster et al., 1977) although the potential effect on prediction could be limited. Another approach is to employ regularized estimation of regression coefficients. For example, McConville et al. (2017) propose a regularized regression estimation approach based on the LASSO penalty of Tibshirani (1996). However, there are two main problems with the regularization approach in regression estimation. First, the choice of the regularization parameter is not straightforward under survey sampling when the parameter is strongly related to the selection results. Second, after model selection, the frequentist inference is notoriously difficult to make.
In this paper, to overcome the above difficulties, we adopt a Bayesian framework in the regularized regression estimation. We first introduce an approximate Bayesian approach for regression estimation when p+1=\mbox{dim}({\mbox{\boldmathx}}) is fixed, using the approximate Bayesian approach considered in Wang et al. (2018). The proposed Bayesian method fully captures the uncertainty in parameter estimation for the regression estimator and has good coverage properties. Second, the proposed Bayesian method is extended to the problem of large in regularized regression estimation. By incorporating the penalty function for regularization into the prior distribution, the uncertainty associated with model selection and parameter estimation is fully captured in the Bayesian machinery. Furthermore, the choice of the penalty parameter can be handled by using its posterior distribution. Hence, the proposed method provides a unified approach to Bayesian inference with sparse model-assisted survey estimation. The proposed method is a calibrated Bayesian (Little, 2012) and it is asymptotically equivalent to the frequentist model-assisted approach for a fixed .
The paper is organized as follows. In Section 2, the basic setup is introduced. In Section 3, the approximate Bayesian inference using regression estimation is proposed under a fixed setup. In Section 4, the proposed method is extended to high dimensional setup by developing sparse regression estimation using shrinkage prior distributions. In Section 5, the proposed method is extended to non-linear regression models. In Section 6, results from two limited simulation studies are presented. The proposed method is applied to the real data example in Section 7. Some concluding remarks are made in Section 8.
Basic setup
Consider a finite population of a known size . Associated with unit in the finite population, we consider measurement \{{\mbox{\boldmathx}}_{i},y_{i}\} where {\mbox{\boldmathx}}_{i} is the vector of auxiliary variables with dimension and is the study variable of interest. We are interested in estimating the finite population mean from a sample selected by a probability sampling design. Let be the index set of the sample and we observe \{{\mbox{\boldmathx}}_{i},y_{i}\}_{i\in A} from the sample. The HT estimator , where is the first-order inclusion probability of unit , is design unbiased but it is not necessarily efficient.
If the finite population mean \bar{{\mbox{\boldmathX}}}=N^{-1}\sum_{i=1}^{N}{\mbox{\boldmathx}}_{i} is known, then we can improve the efficiency of by using the following regression estimator:
[TABLE]
where \hat{{\mbox{\boldmath\beta}}} is an estimator of in (1). Typically, we use \hat{{\mbox{\boldmath\beta}}} obtained by minimizing the weighted quadratic loss
[TABLE]
motivated from the model (1). If an intercept term is included in {\mbox{\boldmathx}}_{i} such that {\mbox{\boldmathx}}_{i}^{t}=(1,{\mbox{\boldmathx}}_{1i}^{t}), we can express
[TABLE]
where and \hat{{\mbox{\boldmath\beta}}}_{1} is given by
[TABLE]
where \hat{\bar{{\mbox{\boldmathX}}}}_{1,\pi}=\hat{N}^{-1}\sum_{i\in A}\pi_{i}^{-1}{\mbox{\boldmathx}}_{1i} and for some matrix .
To discuss some asymptotic properties of in (3), we consider a sequence of finite populations and samples as discussed in Isaki and Fuller (1982), where increases with . Note that
[TABLE]
where and
[TABLE]
for any {\mbox{\boldmath\beta}}_{1}. If we choose {\mbox{\boldmath\beta}}_{1}=p\lim_{n\to\infty}\hat{{\mbox{\boldmath\beta}}}_{1} with respect to the sampling probability and p=\mbox{dim}({\mbox{\boldmathx}}_{1}) is fixed in the asymptotic setup, then we can obtain and safely use the main terms of (5) to describe the asymptotic behavior of . To emphasize its dependence on \hat{{\mbox{\boldmath\beta}}}_{1} in the regression estimator, we can write \hat{\bar{Y}}_{\rm reg}=\hat{\bar{Y}}_{\rm reg}(\hat{{\mbox{\boldmath\beta}}}_{1}). Roughly speaking, we can obtain
[TABLE]
and, if then we can safely ignore the effect of estimating {\mbox{\boldmath\beta}}_{1} in the regression estimator. See Supplementary Material for a sketched proof of (6).
If, on the other hand, the dimension is larger than , then we cannot ignore the effect of estimating {\mbox{\boldmath\beta}}_{1}. In this case, we can consider using some variable selection idea to reduce the dimension of . For variable selection, we may employ techniques of regularized estimation of regression coefficients. The regularization method can be described as finding
[TABLE]
where Q({\mbox{\boldmath\beta}}) is defined in (2) and p_{\lambda}({\mbox{\boldmath\beta}}_{1}) is a penalty function with parameter . Some popular penalty functions are presented in Table 1. Once the solution to (7) is obtained, then the regularized regression estimator is given by
[TABLE]
Statistical inference with the regularized regression estimator in (8) is not fully investigated in the literature. For example, Chen et al. (2018) consider the regularized regression estimator using adaptive LASSO of Zou (2006), but they assume that the sampling design is non-informative and the uncertainty in model selection is not fully incorporated in their inference. Generally speaking, making inference after model selection under superpopulation frequentist framework is difficult. The approximated Bayesian method introduced in the next section will capture the full uncertainty in the Bayesian framework.
Approximate Bayesian survey regression estimation
Developing Bayesian model-assisted inference under complex sampling is a challenging problem in statistics. Wang et al. (2018) recently propose the so-called approximate Bayesian method for design-based inference using asymptotic normality of a design-consistent estimator. Specifically, for a given parameter with a prior distribution , if one can find a design-consistent estimator of , then the approximate posterior distribution of is given by
[TABLE]
where is the sampling distribution of , which is often approximated by a normal distribution.
Drawing on this idea, one can develop an approximate Bayesian approach to capture the full uncertainty in the regression estimator. Let
[TABLE]
be the design-consistent estimator of and \hat{{\mbox{\boldmathV}}}_{\beta} be the corresponding asymptotic variance-covariance matrix of \hat{{\mbox{\boldmath\beta}}}, given by
[TABLE]
where \hat{e}_{i}=y_{i}-{\mbox{\boldmathx}}_{i}^{t}\hat{{\mbox{\boldmath\beta}}}, and is the joint inclusion probability of unit and . Under some regularity conditions, as discussed in Chapter 2 of Fuller (2009), we can establish
[TABLE]
as , where \hat{{\mbox{\boldmathV}}}_{\beta 11} is the submatrix of \hat{{\mbox{\boldmathV}}}_{\beta} with
[TABLE]
Thus, using (9) and (11), we can obtain the approximate posterior distribution of as
[TABLE]
where denotes a -dimensional multivariate normal density and \pi({\mbox{\boldmath\beta}}_{1}) is a prior distribution for {\mbox{\boldmath\beta}}_{1}.
Now, we consider the conditional posterior distribution of for a given {\mbox{\boldmath\beta}}_{1}. First, define
[TABLE]
Note that \hat{\bar{Y}}_{\rm reg}({\mbox{\boldmath\beta}}_{1}) is an approximately design-unbiased estimator of , regardless of {\mbox{\boldmath\beta}}_{1}. Under some regularity conditions, we can show that \hat{\bar{Y}}_{\rm reg}({\mbox{\boldmath\beta}}_{1}) follows a normal distribution asymptotically. Thus, we obtain
[TABLE]
where
[TABLE]
is a design consistent variance estimator of \hat{\bar{Y}}_{\rm reg}({\mbox{\boldmath\beta}}_{1}) for given {\mbox{\boldmath\beta}}_{1}. We then use \phi(\hat{\bar{Y}}_{\rm reg}({\mbox{\boldmath\beta}}_{1});\bar{Y},\hat{V}_{e}({\mbox{\boldmath\beta}}_{1})) as the density for the approximate sampling distribution of \hat{\bar{Y}}_{\rm reg}({\mbox{\boldmath\beta}}_{1}) in (14), where is the normal density function with mean and variance . Thus, the approximate conditional posterior distribution of given can be defined as
[TABLE]
where \pi(\bar{Y}\mid{\mbox{\boldmath\beta}}_{1}) is a conditional prior distribution of given {\mbox{\boldmath\beta}}_{1}. Without extra assumptions, we can use a flat prior distribution for \pi(\bar{Y}\mid{\mbox{\boldmath\beta}}_{1}).
Therefore, combining (13) and (16), the approximate posterior distribution of can be obtained as
[TABLE]
Generating posterior samples from (17) can be easily carried out via the following two steps:
Generate posterior sample {\mbox{\boldmath\beta}}_{1}^{\ast} of {\mbox{\boldmath\beta}}_{1} from (13).
- 2.
Generate posterior sample of from the conditional posterior (16) given {\mbox{\boldmath\beta}}_{1}^{\ast}.
Based on the approximate posterior samples of , we can compute the posterior mean as a point estimator as well as credible intervals for uncertainty quantification for including the variability in estimating {\mbox{\boldmath\beta}}_{1}.
The following theorem presents an asymptotic property of the proposed approximate Bayesian method.
Theorem 1**.**
Under the regularity conditions described in the Supplementary Material, conditional on the full sample data,
[TABLE]
in probability as and , where is some Borel set for and p(\bar{Y}|\hat{\bar{Y}}_{\rm reg}({\mbox{\boldmath\beta}}_{1}),\hat{{\mbox{\boldmath\beta}}}_{1}) is given in (17).
Theorem 1 is a special case of the Bernstein-von Mises theorem (van der Vaart, 2000, Section 10.2) in survey regression estimation, and its sketched proof is given in the Supplementary Material. The proof is not necessarily rigorous but contains enough details to deliver the main ideas. According to Theorem 1, the credible interval for constructed from the approximated posterior distribution (17) is asymptotically equivalent to the frequentist confidence interval based on the asymptotic normality of the common survey regression estimator. Therefore, the proposed Bayesian method implements the frequentist inference of the survey regression estimator at least asymptotically.
Approximate Bayesian method with shrinkage priors
We now consider the case when there are many auxiliary variables in applying regression estimation. When is large, it is desirable to select a suitable subset of auxiliary variables that are associated with the response variable to avoid inefficient regression estimation due to irrelevant covariates.
To deal with the problem in a Bayesian way, we may define the approximate posterior distribution of given {\mbox{\boldmath\beta}}_{1} as similar to (17). That is, we use the asymptotic distribution of the estimators \hat{{\mbox{\boldmath\beta}}}_{1} of {\mbox{\boldmath\beta}}_{1} and assign a shrinkage prior for {\mbox{\boldmath\beta}}_{1}. Let \pi_{\lambda}({\mbox{\boldmath\beta}}_{1}) be the shrinkage prior for {\mbox{\boldmath\beta}}_{1} with a structural parameter which might be multivariate.
Among the several choices of shrinkage priors, we specifically consider two priors for {\mbox{\boldmath\beta}}_{1}: Laplace (Park and Casella, 2008) and horseshoe (Carvalho et al., 2009, 2010). The Laplace prior is given by \pi_{\lambda}({\mbox{\boldmath\beta}}_{1})\propto\exp(-\lambda\sum_{k=1}^{p}|\beta_{k}|), which is related to Lasso regression (Tibshirani, 1996), so that the proposed approximated Bayesian method can be seen as the Bayesian version of a survey regression estimator with Lasso (McConville et al., 2017). The horseshoe prior is a more advanced shrinkage prior of the form:
[TABLE]
where denotes the normal density function with mean and variance . It is known that the horseshoe prior enjoys more severe shrinkage for the zero elements of {\mbox{\boldmath\beta}}_{1} than the Laplace prior, thus allowing strong signals to remain large (Carvalho et al., 2009).
Similarly to (13), we can develop a posterior distribution of {\mbox{\boldmath\beta}}_{1} using the shrinkage prior
[TABLE]
where is the asymptotic variance-covariance matrix of \hat{{\mbox{\boldmath\beta}}}_{1}, defined in (12). Once {\mbox{\boldmath\beta}}_{1} are sampled from (20), we can use the same posterior distribution of in (16) for a given .
Therefore, the approximate posterior distribution of can be obtained as
[TABLE]
Generating posterior samples from (21) can be easily carried out via the following two steps:
For a given , generate posterior sample {\mbox{\boldmath\beta}}_{1}^{\ast} of {\mbox{\boldmath\beta}}_{1} from p_{\lambda}(\bar{Y}|\hat{\bar{Y}}_{\rm reg}({\mbox{\boldmath\beta}}_{1}),\hat{{\mbox{\boldmath\beta}}}_{1}) in (20).
- 2.
Generate posterior sample of from the conditional posterior (16) for given {\mbox{\boldmath\beta}}_{1}^{\ast}.
Remark 1**.**
Let and \hat{{\mbox{\boldmath\beta}}}_{1}^{(R)} be the estimator of and {\mbox{\boldmath\beta}}_{1} defined as
[TABLE]
where {\rm P}({\mbox{\boldmath\beta}}_{1})=-2\log\pi_{\lambda}({\mbox{\boldmath\beta}}_{1}) is the penalty (regularization) term for {\mbox{\boldmath\beta}}_{1} induced from prior \pi_{\lambda}({\mbox{\boldmath\beta}}_{1}). For example, the Laplace prior for \pi_{\lambda}({\mbox{\boldmath\beta}}_{1}) leads to the penalty term {\rm P}({\mbox{\boldmath\beta}}_{1})=2\lambda\sum_{k=1}^{p}|\beta_{k}|, in which \hat{{\mbox{\boldmath\beta}}}_{1}^{(R)} corresponds to the regularized estimator of {\mbox{\boldmath\beta}}_{1} used in McConville et al. (2017). Since the exponential of -\sum_{i\in A}\pi_{i}^{-1}(y_{i}-\beta_{0}-{\mbox{\boldmathx}}_{i}^{t}{\mbox{\boldmath\beta}}_{1})^{2} is close to the approximated likelihood \phi_{p}((\hat{\beta}_{0},\hat{{\mbox{\boldmath\beta}}}_{1}^{t});(\beta_{0},{\mbox{\boldmath\beta}}_{1}^{t}),\hat{{\mbox{\boldmathV}}}_{\beta}) used in the approximated Bayesian method when is large, the mode of the approximated posterior of (\beta_{0},{\mbox{\boldmath\beta}}_{1}^{t}) would be close to the frequentist estimator (22) as well.
Remark 2**.**
By the frequentist approach, is often called the tuning parameter and can be selected via a data-dependent procedure such as cross validation as used in McConville et al. (2017). On the other hand, in the Bayesian approach, we assign a prior distribution on the hyperparameter and consider integration with respect to the posterior distribution of , which means that uncertainty of the hyperparameter estimation can be taken into account. Specifically, we assign a gamma prior for as the Laplace prior and a half-Cauchy prior for as the horseshoe prior (19). They both lead to familiar forms of full conditional posterior distributions of or . The details are given in the Supplementary Material.
As in Section 3, we obtain the following asymptotic properties of the proposed approximate Bayesian method.
Theorem 2**.**
Under the regularity conditions described in the Supplementary Material, conditional on the full sample data,
[TABLE]
in probability as and , where is some Borel set for and p_{\lambda}(\bar{Y}|\hat{\bar{Y}}_{\rm reg}({\mbox{\boldmath\beta}}_{1}),\hat{{\mbox{\boldmath\beta}}}_{1}) is given in (21).
The sketched proof is given in the Supplementary Material. Theorem 2 ensures that the proposed approximate Bayesian method is asymptotically equivalent to the frequentist version in which {\mbox{\boldmath\beta}}_{1} is estimated by the regularized method with penalty corresponding to the shrinkage prior used in the Bayesian method. Moreover, the proposed Bayesian method can be extended to cases using general non-linear regression, as demonstrated in the next section.
An Extension to non-linear models
The proposed Bayesian methods can be readily extended to work with non-linear regression. Some extensions of the regression estimator to nonlinear models are also considered in Wu and Sitter (2001), Breidt et al. (2005), and Montanari and Ranalli (2005).
We consider a general working model for as {\rm E}(y_{i}\mid{\mbox{\boldmathx}}_{i})=m({\mbox{\boldmathx}}_{i};{\mbox{\boldmath\beta}})=m_{i} and {\rm Var}(y_{i}\mid{\mbox{\boldmathx}}_{i})=\sigma^{2}a(m_{i}) for some known functions and . The model-assisted regression estimator for with known is then
[TABLE]
and its design-consistent variance estimator is obtained by
[TABLE]
which gives the approximate conditional posterior distribution of given . That is, similarly to (16), we can obtain
[TABLE]
To generate the posterior values of , we first find a design-consistent estimator \hat{{\mbox{\boldmath\beta}}} of . Note that a consistent estimator \hat{{\mbox{\boldmath\beta}}} can be obtained by solving
[TABLE]
where h({\mbox{\boldmathx}}_{i};{\mbox{\boldmath\beta}})=(\partial m_{i}/\partial{\mbox{\boldmath\beta}})/a(m_{i}). For example, for binary , we may use a logistic regression model with m({\mbox{\boldmathx}}_{i};{\mbox{\boldmath\beta}})=\exp({\mbox{\boldmathx}}_{i}^{t}{\mbox{\boldmath\beta}})/\{1+\exp({\mbox{\boldmathx}}_{i}^{t}{\mbox{\boldmath\beta}})\} and , which leads to h({\mbox{\boldmathx}}_{i};{\mbox{\boldmath\beta}})={\mbox{\boldmathx}}_{i}.
Under some regularity conditions, we can establish the asymptotic normality of \hat{{\mbox{\boldmath\beta}}}. That is,
[TABLE]
where
[TABLE]
with \hat{e}_{i}=y_{i}-m({\mbox{\boldmathx}}_{i};\hat{{\mbox{\boldmath\beta}}}), \hat{{\mbox{\boldmathh}}}_{i}=h({\mbox{\boldmathx}}_{i};\hat{{\mbox{\boldmath\beta}}}), and \dot{m}({\mbox{\boldmathx}};{\mbox{\boldmath\beta}})=\partial m({\mbox{\boldmathx}};{\mbox{\boldmath\beta}})/\partial{\mbox{\boldmath\beta}}. Note that \dot{m}({\mbox{\boldmathx}};{\mbox{\boldmath\beta}})=m_{i}(1-m_{i}){\mbox{\boldmathx}}_{i} under a logistic regression model.
Thus, the posterior distribution of given \hat{{\mbox{\boldmath\beta}}} can be obtained by
[TABLE]
We can use a shrinkage prior \pi({\mbox{\boldmath\beta}}) for in (25) if necessary. Once {\mbox{\boldmath\beta}}^{*} is generated from (25), the posterior values of are generated from (24) for a given {\mbox{\boldmath\beta}}^{*}.
This formula enables us to define the approximate posterior distribution of of the form (13), so that the approximate Bayesian inference for can be carried out in the same way as in the linear regression case. Note that Theorem 1 still holds under the general setup as long as the regularity conditions given in the Supplementary Material are satisfied.
Simulation
We investigate the performance of the proposed approximate Bayesian methods against standard frequentist methods using two limited simulation studies. In the first simulation, we consider a linear regression model for a continuous variable. In the second simulation, we consider a binary and apply the logistic regression model for the non-linear regression estimation.
In the first simulation, we generate , , from a multivariate normal distribution with mean vector and variance-covariance matrix , where and the -th element of is . The response variables are generated from the following linear regression model:
[TABLE]
where , , , , , and the other ’s are set to zero. For the dimension of the auxiliary information, we consider four scenarios for of and . For each , we assume that we can access only a subset of the full information . Note that for all scenarios the auxiliary variables significantly related with are included, and so only the amount of irrelevant information gets larger as gets larger. We selected a sample size of from the finite population, using two sampling mechanism: (A) simple random sampling (SRS) and (B) probability-proportional-to-size sampling (PPS) with size measure with . The parameter of interest is . We assume that is known for all .
For the simulated dataset, we apply the proposed approximate Bayesian methods with the uniform prior \pi({\mbox{\boldmath\beta}}_{1})\propto 1, Laplace prior and horseshoe prior (19) for {\mbox{\boldmath\beta}}_{1}, which are denoted by AB, ABL and ABH, respectively. For all the Bayesian methods, we use . We generate 5,000 posterior samples of after discarding the first 500 samples and compute the posterior mean of as the point estimate. As for the frequentist methods, we apply the original generalized regression estimator without variable selection (GREG) as well as the GREG method with Lasso regularization (GREG-L; McConville et al., 2017), ridge estimation of {\mbox{\boldmath\beta}}_{1} (GREG-R; Rao and Singh, 1997) and forward variable selection (GREG-V) using adjusted coefficient of determination. We also adopted the mixed modeling approach to the GREG estimation (GREG-M; Park and Fuller, 2009) which is similar to GREG-R. Moreover, the HT estimator is employed as a benchmark for efficiency comparison. In GREG-L, the tuning parameter is selected via 10-fold cross validation, and we use the gamma prior for in ABL, where is the selected value for in GREG-L. In ABH, we assign a prior for the tuning parameter and generate posterior samples. Based on replications, we calculate the square root of mean squared errors (RMSE) and bias of point estimators which are reported in Table 6. We also evaluated the performance of confidence (credible) intervals using coverage probabilities (CP) and the average length (AL), which are shown in Table 7.
Table 6 shows that RMSE and bias of AB and GREG are almost identical, which is consistent with the fact that AB is a Bayesian version of GREG. Moreover, the results show that the existing shrinkage methods such as GREG-L and the proposed Bayesian methods ABL and ABH tend to produce smaller RMSEs and smaller absolute biases than GREG or AB as increases, which indicates the importance of suitable selection of auxiliary variables when is large. From Table 7, it is observed that the CPs of GREG decreases as increases and are significantly smaller than the nominal level since GREG ignores the variability in estimating and the variability increases as increases. On the other hand, the Bayesian version AB can take account of the variability estimating and the CPs are around the nominal level and ALs of AB are larger than those of GREG. Although the performance of GREG-L is much better than GREG due to the shrinkage techniques, the CPs are not necessarily close to the nominal level. Note that GREG-M takes account of the variability estimating , but not in other parameters, thereby the coverage performance is limited. It is also confirmed that the proposed ABH and ABL methods produce narrower intervals than AB.
In the second simulation study, we consider the binary case for and apply the non-linear regression method discussed in Section 5. The binary response variables are generated from the following logistic regression model:
[TABLE]
where and the other settings are the same as the linear regression case. We selected a sample size of from the finite population, using two sampling mechanism: (A) simple random sampling and (B) probability-proportional-to-size sampling with size measure with . We again apply the three Bayesian methods and three frequents methods, GREG, GREG-L and GREG-R, based on a logistic regression model to obtain point estimates and confidence/credible intervals of the population mean . The obtained RMSE and bias of point estimates and CP and AL of intervals based on 1,000 replications are reported in Tables 8 and 9, respectively, which also shows again the superiority of the proposed Bayesian approach to the frequentist approach in terms of uncertainty quantification.
Example
We applied the proposed methods to the synthetic income data available from the sae package (Molina and Marhuenda, 2015) in R. In the dataset, the normalized annual net income is observed for a certain number of individuals in each province of Spain. The dataset contains 9 covariates; four indicators of the four groupings of ages (, , and denoted by ag1ag4, respectively), the indicator of having Spanish nationality na, the indicators of education levels (primary education ed1 and post-secondary education ed2), and the indicators of two employment categories (employed em1 and unemployed em2). We also adopted 13 interaction variables: ag1na, ag2na, ag3na, ag4na, ag2ed1, ag3ed1, ag4ed1, ag1em1, ag2em1, ag3em1, ag4em1, ed1em1 and ed2*em1, as auxiliary variables, thereby in this example. The dataset also contains information of survey weights, so that we used its inverse value as the sampling probability. Since there is no information regarding the details of sampling mechanism, we approximate the joint inclusion probability as the product of two sampling probabilities. In this example, we focus on estimating average income in three provinces, Palencia, Segovia and Soria, where the number of sampled units are 72, 58 and 20, respectively. The number of non-sampled units were around . It should be noted that the number of sample sizes are not so large compared with the number of auxiliary variables, especially in Soria. Hence, the estimation error of regression coefficients would not be negligible and the proposed Bayesian methods would be appealing in this case.
In order to perform joint estimation and inference in the three provinces, we employed the following working model:
[TABLE]
where is an intercept term, if belong to province , where for Palencia, for Segovia, and for Soria, and {\mbox{\boldmathx}}_{i} is the vector of auxiliary variables with dimension (9 auxiliary variables and 13 interaction variables). Here is the log-transformed net income and is the error term.
Under the working model (26), the posterior distribution of is
[TABLE]
where
[TABLE]
and
[TABLE]
Based on the above formulas, we performed the proposed approximate Bayesian methods for for each , and computed credible intervals for the log-transformed average income with 5000 posterior samples after discarding the first 500 samples as burn-in period. We considered three types of priors for {\mbox{\boldmath\beta}}_{1}, flat, Laplace and horseshoe priors as considered in Section 6. We also calculated confidence intervals of the log-transformed average income based on the two frequentist methods, GREG and GREG-L, using the working model (26). In applying GREG-L, the tuning parameter in the Lasso estimator was selected via 10 fold cross validation.
The credible intervals of {\mbox{\boldmath\beta}}_{1} based on the approximate posterior distributions under Laplace and horseshoe priors are shown in Figure 1, in which the design-consistent and Lasso estimates of {\mbox{\boldmath\beta}}_{1} are also given. It is observed that the approximate posterior mean of {\mbox{\boldmath\beta}}_{1} shrinks the design-consistent estimates of {\mbox{\boldmath\beta}}_{1} toward [math] although exactly zero estimates are not produced as the frequentist Lasso estimator does. The Lasso estimate selects only one variable among 22 candidates, and the variable is also significant in terms of the credible interval in both two priors. Moreover, the two Bayesian methods detect one or two more variables to be significant judging from the credible intervals. Comparing the results from two priors, the horseshoe prior provides narrower credible intervals than the Laplace prior.
In Figure 2, we show the resulting credible and confidence intervals of the average income in the three provinces. It is observed that the proposed Bayesian methods, AB and ABL, tend to produce wider credible intervals than the confidence intervals of the corresponding frequencies methods, GREG and GREG-L, respectively, which is consistent with the simulation results in Section 6. We can also confirm that the credible intervals of ABH are slightly narrower than those of ABL, which would reflect the differences of interval lengths of {\mbox{\boldmath\beta}}_{1} as shown in Figure 1.
Concluding Remarks
We have proposed an approximate Bayesian method for model-assisted survey estimation using parametric regression models as working models. The proposed method is justified under the frequentist framework and captures the full uncertainty in estimating regression parameters even when the number of the auxiliary variables is large. A main advantage of the proposed method is that it uses a shrinkage prior for regularized regression estimation, which not only provides an efficient point estimator, but also fully captures the uncertainty associated with model selection and parameter estimation via a Bayesian framework. Although we only consider two popular prior distributions, the Laplace prior and the horseshoe prior, other priors, such as the spike-and-slab prior (Ishwaran and Rao, 2005), can be adopted in the same way. Further investigation regarding the choice of the shrinkage prior distributions will be an important research topic in the future.
Although our working model is parametric, the proposed approximate Bayesian method can be applied to other semiparametric models such as local polynomial model (Breidt and Opsomer, 2000), P-spline regression model (Breidt et al., 2005), or a neural network model (Montanari and Ranalli, 2005). By finding suitable prior distributions for the semiparametric models, the model complexity parameters will be determined automatically and the uncertainty will be captured in the approximate Bayesian framework.
Finally, under more complicated sampling design such as multi-stage stratified cluster sampling, the main idea can be applied similarly since the proposed Bayesian method relies on the sampling distribution of the GREG estimator, which is asymptotically normal as shown by Krewski and Rao (1981). If the asymptotic normality is questionable, one can use a weighted likelihood bootstrap to approximate Bayesian posterior, as in Lyddon et al. (2019). Such extensions are beyond the scope of this paper and will be considered in the future.
Supplementary Materials
Supplementary Material includes technical details for posterior computation, proofs of theorems and additional results of simulation studies.
Acknowledgement
We thank the AE and three anonymous referees for very constructive comments. The first author was supported by Japan Society for the Promotion of Science KAKENHI grant number JP18K12757. The second author was supported by US National Science Foundation (MMS-1733572).
Proof of (2.5)
We assume the same conditions in the proof of Theorem 1, given in Section S3. From (2.4), we have
[TABLE]
where the expectation is taken with respect to the sampling distribution. Also, we can show that . Therefore, using Chebychev inequality, we have and result (2.5) follows.
Posterior computation
We provide the algorithm for generating the approximate posterior distribution of {\mbox{\boldmath\beta}}_{1} given in (4.20) with two shrinkage priors, Laplace and horseshoe (4.18) priors. Using the mixture representation of both priors, we get the following Gibbs sampling algorithm.
Laplace prior
We consider the mixture representation of Laplace distribution: and , independently, for . For , we consider the conjugate prior , where is a gamma distribution with shape parameter and rate parameter . The full conditional distribution of {\mbox{\boldmath\beta}}_{1} is multivariate normal with mean {\mbox{\boldmathA}}^{-1}\hat{{\mbox{\boldmathV}}}_{\beta 11}^{-1}\hat{{\mbox{\boldmath\beta}}}_{1} and variance-covariance matrix {\mbox{\boldmathA}}^{-1} where {\mbox{\boldmathA}}=\hat{{\mbox{\boldmathV}}}_{\beta 11}^{-1}+\mathbf{D}^{-1} with . The full conditional distribution of is , and are conditionally independent, with conditionally inverse-Gaussian with parameters in the parametrization of the inverse-Gaussian density given by
[TABLE]
Horseshoe prior
The prior for {\mbox{\boldmath\beta}}_{1} can be expressed as a hierarchy: and independently for , where is the standard half-Cauchy distribution. Using the hierarchical expression of the half-Cauchy distribution, we obtain the following Gibbs sampling steps. Let {\mbox{\boldmathA}}=\hat{{\mbox{\boldmathV}}}_{\beta 11}^{-1}+\mathbf{B}^{-1}, where . The full conditional distribution of {\mbox{\boldmath\beta}}_{1} is multivariate normal with mean {\mbox{\boldmathA}}^{-1}\hat{{\mbox{\boldmathV}}}_{\beta 11}^{-1}\hat{{\mbox{\boldmath\beta}}}_{1} and variance-covariance matrix {\mbox{\boldmathA}}^{-1}. The full conditional distribution of and are, respectively, give by
[TABLE]
where denotes an inverse-Gamma distribution with shape parameter and rate parameter . Here and are additional latent variables, and their full conditional distributions are given by and , respectively.
A sketched proof of Theorem 1
To discuss the asymptotic properties of the approximate Bayesian method, we first assume a sequence of finite populations and samples with finite fourth moments as in Isaki & Fuller (1982). The finite population is a random sample from an unknown superpopulation model. Let and {\mbox{\boldmath\beta}}_{1\ast} be the true values of and {\mbox{\boldmath\beta}}_{1}. Let and be a ball with centre {\mbox{\boldmath\beta}}_{1\ast} and radius for . We make the following regularity assumptions
- (C1)
Assume that the sufficient conditions for the asymptotic normality of for hold for the sequence of finite populations and samples.
- (C2)
Assume that the prior distribution is positive and satisfies a Lipschitz condition over its support ; that is, there exists such that for .
- (C3)
Assume that \hat{{\mbox{\boldmathV}}}_{\beta 11}={\mbox{\boldmathV}}_{\beta 11}\{1+o_{P}(1)\} and (\hat{{\mbox{\boldmath\beta}}}_{1}-{\mbox{\boldmath\beta}}_{1})^{t}\hat{{\mbox{\boldmathV}}}_{\beta 11}^{-1}(\hat{{\mbox{\boldmath\beta}}}_{1}-{\mbox{\boldmath\beta}}_{1})=(\hat{{\mbox{\boldmath\beta}}}_{1}-{\mbox{\boldmath\beta}}_{1})^{t}{\mbox{\boldmathV}}_{\beta 11}^{-1}(\hat{{\mbox{\boldmath\beta}}}_{1}-{\mbox{\boldmath\beta}}_{1})\{1+o_{P}(1)\} for any {\mbox{\boldmath\beta}}\in C_{n} and .
- (C4)
Assume that \pi({\mbox{\boldmath\beta}}) is positive and finite over its support .
Sufficient conditions for (C1) are discussed within various asymptotic structures (e.g. Binder, 1983; Pfeffermann & Sverchkov, 2009). Conditions (C2) and (C4) are satisfied for common priors such as (multivariate) normal distribution . Condition (C3) essentially requires that the design variance estimators be consistent and meet a certain continuity condition.
Proof.
Let g(\bar{Y},{\mbox{\boldmath\beta}})=\phi(\hat{\bar{Y}}_{\rm reg}({\mbox{\boldmath\beta}}_{1});\bar{Y},\hat{V}_{e}({\mbox{\boldmath\beta}}))\phi_{p}(\hat{{\mbox{\boldmath\beta}}}_{1};{\mbox{\boldmath\beta}}_{1},\hat{{\mbox{\boldmathV}}}_{\beta 11})\pi({\mbox{\boldmath\beta}}_{1}). Then, the approximated posterior distribution is given by
[TABLE]
Note that
[TABLE]
By the same argument in the proof of Theorem 1 in Wang et al. (2018), we have
[TABLE]
so the second term in (S1) is . On the other hand, under condition (C3), \phi_{p}(\hat{{\mbox{\boldmath\beta}}}_{1};{\mbox{\boldmath\beta}}_{1},\hat{{\mbox{\boldmathV}}}_{\beta 11})=\phi_{p}(\hat{{\mbox{\boldmath\beta}}}_{1};{\mbox{\boldmath\beta}}_{1},{\mbox{\boldmathV}}_{\beta 11})\{1+o_{P}(1)\} as , for any {\mbox{\boldmath\beta}}_{1}\in C_{n}, thereby under condition (C4),
[TABLE]
as since and \hat{{\mbox{\boldmath\beta}}}_{1}\to{\mbox{\boldmath\beta}}_{1\ast} as . Hence, we have
[TABLE]
for any as , where (S2) follows from (C2), and (S3) follows from the properties \hat{V}_{e}(\hat{{\mbox{\boldmath\beta}}}_{1})=\hat{V}_{e}({\mbox{\boldmath\beta}}_{1\ast})\{1+o_{P}(1)\} and \hat{\bar{Y}}_{\rm reg}(\hat{{\mbox{\boldmath\beta}}}_{1})=\hat{\bar{Y}}_{\rm reg}({\mbox{\boldmath\beta}}_{1\ast})\{1+o_{P}(1)\} under (C1). Let R_{n}=\{\bar{Y}\in\Theta_{Y}:\hat{V}_{e}(\hat{{\mbox{\boldmath\beta}}}_{1})^{-1}(\hat{\bar{Y}}_{\rm reg}(\hat{{\mbox{\boldmath\beta}}}_{1})-\bar{Y})^{2}\leq\chi^{2}_{1}(q)\}, where is the upper -quantile of the chi-squared distribution with degree of freedom. Then, . Since \hat{\bar{Y}}_{\rm reg}(\hat{{\mbox{\boldmath\beta}}}_{1})-\bar{Y}_{\ast}=O_{p}(n^{-1/2}) and , which is slower than , it holds that . Then,
[TABLE]
which means that
[TABLE]
for any , implying
[TABLE]
Then,
[TABLE]
which are both from (S3) and (S4). This completes the proof. ∎
A sketched proof of Theorem 2
The condition (C4) given in the proof of Theorem 1 may not be satisfied for shrinkage priors. For example, the horseshoe prior diverge at the origin . In what follows, let {\mbox{\boldmath\beta}}=(\beta_{0},{\mbox{\boldmath\beta}}_{1}^{t}) and define \hat{{\mbox{\boldmath\beta}}}_{1} and \hat{{\mbox{\boldmath\beta}}}_{1}^{(R)} in the same way. We use the following alternative condition for the shrinkage prior \pi_{\lambda}({\mbox{\boldmath\beta}}_{1}):
- (C5)
The regularized estimator \hat{{\mbox{\boldmath\beta}}}_{1}^{(R)} under penalty -\log\pi_{\lambda}({\mbox{\boldmath\beta}}_{1}) is asymptotically normal, that is, \sqrt{n}(\hat{{\mbox{\boldmath\beta}}}_{1}^{(R)}-{\mbox{\boldmath\beta}}_{1\ast})\to N(0,\mathbf{C}), where is a positive definite matrix and is appropriately chosen.
Under the Laplace prior, \hat{{\mbox{\boldmath\beta}}}_{1}^{(R)} is equivalent to the Lasso estimator, and the above property holds if (Knight & Fu, 2000; McConville et al., 2017). For general prior \pi_{\lambda}({\mbox{\boldmath\beta}}_{1}), this condition holds if the assumption regarding the penalty term P_{\lambda}({\mbox{\boldmath\beta}}_{1}) given in Fan & Li (2001) is satisfied.
Proof.
It is noted that
[TABLE]
Define
[TABLE]
Then, it holds that
[TABLE]
as , where is a ball with center {\mbox{\boldmath\beta}}_{1\ast} and radius for . Hence, the statement can be proved in the same way as the proof of Theorem 1 since \phi(\hat{\bar{Y}}_{\rm reg}({\mbox{\boldmath\beta}}_{1\ast});\bar{Y},\hat{V}_{e}({\mbox{\boldmath\beta}}_{1\ast}))=\phi(\hat{\bar{Y}}_{\rm reg}(\hat{{\mbox{\boldmath\beta}}}_{1}^{(R)});\bar{Y},\hat{V}_{e}(\hat{{\mbox{\boldmath\beta}}}_{1}^{(R)}))\{1+o_{P}(1)\}. ∎
Additional simulation results
We here provide additional simulation results. We considered the same scenarios in the main document with . The results are reported in Table S14.
References
- Binder (1983)
Binder, D. A. (1983).
On the variances of asymptotically normal estimators from complex surveys.
Int. Statist. Rev. 51, 279–292.
- Fan & Li (2001)
Fan, J. & R. Li (2001).
Variable selection via nonconcave penalized likelihood and its Oracle properties.
J. Am. Statist. Assoc. 96, 1348–1360.
- Isaki & Fuller (1982)
Isaki, C. T. & W. A. Fuller (1982).
Survey design under the regression superpopulation model.
J. Am. Statist. Assoc. 77, 89–96.
- Knight & Fu (2000)
Knight, K. & W. Fu (2000).
Asymptotics for Lasso-type estimators.
Ann. Statist. 28, 1356–1378.
- McConville et al. (2017)
McConville, K., F. Breidt, T. Lee, & G. Moisen (2017).
Model-assisted survey regression estimation with the LASSO.
J. Survey Statist. Methodol. 5, 131–158.
- Pfeffermann & Sverchkov (2009)
Pfeffermann, D. & M. Sverchkov (2009).
Inference under informative sampling.
Handbook of Statistics, Amsterdam: Elsevier.
The reference list from the paper itself. Each links out to its DOI / PubMed record.
- 1Binder (1983) Binder, D. A. (1983). On the variances of asymptotically normal estimators from complex surveys. International Statistical Reviews 51 , 279–292.
- 2Breidt et al. (2005) Breidt, F. J., G. Claeskens, and J. D. Opsomer (2005). Model-assisted estimation for complex surveys using penalised splines. Biometrika 92 , 831–846.
- 3Breidt and Opsomer (2000) Breidt, F. J. and J. D. Opsomer (2000). Local polynomial regression estimators in survey sampling. Annals of Statistics 28 , 403–427.
- 4Breidt and Opsomer (2017) Breidt, F. J. and J. D. Opsomer (2017). Model-assisted survey estimation with modern prediction techniques. Statistical Science 32 , 190–205.
- 5Carvalho et al. (2009) Carvalho, C. M., N. G. Polson, and J. G. Scott (2009). Handling sparsity via the horseshoe. Proceedings of the 12th International Confe- rence on Artificial Intelligence and Statistics (AISTATS 2009) .
- 6Carvalho et al. (2010) Carvalho, C. M., N. G. Polson, and J. G. Scott (2010). The horseshoe estimator for sparse signals. Biometrika 97 , 465–480.
- 7Chen et al. (2018) Chen, J. K. T., R. L. Valliant, and M. R. Elliott (2018). Model-assisted calibration of non-probability sample survey data using adaptive LASSO. Survey Methodology 44 , 117–144.
- 8Dempster et al. (1977) Dempster, A. P., M. Schatzoff, and N. Wermuth (1977). A simulation study of alternatives to ordinary least squares. Journal of the American Statistical Association 72 , 77–91.
