On the estimation of population size from a post-stratified two sample capture-recapture data under dependence
Kiranmoy Chatterjee, Prajamitra Bhuyan

TL;DR
This paper introduces a new model for population size estimation using two sample capture-recapture data that accounts for dependency between capture and recapture, improving accuracy over traditional methods.
Contribution
The paper proposes a novel model that incorporates dependency in capture-recapture data and develops estimation methods for this model, addressing a gap in existing literature.
Findings
Proposed model outperforms existing methods in simulations.
Method effectively captures dependency between capture and recapture.
Illustrated with real data analysis.
| | List 2: In | List 2: Out | Total |
|---|---|---|---|
| List 1: In | | | |
| List 1: Out | | | |
| Total | | | |
| Population | | Method | RB | RRMSE | CP (%) | LCI |
|---|---|---|---|---|---|---|
| P1 | 0.4 | MLE | 0.0370 | 0.0632 | 90 | 84.47 |
| | | MME | 0.0684 | 0.0843 | 99 | 107.75 |
| | | Nour | 0.1669 | 0.1703 | - | - |
| | 0.8 | MLE | 0.0361 | 0.0628 | 99.5 | 98.84 |
| | | MME | 0.0650 | 0.0813 | 99 | 99.55 |
| | | Nour | 0.3377 | 0.3389 | - | - |
| P2 | 0.4 | MLE | 0.0420 | 0.0647 | 88.5 | 95.53 |
| | | MME | 0.0747 | 0.0940 | 96.5 | 111.58 |
| | | Nour | 0.1628 | 0.1660 | - | - |
| | 0.8 | MLE | 0.0413 | 0.0660 | 100 | 99.34 |
| | | MME | 0.0719 | 0.0898 | 98.5 | 102.39 |
| | | Nour | 0.3346 | 0.3358 | - | - |
| P3 | 0.4 | MLE | 0.0401 | 0.0608 | 85 | 45.62 |
| | | MME | 0.0497 | 0.0634 | 94.5 | 76.87 |
| | | Nour | 0.0847 | 0.0886 | - | - |
| | 0.8 | MLE | 0.0367 | 0.0911 | 92.7 | 80.89 |
| | | MME | 0.0475 | 0.0594 | 98 | 71.56 |
| | | Nour | 0.1701 | 0.1717 | - | - |
| P4 | 0.4 | MLE | 0.0318 | 0.0495 | 90 | 43.18 |
| | | MME | 0.0449 | 0.0582 | 25 | 73.64 |
| | | Nour | 0.0822 | 0.0861 | - | - |
| | 0.8 | MLE | 0.0291 | 0.0644 | 88.89 | 77.53 |
| | | MME | 0.0400 | 0.0519 | 97.5 | 68.79 |
| | | Nour | 0.1687 | 0.1703 | - | - |
| P5 | 0.4 | MLE | 0.0391 | 0.0701 | 92 | 102.32 |
| | | MME | 0.0872 | 0.1093 | 79 | 131.05 |
| | | Nour | 0.2065 | 0.2100 | - | - |
| | 0.8 | MLE | 0.0401 | 0.0724 | 99 | 136.63 |
| | | MME | 0.0791 | 0.1040 | 97.5 | 127.81 |
| | | Nour | 0.4174 | 0.4187 | - | - |
| P6 | 0.4 | MLE | 0.0377 | 0.0754 | 90 | 101.78 |
| | | MME | 0.0877 | 0.1131 | 96.5 | 139.50 |
| | | Nour | 0.2135 | 0.2169 | - | - |
| | 0.8 | MLE | 0.0487 | 0.0681 | 99 | 141.74 |
| | | MME | 0.0868 | 0.1099 | 98 | 136.50 |
| | | Nour | 0.4223 | 0.4236 | - | - |
| Population | | Method | RB | RRMSE | CP (%) | LCI |
|---|---|---|---|---|---|---|
| P1 | 0.4 | MLE | 0.0002 | 0.0153 | 99 | 295.07 |
| | | MME | -0.0002 | 0.0381 | 99 | 219.52 |
| | | Nour | -0.1661 | 0.1669 | - | - |
| | 0.8 | MLE | 0.0005 | 0.0182 | 99 | 160.94 |
| | | MME | -0.0019 | 0.0382 | 98 | 221.31 |
| | | Nour | -0.3371 | 0.3373 | - | - |
| P2 | 0.4 | MLE | 0.0017 | 0.0138 | 99.5 | 412.65 |
| | | MME | 0.0035 | 0.0388 | 98.5 | 223.56 |
| | | Nour | -0.1626 | 0.1633 | - | - |
| | 0.8 | MLE | 0.0002 | 0.0161 | 99.5 | 165.51 |
| | | MME | 0.0019 | 0.0388 | 99.5 | 227.80 |
| | | Nour | -0.3339 | 0.3342 | - | - |
| P3 | 0.4 | MLE | 0.0027 | 0.0144 | 99.5 | 158.39 |
| | | MME | 0.0021 | 0.0262 | 98 | 154.01 |
| | | Nour | -0.0851 | 0.0860 | - | - |
| | 0.8 | MLE | 0.0017 | 0.0125 | 100 | 182.97 |
| | | MME | 0.0008 | 0.0264 | 98.5 | 159.08 |
| | | Nour | -0.1723 | 0.1726 | - | - |
| P4 | 0.4 | MLE | 0.0009 | 0.0089 | 100 | 233.09 |
| | | MME | 0.0025 | 0.0265 | 97 | 151.14 |
| | | Nour | -0.0825 | 0.0834 | - | - |
| | 0.8 | MLE | 0.0012 | 0.0089 | 100 | 291.15 |
| | | MME | 0.0011 | 0.0261 | 97.5 | 152.76 |
| | | Nour | -0.1705 | 0.1709 | - | - |
| P5 | 0.4 | MLE | 0.0008 | 0.0098 | 100 | 226.03 |
| | | MME | -0.0009 | 0.0466 | 98.5 | 263.76 |
| | | Nour | -0.2064 | 0.2071 | - | - |
| | 0.8 | MLE | 0.0002 | 0.0180 | 100 | 286.80 |
| | | MME | -0.0006 | 0.0448 | 99 | 265.13 |
| | | Nour | -0.4208 | 0.4210 | - | - |
| P6 | 0.4 | MLE | -0.0002 | 0.0195 | 99.5 | 309.30 |
| | | MME | 0.0026 | 0.0505 | 99 | 276.84 |
| | | Nour | -0.2128 | 0.2135 | - | - |
| | 0.8 | MLE | 0.0002 | 0.0218 | 99.5 | 378.88 |
| | | MME | 0.0030 | 0.0508 | 99.5 | 277.64 |
| | | Nour | -0.4248 | 0.4250 | - | - |
| Population | | Method | RB | RRMSE | CP (%) | LCI |
|---|---|---|---|---|---|---|
| Results on estimators of | | | | | | |
| P1 | 0.4 | MLE | 0.0028 | 0.0265 | 96.5 | 27.84 |
| | | Nour | -0.1692 | 0.1728 | - | - |
| | 0.8 | MLE | 0.0073 | 0.0401 | 96 | 44.36 |
| | | Nour | -0.3382 | 0.3398 | - | - |
| P2 | 0.4 | MLE | 0.0034 | 0.0300 | 95.5 | 27.22 |
| | | Nour | -0.1659 | 0.1691 | - | - |
| | 0.8 | MLE | 0.0067 | 0.0424 | 96 | 42.23 |
| | | Nour | -0.3350 | 0.3365 | - | - |
| P3 | 0.4 | MLE | 0.0108 | 0.0425 | 92.5 | 40.81 |
| | | Nour | -0.0883 | 0.0920 | - | - |
| | 0.8 | MLE | 0.0288 | 0.0742 | 86.5 | 57.93 |
| | | Nour | -0.1721 | 0.1741 | - | - |
| P4 | 0.4 | MLE | 0.0072 | 0.0379 | 95 | 37.11 |
| | | Nour | -0.0857 | 0.0893 | - | - |
| | 0.8 | MLE | 0.0208 | 0.0603 | 92 | 55.67 |
| | | Nour | -0.1704 | 0.1724 | - | - |
| P5 | 0.4 | MLE | 0.0040 | 0.0291 | 95.5 | 31.02 |
| | | Nour | -0.2066 | 0.2103 | - | - |
| | 0.8 | MLE | 0.0126 | 0.0473 | 96.5 | 51.26 |
| | | Nour | -0.4188 | 0.4199 | - | - |
| P6 | 0.4 | MLE | 0.0053 | 0.0328 | 93.5 | 32.54 |
| | | Nour | -0.2117 | 0.2157 | - | - |
| | 0.8 | MLE | 0.0087 | 0.0419 | 97 | 48.92 |
| | | Nour | -0.4225 | 0.4236 | - | - |
| Results on estimators of | | | | | | |
| P1 | 0.4 | MLE | 0.0302 | 0.0374 | 94.5 | 30.61 |
| | | Nour | -0.1603 | 0.1639 | - | - |
| | 0.8 | MLE | 0.0103 | 0.0505 | 95 | 44.01 |
| | | Nour | -0.3305 | 0.3321 | - | - |
| P2 | 0.4 | MLE | 0.0076 | 0.0386 | 93 | 29.65 |
| | | Nour | -0.1656 | 0.1699 | - | - |
| | 0.8 | MLE | 0.0095 | 0.0491 | 97 | 44.00 |
| | | Nour | -0.3367 | 0.3383 | - | - |
| P3 | 0.4 | MLE | 0.0166 | 0.0501 | 93 | 38.01 |
| | | Nour | -0.0854 | 0.0908 | - | - |
| | 0.8 | MLE | 0.0314 | 0.0761 | 88.5 | 52.43 |
| | | Nour | -0.1730 | 0.1746 | - | - |
| P4 | 0.4 | MLE | 0.0120 | 0.0407 | 91.5 | 32.28 |
| | | Nour | -0.0795 | 0.0839 | - | - |
| | 0.8 | MLE | 0.0229 | 0.0647 | 92.5 | 49.06 |
| | | Nour | -0.1678 | 0.1695 | - | - |
| P5 | 0.4 | MLE | 0.0042 | 0.0438 | 95.5 | 35.63 |
| | | Nour | -0.2014 | 0.2059 | - | - |
| | 0.8 | MLE | 0.0120 | 0.0656 | 93 | 52.44 |
| | | Nour | -0.41283 | 0.4144 | - | - |
| P6 | 0.4 | MLE | 0.0029 | 0.0456 | 97 | 37.18 |
| | | Nour | -0.2223 | 0.2262 | - | - |
| | 0.8 | MLE | 0.0101 | 0.0604 | 95.5 | 51.97 |
| | | Nour | -0.42544 | 0.4270 | - | - |
| Population | | Method | RB | RRMSE | CP (%) | LCI |
|---|---|---|---|---|---|---|
| Results on estimators of | | | | | | |
| P1 | 0.4 | MLE | 0.0004 | 0.0097 | 96 | 46.14 |
| | | Nour | -0.1648 | 0.1654 | - | - |
| | 0.8 | MLE | 0.0001 | 0.0127 | 93.5 | 61.19 |
| | | Nour | -0.3371 | 0.3374 | - | - |
| P2 | 0.4 | MLE | 0.0008 | 0.0111 | 95 | 52.49 |
| | | Nour | -0.1615 | 0.1621 | - | - |
| | 0.8 | MLE | 0.0001 | 0.0135 | 93.5 | 64.33 |
| | | Nour | -0.3337 | 0.3341 | - | - |
| P3 | 0.4 | MLE | 0.0001 | 0.0069 | 93.5 | 33.13 |
| | | Nour | -0.0849 | 0.0856 | - | - |
| | 0.8 | MLE | 0.0010 | 0.0088 | 94.5 | 41.02 |
| | | Nour | -0.1709 | 0.1714 | - | - |
| P4 | 0.4 | MLE | 0.0001 | 0.0063 | 94.5 | 30.10 |
| | | Nour | -0.0825 | 0.0831 | - | - |
| | 0.8 | MLE | 0.0009 | 0.0084 | 95 | 38.98 |
| | | Nour | -0.1691 | 0.1695 | - | - |
| P5 | 0.4 | MLE | -0.0011 | 0.0121 | 95 | 57.61 |
| | | Nour | -0.2084 | 0.2091 | - | - |
| | 0.8 | MLE | 0.0001 | 0.0157 | 94.5 | 75.60 |
| | | Nour | -0.4210 | 0.4212 | - | - |
| P6 | 0.4 | MLE | -0.0012 | 0.0139 | 94.5 | 66.93 |
| | | Nour | -0.2149 | 0.2157 | - | - |
| | 0.8 | MLE | 0.00021 | 0.0168 | 95 | 80.66 |
| | | Nour | -0.4253 | 0.4255 | - | - |
| Results on estimators of | | | | | | |
| P1 | 0.4 | MLE | -0.0001 | 0.0150 | 95 | 58.85 |
| | | Nour | -0.1603 | 0.1611 | - | - |
| | 0.8 | MLE | 0.0003 | 0.0187 | 95.5 | 74.76 |
| | | Nour | -0.3304 | 0.3307 | - | - |
| P2 | 0.4 | MLE | -0.0005 | 0.0152 | 95 | 60.38 |
| | | Nour | -0.1664 | 0.1673 | - | - |
| | 0.8 | MLE | 0.0003 | 0.0192 | 96.5 | 75.81 |
| | | Nour | -0.3367 | 0.3370 | - | - |
| P3 | 0.4 | MLE | 0.0005 | 0.0097 | 94 | 38.60 |
| | | Nour | -0.0879 | 0.0891 | - | - |
| | 0.8 | MLE | -0.0008 | 0.0123 | 95 | 48.36 |
| | | Nour | -0.1741 | 0.1745 | - | - |
| P4 | 0.4 | MLE | 0.0005 | 0.0092 | 94.5 | 36.81 |
| | | Nour | -0.0809 | 0.0819 | - | - |
| | 0.8 | MLE | 0.0009 | 0.0121 | 94.5 | 46.92 |
| | | Nour | -0.1689 | 0.1693 | - | - |
| P5 | 0.4 | MLE | 0.0023 | 0.0184 | 95 | 71.12 |
| | | Nour | -0.2012 | 0.2020 | - | - |
| | 0.8 | MLE | 0.0004 | 0.0231 | 95 | 91.65 |
| | | Nour | -0.4165 | 0.4168 | - | - |
| P6 | 0.4 | MLE | 0.0023 | 0.0196 | 96.5 | 77.21 |
| | | Nour | -0.2198 | 0.2207 | - | - |
| | 0.8 | MLE | 0.0003 | 0.0240 | 96 | 95.88 |
| | | Nour | -0.4289 | 0.4293 | - | - |
| Population | | Method | RB | RRMSE | CP (%) | LCI |
|---|---|---|---|---|---|---|
| P1 | 0.4 | MLE | -0.0010 | 0.0111 | 100 | 149.35 |
| | | Wolter-2 | -0.0013 | 0.0131 | 54 | 23.31 |
| | 0.8 | MLE | -0.0008 | 0.0121 | 100 | 206.59 |
| | | Wolter-2 | -0.0013 | 0.0131 | 21 | 9.16 |
| P2 | 0.4 | MLE | 0.0014 | 0.0151 | 100 | 152.80 |
| | | Wolter-2 | 0.0007 | 0.0166 | 55 | 30.81 |
| | 0.8 | MLE | 0.0014 | 0.0157 | 100 | 209.97 |
| | | Wolter-2 | 0.0007 | 0.0166 | 29.5 | 14.95 |
| P3 | 0.4 | MLE | 0.0013 | 0.0103 | 100 | 140.26 |
| | | Wolter-2 | 0.0005 | 0.0133 | 53.5 | 24.45 |
| | 0.8 | MLE | 0.0012 | 0.0090 | 100 | 157.11 |
| | | Wolter-2 | 0.0005 | 0.0133 | 21 | 10.59 |
| P4 | 0.4 | MLE | 0.0013 | 0.0074 | 100 | 140.14 |
| | | Wolter-2 | 0.0008 | 0.0099 | 55.5 | 17.89 |
| | 0.8 | MLE | 0.0008 | 0.0064 | 100 | 151.99 |
| | | Wolter-2 | 0.0008 | 0.0099 | 22 | 5.88 |
| P5 | 0.4 | MLE | -0.0001 | 0.0101 | 100 | 161.52 |
| | | Wolter-2 | 0.0001 | 0.0184 | 52 | 33.49 |
| | 0.8 | MLE | 0.0007 | 0.0167 | 100 | 248.26 |
| | | Wolter-2 | 0.0001 | 0.0184 | 27 | 17.07 |
| P6 | 0.4 | MLE | 0.0013 | 0.0116 | 100 | 361.61 |
| | | Wolter-2 | -0.0005 | 0.0248 | 54 | 45.24 |
| | 0.8 | MLE | 0.0006 | 0.0115 | 100 | 260.15 |
| | | Wolter-2 | -0.0005 | 0.0248 | 31 | 25.62 |
| Population | | Method | RB | RRMSE | CP (%) | LCI |
|---|---|---|---|---|---|---|
| Results on estimators of | | | | | | |
| P1 | 0.4 | MLE | 0.0003 | 0.0061 | 97 | 5.93 |
| | | Wolter-1 | -0.0680 | 0.3516 | 53.5 | 1379.14 |
| | 0.8 | MLE | 0.0001 | 0.0003 | 90 | 1.80 |
| | | Wolter-1 | -0.2125 | 0.6351 | 25.5 | 465.28 |
| P2 | 0.4 | MLE | | 0.0003 | 96.5 | 4.51 |
| | | Wolter-1 | -0.0216 | 0.5508 | 71.5 | 1290.38 |
| | 0.8 | MLE | | 0.0015 | 95 | 3.78 |
| | | Wolter-1 | -0.0881 | 2.2801 | 34 | 580.83 |
| P3 | 0.4 | MLE | 0.0002 | 0.0033 | 98 | 12.05 |
| | | Wolter-1 | 0.2069 | 3.4894 | 81.5 | 923.23 |
| | 0.8 | MLE | 0.0004 | 0.0052 | 98 | 15.87 |
| | | Wolter-1 | -0.1149 | 0.1825 | 54 | 397.55 |
| P4 | 0.4 | MLE | 0.0005 | 0.0068 | 99 | 14.82 |
| | | Wolter-1 | -0.0103 | 0.3202 | 83 | 927.42 |
| | 0.8 | MLE | | 0.0003 | 97.5 | 17.12 |
| | | Wolter-1 | -0.0774 | 0.2726 | 52.5 | 420.36 |
| P5 | 0.4 | MLE | 0.0002 | 0.0032 | 96.5 | 1.61 |
| | | Wolter-1 | -0.1223 | 0.3675 | 53 | 1643.76 |
| | 0.8 | MLE | | | 99.5 | 1.28 |
| | | Wolter-1 | -0.2561 | 0.8123 | 29.5 | 508.70 |
| P6 | 0.4 | MLE | | 0.0003 | 99 | 0.91 |
| | | Wolter-1 | -0.1910 | 0.4858 | 69 | 1431.90 |
| | 0.8 | MLE | | | 100 | 3.80 |
| | | Wolter-1 | -0.2666 | 0.7618 | 27.5 | 612.75 |
| Results on estimators of | | | | | | |
| P1 | 0.4 | MLE | 0.0002 | 0.0061 | 81.5 | 4.78 |
| | | Wolter-1 | -0.0688 | 0.3521 | 54.5 | 1379.15 |
| | 0.8 | MLE | -0.0001 | 0.0003 | 89.5 | 1.56 |
| | | Wolter-1 | -0.2163 | 0.6371 | 28 | 465.28 |
| P2 | 0.4 | MLE | | 0.0004 | 95.5 | 3.79 |
| | | Wolter-1 | -0.0247 | 0.5520 | 67 | 1290.38 |
| | 0.8 | MLE | | 0.0015 | 95 | 3.25 |
| | | Wolter-1 | -0.0918 | 2.2806 | 31 | 580.83 |
| P3 | 0.4 | MLE | 0.0002 | 0.0033 | 98 | 10.05 |
| | | Wolter-1 | 0.2027 | 3.4896 | 82.5 | 923.23 |
| | 0.8 | MLE | 0.0004 | 0.0052 | 98 | 13.24 |
| | | Wolter-1 | -0.1194 | 0.1868 | 52 | 397.55 |
| P4 | 0.4 | MLE | 0.0005 | 0.0067 | 99 | 12.33 |
| | | Wolter-1 | -0.0127 | 0.3211 | 85 | 927.42 |
| | 0.8 | MLE | | 0.0002 | 98.5 | 14.25 |
| | | Wolter-1 | -0.0816 | 0.2753 | 51.5 | 420.36 |
| P5 | 0.4 | MLE | 0.0002 | 0.0032 | 97 | 1.38 |
| | | Wolter-1 | -0.1243 | 0.3689 | 52 | 1643.76 |
| | 0.8 | MLE | | | 99.5 | 1.12 |
| | | Wolter-1 | -0.2611 | 0.8149 | 27 | 508.70 |
| P6 | 0.4 | MLE | | 0.0002 | 99 | 0.78 |
| | | Wolter-1 | -0.1982 | 0.4904 | 63.5 | 1431.90 |
| | 0.8 | MLE | | | 100 | 3.20 |
| | | Wolter-1 | -0.2722 | 0.7651 | 26 | 612.75 |
| Dataset | Stratum | | | | Total |
|---|---|---|---|---|---|
| Encephalitis | Adult | 39 | 290 | 39 | 368 |
| | Children | 20 | 78 | 15 | 113 |
| Children Death | Male | 30 | 153 | 8 | 191 |
| | Female | 15 | 173 | 7 | 195 |
| Dataset | Stratum | | Model I (MLE) | Model II (MLE) | LP |
|---|---|---|---|---|---|
| | Adult | Estimate [RSE] | 660 [0.077] | 739 [0.012] | 658 [0.212] |
| | | C.I. | (563, 760) | (731, 748) | (463, 988) |
| Encephalitis | | | 0.052 | 0.031 | - |
| | Children | Estimate [RSE] | 197 [0.104] | 213 [0.072] | 171 [0.317] |
| | | C.I. | (160, 241) | (160, 241) | (101, 314) |
| | Male | Estimate [RSE] | 268 [0.054] | 250 [0.092] | 231 [0.244] |
| | | C.I. | (244, 303) | (204, 302) | (151, 362) |
| Children Death | | | 0.070 | 0.006 | - |
| | Female | Estimate [RSE] | 276 [0.052] | 262 [0.097] | 275 [0.424] |
| | | C.I. | (250, 306) | (212, 324) | (145, 552) |
Abstract
Population size estimation based on a two-sample capture-recapture type experiment is an interesting problem in various fields, including epidemiology, public health and population studies. The Lincoln-Petersen estimate is popularly used under the assumption that the capture and recapture statuses of each individual are independent. However, in many real-life scenarios there is an inherent dependency between capture and recapture attempts, which is not well studied in the literature on the dual-record system or two-sample capture-recapture method. In this article, we propose a novel model that incorporates the possible causal dependency and provide corresponding estimation methodologies for the associated model parameters based on post-stratified two-sample capture-recapture data. The superiority of the proposed model over the existing competitors is established through an extensive simulation study. The method is illustrated through the analysis of some real data sets.
Kiranmoy Chatterjee
Interdisciplinary Statistical Research Unit, Indian Statistical Institute
Prajamitra Bhuyan
Department of Mathematics, Imperial College London
Keywords : Behavioural dependency, Bivariate Bernoulli, Disease surveillance, Method of moments, Maximum likelihood, Post-stratification.
1 Introduction
Estimation of the size of a population is an important problem in epidemiological, medical, social and demographic studies. In order to formulate policies on public health issues, federal agencies are generally interested in the actual size of a diseased population (e.g. Encephalitis patients) or the number of vital events (e.g. child mortality) in a specified region. Any attempt to count all the individuals belonging to a population of interest is subject to error, and the degree of error depends on many factors, such as the population size and the individuals' capture probabilities. In this context, two sources of information are used extensively for human populations, since more than two sources are rarely available in demographic studies due to practical constraints such as survey cost and human mobility (Chatterjee and Mukherjee, 2016b). In order to draw inference from two capture attempts, one needs to combine the data obtained from the two surveys and determine how many people are included in both lists and how many are included in exactly one of the lists. The result is an incomplete cross-classified data structure known as the dual-record system (DRS). This data structure is similar to two-sample capture-recapture data (Wolter, 1986; Chatterjee and Mukherjee, 2016a). In a DRS, the counts for three of the four cells are available, but the count of individuals missed by both lists remains unknown, which makes the true population size, say $N$, unknown. The primary goal is to estimate this missing cell count, or equivalently $N$, from the available data. The setting is close to the capture-recapture experiment widely practised in wildlife studies, with only one recapture attempt. Often, the survey mechanism allows post-stratification of the entire population into mutually exclusive and exhaustive sub-populations based on demographic and social characteristics (e.g. age, sex, ethnicity), and it is also of great interest to estimate the sub-population sizes (Bell, 1993; Wolter, 1990).
In order to estimate $N$, a common practice is to assume causal independence between the capture and recapture attempts; the resulting estimator is popularly known as the Lincoln-Petersen (LP) estimator in DRS (Otis et al., 1978; Bohning and Heijden, 2009). With an additional assumption of time variation, Otis et al. (1978) proposed the model $M_t$, and the resulting estimator is the same as the LP estimator. Chatterjee and Mukherjee (2016a) proposed an integrated likelihood estimation methodology based on the model $M_t$ and compared its performance with other likelihood-based estimators in DRS. However, the model $M_t$ (equivalently, the LP estimator) often fails due to positive dependence between the two lists, especially in public health and demography, which leads to underestimation of $N$ (Hook and Regal, 1982; Chao et al., 2001). For example, patients with a positive serum test for Hepatitis A Virus (HAV) are more likely to visit a hospital for further treatment, so the ascertainment of the serum sample and that of the hospital sample become dependent. In census undercount studies, Fay et al. (1988) and Bell (1993) observed such dependence in behavioral response among adult males, but not among females, in the Post Enumeration Programs conducted to evaluate the 1980 and 1990 US Censuses, respectively. In epidemiological or demographic surveillance with two-sample capture-recapture experiments, positive list dependence is often observed (Chatterjee and Mukherjee, 2016b; Schrauder and Hellenbrand, 2007). Similarly, negative dependence is encountered in some populations, such as children injury data collected by hospitals and police stations, drug-abusing populations, and patients affected by HIV or other diseases that carry social stigma (Chatterjee and Mukherjee, 2016b; Chatterjee and Mukherjee, 2018).
Recently, Yang and Pal (2010) proposed an empirical Bayes estimator that performs better than the LP estimator as well as some of its modified versions, including Chapman's and Bailey's estimators. However, their underlying hypergeometric model does not account for list dependence. In this context, a model proposed by Otis et al. (1978) explicitly includes list dependence through a behavioral response effect parameter, but this model is not estimable in DRS (Chao et al., 2000; Chatterjee and Mukherjee, 2016b). Yang and Chao (2005) proposed a Markov chain approach that incorporates both long-term and short-term behavioral response effects into the existing models for capture-recapture experiments. However, their model is also not estimable in DRS (Chatterjee, 2015).
Modeling capture-recapture data under causal dependence is an important but challenging task in DRS. Nour (1982) proposed an estimate of the total number of vital records assuming positive dependence between the two lists in a DRS for vital events registration. Wolter (1990) provided estimates of the post-stratum-wise sub-population (e.g. male, female) sizes under two different models, assuming that the ratio of the sub-population sizes (i.e. the sex ratio) is known from Demographic Analysis. In the first model, Wolter (1990) assumed that the cross-product ratios in the DRSs for the male and female post-strata are the same but unknown; in the second, causal independence is assumed for females only. Isaki and Schultz (1986) also worked on the same problem for the 1980 Post Enumeration Program and suggested an estimate based on demographic analysis. Later, Bell (1993) proposed some variations of the methods of Wolter (1990) for estimating the cross-product ratios for both the male and female populations. However, the ratio of the sub-population sizes (e.g. the sex ratio) is calculated at the time of the census for a larger population (e.g. the national population). In many situations, it is not realistic to assume that this ratio remains constant over time or holds for the sub-populations under consideration. Moreover, this ratio is rarely available for the populations of interest in the fields where DRS-type data are commonly used (e.g. epidemiological or disease surveillance data; see Section 6).
In this article, we propose a novel model that incorporates this inherent dependency between capture and recapture attempts in DRS without requiring knowledge of the ratio of the sub-population sizes, and we provide estimation methodologies for the population size based on post-stratification under two different scenarios. Our model can also incorporate available information on the ratio of the sub-population sizes, and in that case it provides better results than the existing competitor. Our work is motivated by two real datasets on public health: (i) Encephalitis incidence in England, 2006-2007, and (ii) child mortality in western Kenya, 2000-2001, where the existing methods proposed by Wolter (1990) and Nour (1982) are not applicable. Our model has a natural interpretation, and the associated estimates outperform the existing competitors in the literature with respect to relative bias, relative root mean squared error and coverage probability (see Section 5). We first describe the DRS and the associated data structure in Section 2. In Section 3, we propose a Bivariate Bernoulli model under DRS. Next, in Section 4, we derive method of moments estimates and discuss maximum likelihood estimation of the model parameters. The proposed estimators are compared with their existing competitors through an extensive simulation study and two illustrative data analyses in Sections 5 and 6, respectively. Finally, we end with some concluding remarks in Section 7.
2 Dual-record System (DRS)
As discussed in Section 1, DRS is similar to two-sample capture-recapture sampling, and it is very common in estimating the size of human populations. Consider a population of size $N$. The individuals captured in the first list (e.g. census) are matched one-by-one with the individuals captured in the second list (e.g. Post Enumeration Program). Let $p_{1j}$ and $p_{2j}$ denote the capture probabilities of the $j$th individual in the first sample (List 1) and the second sample (List 2), respectively. Under this set-up, we consider the following assumptions:
(i) the population is closed until the second sample is taken;
(ii) individuals are homogeneous with respect to their capture probabilities in each of the two attempts.
The data structure, presented in Table 1, is popularly known as the dual-record system, or DRS. Let $x_{11}$, $x_{10}$ and $x_{01}$ denote the numbers of individuals captured in both lists, in List 1 only and in List 2 only, respectively, and let $p_{ij}$ denote the corresponding cell probabilities, with marginal inclusion probabilities $p_{1\cdot} = p_{11} + p_{10}$ and $p_{\cdot 1} = p_{11} + p_{01}$. Assumption (ii) ensures that $p_{1j} = p_{1\cdot}$ in List 1 and $p_{2j} = p_{\cdot 1}$ in List 2 for $j = 1, 2, \ldots, N$. The number of individuals missed by both surveys, denoted $x_{00}$, is unknown, which makes the total population size $N$ unknown. The probabilities attached to all the cells are also provided in Table 1, and this notation is followed throughout the paper. As discussed before, causal independence is commonly assumed between the capture and recapture attempts, which is formally written as
(iii) the inclusion of each individual of the population in List 2 is causally independent of its inclusion in List 1 (i.e. $p_{11} = p_{1\cdot}\, p_{\cdot 1}$).
Under assumption (iii), the estimate of $N$ is
$$\hat{N}_{LP} = \frac{x_{1\cdot}\, x_{\cdot 1}}{x_{11}},$$
where $x_{1\cdot} = x_{11} + x_{10}$ and $x_{\cdot 1} = x_{11} + x_{01}$ are the List 1 and List 2 totals. This is popularly known as the Lincoln-Petersen (LP) estimator. It is identical to the conditional likelihood estimator of $N$ under the model $M_t$ (Wolter, 1986), and it is traditionally used in several fields, including public health, economics and demography (Bohning and Heijden, 2009). However, this model is seriously criticized for its underlying causal independence assumption in the context of human populations (ChandraSekar and Deming, 1949; Chao et al., 2001). In many situations, an individual may be missed in both attempts for common reasons, which leads to a positive association between the two lists. In other cases, individuals may be less keen to be enlisted in List 2, which results in a negative association between the lists. These phenomena are broadly known as behavioral response variation (see Wolter (1986) for more details).
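The LP estimator is a one-line computation from the three observed cells: the count captured in both lists ($x_{11}$), in List 1 only ($x_{10}$) and in List 2 only ($x_{01}$). The following Python sketch (function name hypothetical) implements it. As a check, assuming the first count column of the real-data table is $x_{11}$, the Encephalitis adult counts 39, 290 and 39 give 658, which matches the LP value reported for that stratum; since the formula is symmetric in $x_{10}$ and $x_{01}$, the order of the other two columns does not matter here.

```python
def lincoln_petersen(x11: int, x10: int, x01: int) -> float:
    """Lincoln-Petersen estimate of N from the observed DRS cells (hypothetical helper).

    x11: captured in both lists; x10: List 1 only; x01: List 2 only.
    """
    if x11 <= 0:
        raise ValueError("x11 must be positive; the LP estimate is undefined otherwise")
    x1_dot = x11 + x10  # List 1 total
    x_dot1 = x11 + x01  # List 2 total
    return x1_dot * x_dot1 / x11


print(lincoln_petersen(39, 290, 39))  # 658.0
print(lincoln_petersen(20, 78, 15))   # 171.5
```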
In the context of demographic studies, Nour (1982) considered a possible positive association between the two lists in DRS. Assuming that both marginal list capture probabilities (i.e. $p_{1\cdot}$ and $p_{\cdot 1}$) are greater than 0.5, Nour (1982) derived the following estimate of $N$:
[TABLE]
where
As mentioned in the previous section, Wolter (1990) considered post-stratification of the entire population into two mutually exclusive and exhaustive sub-populations, say $A$ and $B$ (e.g. male and female), of sizes $N_A$ and $N_B$ such that $N_A + N_B = N$. The observed data, as presented in Table 2, are therefore divided into $(x_{11A}, x_{10A}, x_{01A})$ and $(x_{11B}, x_{10B}, x_{01B})$ for the sub-populations $A$ and $B$, respectively. Based on these data, Wolter (1990) proposed two models with the common assumption that the ratio of the sub-population sizes is known. In the first model, say Wolter-1, the cross-product ratios for $A$ (say, $\theta_A$) and $B$ (say, $\theta_B$) are assumed to be the same but unknown, i.e. $\theta_A = \theta_B$. The estimates of the sub-population sizes under Wolter-1 are given by
[TABLE]
where , and is the total number of individuals captured from sub-population $k$, for $k = A, B$. In the second model, say Wolter-2, Wolter (1990) additionally assumed that causal independence holds for only one of the two sub-populations (the female sub-population in Wolter's application), and the resulting estimates are given by
[TABLE]
See Wolter (1990) for more details.
3 Proposed Model
In this section, we first introduce a Bivariate Bernoulli model (BBM), which is useful for measuring the degree of association between capture and recapture attempts. Although the model can be generalized to a multivariate set-up for problems with multiple lists, in the present paper we focus on the bivariate version for DRS.
In any given population, some individuals are expected to behave independently over the two capture attempts in DRS, while dependence in behavioral response may exist for the rest of the population. Let $\alpha$ be the proportion of individuals for whom behavioral dependence between List 1 and List 2 exists. To capture this dependency structure, we define a pair $(X^*_{1h}, X^*_{2h})$ representing the latent capture statuses of the $h$-th individual in the first and second attempts, respectively, for $h = 1, \ldots, N$. The latent capture status $X^*_{lh}$ takes the value 1 or 0, denoting the presence or absence of the $h$-th individual in the $l$-th list, for $l = 1, 2$. Under this set-up, for a proportion $\alpha$ of the individuals, the List 2 status coincides with the List 1 status. Now let $X_{1h}$ and $X_{2h}$ denote the List 1 and List 2 inclusion statuses of the $h$-th individual in the population, for $h = 1, \ldots, N$. The observed pair $(X_{1h}, X_{2h})$ is a manifestation of the latent capture statuses $(X^*_{1h}, X^*_{2h})$ for the $h$-th individual. Therefore, we can formally write the interdependence between the two lists as
$$(X_{1h}, X_{2h}) = \begin{cases} (X^*_{1h}, X^*_{1h}) & \text{with probability } \alpha, \\ (X^*_{1h}, X^*_{2h}) & \text{with probability } 1 - \alpha, \end{cases} \qquad (5)$$
where the $X^*_{1h}$'s and $X^*_{2h}$'s are independently and identically distributed Bernoulli random variables with parameters $p_1$ and $p_2$, respectively. Note that $p_l$ is the capture probability of a causally independent individual in the $l$-th list. We call the model in equation (5) the Bivariate Bernoulli model in DRS (BBM-DRS). Now we denote $P(X_{1h} = i, X_{2h} = j)$ by $p_{ij}$, for $i, j \in \{0, 1\}$. Thus, based on the parameters of model (5), the cell probabilities associated with DRS (see Table 1) are
$$p_{11} = p_1\{\alpha + (1-\alpha)p_2\}, \quad p_{10} = (1-\alpha)p_1(1-p_2), \quad p_{01} = (1-\alpha)(1-p_1)p_2, \quad p_{00} = (1-p_1)\{\alpha + (1-\alpha)(1-p_2)\}.$$
The corresponding marginal probabilities are
$$p_{1\cdot} = p_{11} + p_{10} = p_1, \qquad p_{\cdot 1} = p_{11} + p_{01} = \alpha p_1 + (1-\alpha)p_2,$$
with $0 \le \alpha \le 1$. Note that the proposed Bivariate Bernoulli model incorporates positive dependence between the capture statuses in Lists 1 and 2. In particular, when $\alpha = 0$ (i.e. there is no causal dependency), the Bivariate Bernoulli model in (5) reduces to the model $M_t$.
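To make the Bivariate Bernoulli model concrete, the following Python sketch evaluates the BBM-DRS cell probabilities for a dependence proportion $\alpha$ and independent-individual capture probabilities $p_1$, $p_2$, and simulates one complete DRS table by drawing the latent statuses for each individual. It assumes that each individual is behaviourally dependent independently with probability $\alpha$ (one reading of "a proportion $\alpha$"), and all function names are hypothetical. With a large population the simulated cell frequencies should be close to the model probabilities, and setting $\alpha = 0$ recovers the independence form $p_{11} = p_1 p_2$.

```python
import random


def bbm_cell_probs(alpha, p1, p2):
    """Cell probabilities (p11, p10, p01, p00) of BBM-DRS (positive dependence)."""
    p11 = p1 * (alpha + (1 - alpha) * p2)
    p10 = p1 * (1 - alpha) * (1 - p2)
    p01 = (1 - alpha) * (1 - p1) * p2
    p00 = (1 - p1) * (alpha + (1 - alpha) * (1 - p2))
    return p11, p10, p01, p00


def simulate_bbm_drs(n, alpha, p1, p2, rng):
    """Simulate a complete 2x2 table (x11, x10, x01, x00) for n individuals."""
    counts = {(1, 1): 0, (1, 0): 0, (0, 1): 0, (0, 0): 0}
    for _ in range(n):
        z1 = int(rng.random() < p1)       # latent List 1 status, Bernoulli(p1)
        z2 = int(rng.random() < p2)       # latent List 2 status, Bernoulli(p2)
        dependent = rng.random() < alpha  # behaviourally dependent individual
        x1, x2 = (z1, z1) if dependent else (z1, z2)
        counts[(x1, x2)] += 1
    return counts[(1, 1)], counts[(1, 0)], counts[(0, 1)], counts[(0, 0)]


rng = random.Random(2024)
n, alpha, p1, p2 = 100_000, 0.3, 0.6, 0.5
probs = bbm_cell_probs(alpha, p1, p2)
table = simulate_bbm_drs(n, alpha, p1, p2, rng)
for label, p, x in zip(("11", "10", "01", "00"), probs, table):
    print(f"cell {label}: model {p:.3f}, simulated {x / n:.3f}")
```

The simulator returns $x_{00}$ only because the truth is known in a simulation; in a real DRS that cell is unobserved and would be dropped before estimation.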
Remark 1.
One can define the proposed BBM-DRS in order to capture negative dependency (or, recapture aversion) by rewriting (5) as
[TABLE]
Remark 2.
The parameters of BBM-DRS have simple interpretations of practical significance. The dependence parameter $\alpha$ represents the proportion of behaviorally dependent individuals, and $p_l$ is the capture probability of a causally independent individual in the $l$-th list, for $l = 1, 2$.
4 Estimation Methodologies
In practice, one can easily post-stratify the entire population into two mutually exclusive and exhaustive sub-populations $A$ and $B$, as discussed in Section 2 (see Wolter (1990), Eisele et al. (2003) and Granerod et al. (2013)). We also assume that, for any individual belonging to $A$, the capture status in either of the two lists is independent of that of any individual belonging to $B$. To denote the cell counts and associated probabilities of the DRS table for sub-population $k$, we use the notation of Table 1 with an additional suffix $k$ (for example, the List 1 capture probability for sub-population $k$ is denoted $p_{1k}$), for $k = A, B$. We now consider two different models and propose methodologies for estimating the associated parameters, including the population size $N (= N_A + N_B)$, the parameter of primary interest.
4.1 Model I
In this model, we consider the assumption for the sub-population , which implies . Therefore, the popular Lincoln-Petersen estimate of is given as . In order to incorporate the behavioural dependency present in the sub-population , we consider BBM-DRS as described in Subsection 3, which consists of four parameters with , and . In addition to , we consider the following assumption:
Initial (List 1) capture probabilities for the individuals belonging to both the sub-populations and are the same (i.e. ).
The assumption ensures estimability of the model parameters. Note that List 1 is prepared before List 2 and hence, List 2 capture probabilities for different sub-populations may differ due behavioral dependence, if exists. Also, it is quite reasonable to consider the same List 1 capture probability for different sub-populations when possibly there is no prejudice. Similar assumption has been considered by several authors in the past (Bell,, 1993). Under similar setup, Wolter, (1990) proposed estimate of based on model and the estimate of using the available knowledge on the ratio of the sub-population sizes (e.g. sex-ratio). As discussed before, the availability of reliable estimate of this ratio remains a practical challenge (See Section 6). As mentioned before, is estimated assuming causal independence, and hence, one needs to find the estimate of in order to estimate the population size . Since can be interpreted as the proportion of behaviorally dependent individuals, its estimation may provide interesting insight of the capture-recapture mechanism.
First, we consider method-of-moments estimation of the parameters of the proposed Model I. Note that the method of moments estimate (MME) of is the same as the Lincoln-Petersen estimate , and the MMEs of and are given by and , respectively. Using the assumption , the estimate of is given by . Now, equating the expected and observed cell counts in the table obtained under the DRS (Table 1) for the sub-population , we get
[TABLE]
which involve three unknown parameters , and . Solving the equations in (6), we obtain the MMEs of the model parameters as
[TABLE]
The detailed derivation of these MMEs is provided in the Appendix.
A classical approach to estimating from incomplete cross-classified data is based on likelihood theory, where the data (i.e. all observed cell counts in Table 1) follow a multinomial distribution with index parameter and the associated cell probabilities (Sanathanan, 1972). Therefore, using the relations between the cell probabilities and , as provided in Section 3, the likelihood function of is given by
[TABLE]
where , , for . However, the maximum likelihood estimate (MLE) of cannot be obtained in closed form. The Newton-Raphson method can be used to maximize the log-likelihood and estimate , treating and as continuous parameters. Alternatively, any standard software with general-purpose optimization routines (e.g., the optim function in R) can be used. Note that the log-likelihood function involves , which may cause computational difficulty for large values of . To avoid such issues, we approximate as (Wells, 1986, p. 45).
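To illustrate the numerical strategy just described (treating N as continuous and evaluating log N! through the log-gamma function), the following Python sketch maximizes the profile log-likelihood of N for a single 2×2 table. It is an assumption-laden illustration only: it uses the simpler causal-independence multinomial model, not the BBM-DRS likelihood (7), whose cell probabilities are not reproduced here. It also swaps in `math.lgamma` for the approximation of Wells (1986) and a derivative-free golden-section search for Newton-Raphson. The function names and the cell labels `x11`, `x10`, `x01` (captured in both lists, List 1 only, List 2 only) are hypothetical.

```python
import math


def profile_loglik(N, x11, x10, x01):
    """Profile log-likelihood of a continuous N (illustrative independence model).

    Assumption: cell probabilities p1*p2, p1*(1-p2), (1-p1)*p2 and, for the
    unobserved cell, (1-p1)*(1-p2); this is NOT the BBM-DRS likelihood (7).
    For fixed N, p1 and p2 are profiled out at n1/N and n2/N. log N! is
    evaluated with math.lgamma, so large N does not overflow. Terms that do
    not depend on N are dropped.
    """
    n1, n2 = x11 + x10, x11 + x01          # List 1 and List 2 totals
    r = x11 + x10 + x01                    # distinct individuals observed
    p1, p2 = n1 / N, n2 / N
    ll = math.lgamma(N + 1) - math.lgamma(N - r + 1)
    ll += n1 * math.log(p1) + (N - n1) * math.log1p(-p1)
    ll += n2 * math.log(p2) + (N - n2) * math.log1p(-p2)
    return ll


def mle_population_size(x11, x10, x01, upper_factor=1000.0, n_iter=200):
    """Golden-section search for the maximizer of profile_loglik over N > r."""
    if x11 == 0:
        # With no recaptures the Lincoln-Petersen estimate is infinite.
        raise ValueError("no recaptures: refusing to fit")
    r = x11 + x10 + x01
    a, b = r + 1e-9, upper_factor * r      # search bracket for N
    inv_phi = (math.sqrt(5.0) - 1.0) / 2.0
    c, d = b - inv_phi * (b - a), a + inv_phi * (b - a)
    fc, fd = profile_loglik(c, x11, x10, x01), profile_loglik(d, x11, x10, x01)
    for _ in range(n_iter):
        if fc > fd:                        # maximum lies in [a, d]
            b, d, fd = d, c, fc
            c = b - inv_phi * (b - a)
            fc = profile_loglik(c, x11, x10, x01)
        else:                              # maximum lies in [c, b]
            a, c, fc = c, d, fd
            d = a + inv_phi * (b - a)
            fd = profile_loglik(d, x11, x10, x01)
    return 0.5 * (a + b)
```

For a hypothetical table with x11 = 75, x10 = 225 and x01 = 175 (so n1 = 300 and n2 = 250), `mle_population_size(75, 225, 175)` should return a value close to, though not exactly at, the Lincoln-Petersen value 300 × 250 / 75 = 1000. Because the search needs only function values, any general-purpose optimizer (such as R's optim mentioned above) could be used in its place.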
Remark 3.
The above likelihood function (7) can be simplified using Stirling's approximation of (Whittaker and Robinson, 1967, pp. 138-140), which yields closed-form expressions for the MLEs. Interestingly, the resulting MLEs of all the parameters are exactly equal to the respective MMEs.
Remark 4.
If the ratio of the sub-population sizes (e.g., the sex ratio for male-female stratification) is known, such information can easily be incorporated into the likelihood function (7) by taking .
4.2 Model II
In Model II, we relax the assumption and consider the BBM-DRS for both sub-populations A and B, with parameters , , , and , for . As in Model I, we adopt the assumption (i.e. , say) and additionally assume , say, which ensures the estimability of Model II. Under a similar setup, Wolter (1990) proposed estimates of and using the ratio of the sub-population sizes. As discussed before, a reliable estimate of this ratio is not available in most cases.
We first consider the method of moments for estimating the parameters of Model II. Equating the expected and observed cell counts in the two tables obtained under the DRS, which involve six parameters, we obtain the following MMEs:
[TABLE]
The derivation of the above MMEs is similar to that for Model I; see the Appendix for details. In some cases , and hence the estimates of , and become negative, as in Wolter (1990). Such issues with MMEs have been discussed in the literature (see Bowman and Shenton, 1998, pp. 2092-2098, for more details). Therefore, we do not recommend the MMEs for the proposed Model II; the maximum likelihood estimates given below should be preferred.
Using the relations between the cell probabilities and , as provided in Section 3, the likelihood function of is given by
[TABLE]
where , , for . Since the MLE of has no closed-form expression, we follow the same computational strategy as for Model I. As remarked in Subsection 4.1 (Remark 4), the same reparameterization can also be used in the likelihood function (8) if the ratio of the sub-population sizes is known.
5 Simulation Study
In this section, the performance of the proposed estimators is investigated through a simulation study and compared with that of existing competitors. For this purpose, we consider six trial populations, denoted by , with capture probabilities , respectively, for , with and . The simulation study has two parts. First, we assume that the ratio of the sub-population sizes () is unknown and compare our proposed estimators with Nour's estimator (Nour, 1982) given by (2). As discussed before, the estimators (3) and (4) proposed by Wolter (1990) are not applicable in this case. Second, we assume that is known and compare the proposed estimators with those of Wolter (1990). It is important to note that Nour's method (Nour, 1982) cannot incorporate knowledge of .
First, we generate 1000 data sets from Model I for each of the six trial populations with . Since the LP estimator of is efficient under the causal independence assumption (S3) for large or moderately large samples, our primary interest in Model I lies in the MME and MLE of . The final estimate of is obtained by averaging the estimates over the replications. To compare the performance of the estimators, we compute the relative bias (RB) and the relative root mean square error (RRMSE) using the following formulas:
[TABLE]
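The displayed formulas for RB and RRMSE did not survive extraction. Assuming the standard definitions RB = (mean of the replicate estimates − N)/N and RRMSE = sqrt(mean of (estimate − N)²)/N, which we believe match the intended ones but have not checked against the original display, both can be computed as in this Python sketch; the function name is hypothetical.

```python
import math
from statistics import fmean


def relative_bias_and_rrmse(estimates, true_n):
    """Relative bias and relative root mean squared error over replications.

    Assumed (standard) definitions:
        RB    = (mean(estimates) - N) / N
        RRMSE = sqrt(mean((estimate - N)^2)) / N
    """
    if not estimates:
        raise ValueError("need at least one replicate estimate")
    rb = (fmean(estimates) - true_n) / true_n
    mse = fmean([(e - true_n) ** 2 for e in estimates])
    return rb, math.sqrt(mse) / true_n
```

For example, four replicate estimates 90, 110, 100 and 100 of a true N = 100 give RB = 0 and RRMSE = sqrt(50)/100, showing that RRMSE captures spread that RB averages away.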
In the capture-recapture setting, point estimators of population size commonly have positively skewed distributions (Yang and Pal, 2010). Therefore, we obtain a confidence interval (CI) for based on the log-transformation method discussed in Chao et al. (1987) and Yang and Chao (2005). In this method, is treated as an approximately normal variate, which gives the confidence interval
[TABLE]
where , and is the estimate of the variance of . For each of the 1000 replications, is computed by a parametric bootstrap based on 1000 bootstrap samples. The length of the confidence interval (LCI) and its coverage probability (CP) are computed following Yang and Pal (2010) and Chatterjee and Mukherjee (2018). We first compare the CPs of the estimators to see which one performs best and, when the CPs are similar, we compare the LCIs (Yang and Pal, 2010). Note that Nour's estimator (Nour, 1982) is not model-based, so its CP and LCI cannot be obtained by the aforementioned parametric bootstrap method. The results are presented in Tables 2 and 3 for true population sizes and , respectively.
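The interval displayed above did not survive extraction. As a hedged sketch, the following Python code implements the usual form of the log-transformation interval cited above (Chao et al., 1987): with x0 the number of distinct individuals observed, f0 = N̂ − x0, z the standard normal quantile and C = exp{z·sqrt(log(1 + Var(N̂)/f0²))}, the interval is [x0 + f0/C, x0 + f0·C]. Var(N̂) comes from a parametric bootstrap. To keep the example self-contained, we assume the bootstrap resamples 2×2 tables from the fitted causal-independence model with the Lincoln-Petersen estimator, not from the BBM-DRS used in the paper. CP and LCI use their usual definitions (percentage of replications whose interval covers N, and mean interval length). All function names are hypothetical.

```python
import math
import random
from statistics import NormalDist, fmean, variance


def lincoln_petersen(x11, x10, x01):
    """Lincoln-Petersen estimate n1 * n2 / m, with m = x11 > 0 recaptures."""
    return (x11 + x10) * (x11 + x01) / x11


def bootstrap_variance(x11, x10, x01, n_boot=1000, seed=1):
    """Parametric-bootstrap variance of the LP estimate.

    Assumption for illustration: tables are resampled from the fitted
    causal-independence multinomial model, not from the BBM-DRS.
    """
    rng = random.Random(seed)
    n_hat = lincoln_petersen(x11, x10, x01)
    p1, p2 = (x11 + x10) / n_hat, (x11 + x01) / n_hat  # fitted capture probabilities
    probs = [p1 * p2, p1 * (1 - p2), (1 - p1) * p2, (1 - p1) * (1 - p2)]
    n_int = round(n_hat)
    estimates = []
    while len(estimates) < n_boot:
        cells = rng.choices(range(4), weights=probs, k=n_int)  # one multinomial table
        b11, b10, b01 = (cells.count(j) for j in range(3))     # cell 3 is unobserved
        if b11 > 0:  # LP is undefined without recaptures; skip such tables
            estimates.append(lincoln_petersen(b11, b10, b01))
    return variance(estimates)


def log_transform_ci(n_hat, var_hat, x0, level=0.95):
    """Log-transformation interval; requires n_hat > x0 and var_hat >= 0."""
    f0 = n_hat - x0                        # estimated number never captured
    z = NormalDist().inv_cdf(0.5 + level / 2)
    c = math.exp(z * math.sqrt(math.log(1 + var_hat / f0 ** 2)))
    return x0 + f0 / c, x0 + f0 * c


def coverage_and_length(intervals, true_n):
    """CP (in percent) and mean length of a list of (lower, upper) intervals."""
    cp = 100 * fmean([lo <= true_n <= hi for lo, hi in intervals])
    lci = fmean([hi - lo for lo, hi in intervals])
    return cp, lci
```

By construction the lower limit never falls below the observed count x0, which is one reason this transformation is preferred over a symmetric Wald interval for skewed estimators of N.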
From Table 2, it is observed that both proposed estimators (MME and MLE) of outperform Nour's estimator (Nour, 1982) in terms of RB and RRMSE. The RB and RRMSE of the MLE are also smaller than those of the MME. Interestingly, the MLE and MME perform comparably with respect to CP and LCI. As expected, both RB and RRMSE of the proposed estimators decrease as the population size increases.
Next, we generate data from Model II using the same trial populations along with a common dependence parameter . As for Model I, we obtain the RB, RRMSE, CP and LCI of the estimators of both and , and the results are presented in Tables 4 and 5. As discussed before, the MMEs under Model II are often negative; hence, they are not considered in this simulation study. The results in Table 4 clearly show that the proposed MLE under Model II performs markedly better than Nour's estimator (Nour, 1982) in terms of both RB and RRMSE. Nour's estimator underestimates and , whereas the biases of our proposed MLE are negligible for both sub-population sizes. The results from Table 4 also indicate that the interval estimates based on the proposed MLE perform well in terms of both CP and LCI. As expected, the RB and RRMSE of the MLE decrease as the population sizes and increase.
As mentioned in Remark 4, information on the ratio of the sub-population sizes , if available, can be incorporated into our proposed likelihood-based estimates. An estimate of this ratio may be available for large populations from previous studies (Wolter, 1990). For example, in a census coverage study, an estimate of the sex ratio may be available from a past demographic analysis of the population under consideration (Robinson et al., 1993). Therefore, assuming to be known, we present this analysis only for the large populations, that is, for . The performance of our proposed estimator under Model I (Model II) is compared with that of the Wolter-2 (Wolter-1) estimator, and the results are presented in Table 6 (Table 7). The proposed estimators are clearly superior to Wolter's estimators with respect to RB and RRMSE. Moreover, the 95% CIs from our models have far better CPs than those from the models proposed by Wolter (1990) for both choices of or . The CIs from Wolter-2 are shorter than those from our Model I; however, Wolter-1 yields much wider CIs than the proposed Model II. Similar results are also observed (not reported here) for .
6 Applications
In this section, we first analyze a data set on encephalitis (infectious and noninfectious) incidence in England from November 2006 to October 2007 (Granerod et al., 2013), presented in the top panel of Table 8. These data were collected on patients with an encephalitis code in any of the 20 diagnostic fields and were divided into two strata, Children ( years) and Adult ( years). A patient diagnosed with encephalitis by a hospital clinician was likely to be recorded in the Hospital Episode Statistics (HES) and also to be included in the Public Health England (PHE) study. Thus, Granerod et al. (2013, p. 1461) anticipated that the two sources are likely to be positively dependent. As a result, the LP estimator given in (1) is likely to underestimate the true number of cases. Note that the estimator proposed by Nour (1982) cannot be applied here because its underlying condition () is not satisfied. Also, the estimators proposed by Wolter (1990) cannot be applied, as the ratio of adult to child patients (analogous to the sex ratio for male-female stratification) is not available here. Therefore, we compare the results from our proposed models with those of the LP estimator defined in (1).
As remarked in Subsection 4.1 (Remark 3), the MMEs under Model I are approximately equal to the MLEs; hence, we consider only the MLEs in our data analysis. Under Model I, we consider separately the two cases in which the capture-recapture statuses of Children and of Adults, respectively, are independent. To compute the estimated standard error , we use the same parametric bootstrap method as in Section 5. Comparing the relative standard errors (RSE), i.e. , we find that our proposed estimator under Model I performs better when independence is assumed for Children than for Adults; the corresponding results are reported in the top panel of Table 9. The estimate of the dependence parameter indicates that of the adult encephalitis patients are causally dependent. Under Model II, the estimated number of patients is larger than under Model I. The estimated proportion of causally dependent patients is for both Adults and Children under Model II. Interestingly, the RSE (based on 1000 bootstrap samples) of the MLE under Model II (Model I) is substantially smaller than that of the MLE under Model I ( model).
Next, we consider another dual-system dataset on child mortality (see the bottom panel of Table 8) from the Wagai and Yala divisions in western Kenya, referred to as Gem by Eisele et al. (2003). This study assesses the completeness and differential ascertainment of vital events related to the health of male and female children under five years of age registered in a demographic surveillance system (DSS), based on a two-sample capture-recapture experiment. Here again, neither the method of Wolter (1990) nor that of Nour (1982) is applicable, for the same reasons as above. Analyzing the data, we find that our proposed estimator under Model I, for , performs better when the capture-recapture statuses of females are assumed to be independent. The results are presented in the bottom panel of Table 9. The estimates of female deaths under Model I and the model are very close; however, the RSE under Model I is smaller than that under the model. The estimate of the dependence parameter indicates that of male children are causally dependent under Model I. Under Model II, the MLEs are marginally lower than those under Model I. In this case, the RSE of the estimates under Model I (Model II) is smaller than that under Model II ( model). Our analysis finds no evidence of list dependence in the DRS under consideration, which supports the argument made by Eisele et al. (2003).
7 Concluding Remarks
This article addresses population size estimation when the causal independence assumption in the DRS is not valid. We introduce a model, called the Bivariate Bernoulli model (BBM), that successfully accounts for the possible dependence between capture and recapture attempts. Although the proposed model addresses positive dependence, it can easily be rewritten to incorporate negative dependence (see Remark 1). The proposed model appears to have an edge in ease of interpretation and a much wider domain of applicability. When the ratio of the sub-population sizes (e.g., the sex ratio for male-female stratification) is known, estimates based on our proposed models may still be preferred, since the models allow such additional information to be included for more efficient inference. Although the primary objective of this article is to obtain an efficient estimate of the population size , the estimates of the other model parameters, especially , give specific insights into the capture-recapture mechanism. The BBM can also be extended to multiple-list or multiple capture-recapture problems, which are commonly encountered in wildlife population studies. Developing a procedure to test for behavioral dependence between the two sources in a DRS is another interesting problem, which will be taken up in future work.
Appendix
Derivation of the MMEs under Model I: From (6), we get the following equations in terms of , , and :
[TABLE]
where . Now, by adding (9) and (10), we get the MME of as
[TABLE]
Adding equations (9)-(11), we get
[TABLE]
and by subtracting (11) from (10), we get
[TABLE]
Now, using the estimates and in (12), we get
[TABLE]
Since , (9) implies
[TABLE]
Subtracting (13) from (14), the MME of is obtained as
[TABLE]
Using in (13), MME of is given as
[TABLE]
To ensure that the MME of lies in , we modify (15) and consider
[TABLE]
Derivation of the MMEs under Model II: From (6), we get the following equations in terms of , , , and :
[TABLE]
Now, dividing (16) by (17) we get
[TABLE]
Next, we equate the expected and observed cell counts from the 2×2 table obtained under the DRS for the sub-population and get
[TABLE]
Now, we adopt assumption (S4) and . Therefore, dividing (16) by (19), we get
[TABLE]
since .
Similarly, dividing (17) by (20) we get
[TABLE]
From equations (21) and (22), we get
[TABLE]
and
[TABLE]
Therefore, substituting the above estimates into equations (16) and (18), we get
[TABLE]
and
[TABLE]
respectively. Finally, we obtain the estimates of sub-populations sizes as
[TABLE]
since and .
- Bell, W. R. (1993). Using information from demographic analysis in post-enumeration survey (PES) estimation. Journal of the American Statistical Association, 88:1106–1118.
- Bohning, D. and Heijden, P. V. D. (2009). Recent developments in life and social science applications of capture–recapture methods. Advanced Statistical Analysis, 93:1–3.
- Bowman, K. O. and Shenton, L. R. (1998). Encyclopedia of Statistical Sciences. John Wiley & Sons.
- Chandra Sekar, C. and Deming, W. E. (1949). On a method of estimating birth and death rates and the extent of registration. Journal of the American Statistical Association, 44:101–115.
- Chao, A., Chu, W., and Chiu, H. H. (2000). Capture-recapture when time and behavioral response affect capture probabilities. Biometrics, 56:427–433.
- Chao, A., Tsay, P. K., Lin, S.-H., Shau, W.-Y., and Chao, D.-Y. (1987). Estimating the population size for capture-recapture data with unequal catchability. Biometrics, 43:783–791.
- Chao, A., Tsay, P. K., Lin, S.-H., Shau, W.-Y., and Chao, D.-Y. (2001). Tutorial in biostatistics: The applications of capture-recapture models to epidemiological data. Statistics in Medicine, 20:3123–3157.
- Chatterjee, K. (2015). Comment on Yang and Chao (2005), on the identifiability of model MM 1(tb) for two sample capture-recapture experiments. DOI: 10.13140/RG.2.1.4580.1685.
