Robust estimation of parameters in logistic regression via solving the Cramer-von Mises type L2 optimization problem

Jiwoong Kim

arXiv:1703.07044·math.ST·December 17, 2025

TL;DR

This paper introduces a new method for estimating logistic regression parameters by solving a Cramer-von Mises type L2 optimization problem, with a thorough analysis of the estimators' asymptotic properties.

Contribution

The paper presents a novel estimation approach for logistic regression parameters using a Cramer-von Mises type L2 optimization, advancing statistical estimation techniques.

Findings

01

Establishes the asymptotic properties of the proposed estimators

02

Demonstrates the effectiveness of the method through theoretical analysis

03

Provides rigorous proofs of estimator consistency and asymptotic normality

Abstract

This paper proposes a novel method to estimate parameters in a logistic regression model. After obtaining the estimators, their asymptotic properties are rigorously investigated.

Tables

Table 1-(a). Normal individual effect (rows N, La, Lo, and M: normal, Laplace, logistic, and MTN remainder disturbances)

		OLS			Within			RE			MD
		bias	SE	MSE	bias	SE	MSE	bias	SE	MSE	bias	SE	MSE
N	$β_{1}$	0.0006	0.0534	0.0028	0.0019	0.0388	0.0015	0.0018	0.0388	0.0015	0.0018	0.0398	0.0016
	$β_{2}$	-0.0017	0.0550	0.0030	0.0002	0.0389	0.0015	0.0002	0.0395	0.0016	0.0005	0.0400	0.0016
	$β_{3}$	0.0031	0.0504	0.0026	0.0012	0.0383	0.0015	0.0016	0.0385	0.0015	0.0014	0.0395	0.0016
La	$β_{1}$	-0.0034	0.0657	0.0043	7e-04	0.0361	0.0013	4e-04	0.0361	0.0013	8e-04	0.0371	0.0014
	$β_{2}$	0.0043	0.072	0.0052	0.0014	0.0388	0.0015	0.0014	0.0387	0.0015	0.0013	0.0398	0.0016
	$β_{3}$	-0.004	0.0662	0.0044	-4e-04	0.0373	0.0014	-7e-04	0.0373	0.0014	-9e-04	0.0384	0.0015
Lo	$β_{1}$	0.001	0.0887	0.0079	3e-04	0.0379	0.0014	6e-04	0.0381	0.0015	4e-04	0.039	0.0015
	$β_{2}$	0	0.0873	0.0076	0.0013	0.0393	0.0015	0.0013	0.0395	0.0016	0.0011	0.0402	0.0016
	$β_{3}$	0.0015	0.0874	0.0076	0	0.0386	0.0015	1e-04	0.0387	0.0015	-3e-04	0.0402	0.0016
M	$β_{1}$	0.0015	0.0756	0.0057	-9e-04	0.0383	0.0015	-6e-04	0.0384	0.0015	-9e-04	0.0395	0.0016
	$β_{2}$	0.0021	0.077	0.0059	-3e-04	0.0376	0.0014	0	0.0374	0.0014	-4e-04	0.0384	0.0015
	$β_{3}$	8e-04	0.0758	0.0058	0.0018	0.0381	0.0015	0.0013	0.0383	0.0015	0.0018	0.0391	0.0015

Table 1-(b). Logistic individual effect

		OLS			Within			RE			MD
		bias	SE	MSE	bias	SE	MSE	bias	SE	MSE	bias	SE	MSE
N	$β_{1}$	6e-04	0.0671	0.0045	0.0017	0.0673	0.0045	0.0012	0.0674	0.0045	0.0016	0.0669	0.0045
	$β_{2}$	0.0025	0.0663	0.0044	4e-04	0.0708	0.005	0.0015	0.0689	0.0048	7e-04	0.0699	0.0049
	$β_{3}$	-0.0012	0.0661	0.0044	-0.0026	0.0664	0.0044	-0.002	0.068	0.0046	-0.0028	0.0655	0.0043
La	$β_{1}$	-0.0138	0.1405	0.0199	-0.0148	0.1386	0.0194	-0.0149	0.1376	0.0192	-0.0153	0.1383	0.0194
	$β_{2}$	0.0487	0.3609	0.1326	0.0461	0.359	0.131	0.047	0.3588	0.131	0.0468	0.3588	0.1309
	$β_{3}$	-0.0339	0.2484	0.0629	-0.0297	0.2454	0.0611	-0.0308	0.2457	0.0613	-0.0308	0.2452	0.0611
Lo	$β_{1}$	0.0055	0.0906	0.0082	0.002	0.066	0.0044	0.0023	0.0669	0.0045	0.0013	0.0659	0.0043
	$β_{2}$	-9e-04	0.0927	0.0086	0.0017	0.0668	0.0045	0.0013	0.0672	0.0045	0.0013	0.0665	0.0044
	$β_{3}$	-0.0061	0.0917	0.0085	-9e-04	0.066	0.0044	-0.0012	0.0665	0.0044	-4e-04	0.0666	0.0044
M	$β_{1}$	-0.0032	0.089	0.0079	-5e-04	0.0701	0.0049	-0.0011	0.0707	0.005	-8e-04	0.0689	0.0047
	$β_{2}$	9e-04	0.0964	0.0093	0.0039	0.0681	0.0047	0.0042	0.0691	0.0048	0.0039	0.068	0.0046
	$β_{3}$	-0.0012	0.0909	0.0083	-5e-04	0.0709	0.005	-5e-04	0.0709	0.005	0	0.0712	0.0051

Table 1-(c). Laplace individual effect

		OLS			Within			RE			MD
		bias	SE	MSE	bias	SE	MSE	bias	SE	MSE	bias	SE	MSE
N	$β_{1}$	0.0017	0.0592	0.0035	-5e-04	0.0541	0.0029	-6e-04	0.0542	0.0029	4e-04	0.0506	0.0026
	$β_{2}$	5e-04	0.059	0.0035	-1e-04	0.051	0.0026	-1e-04	0.0504	0.0025	-4e-04	0.0488	0.0024
	$β_{3}$	0	0.0585	0.0034	-8e-04	0.0518	0.0027	-6e-04	0.0526	0.0028	-0.0011	0.0488	0.0024
La	$β_{1}$	-0.0109	0.1236	0.0154	-0.0106	0.1137	0.0131	-0.0099	0.1138	0.0131	-0.0103	0.1119	0.0126
	$β_{2}$	0.0324	0.3074	0.0955	0.0302	0.3033	0.0929	0.0305	0.3032	0.0928	0.0311	0.3027	0.0926
	$β_{3}$	-0.0174	0.2126	0.0455	-0.02	0.2056	0.0427	-0.0189	0.2062	0.0429	-0.0196	0.2049	0.0424
Lo	$β_{1}$	-0.0138	0.1405	0.0199	-0.0148	0.1386	0.0194	-0.0149	0.1376	0.0192	-0.0153	0.1383	0.0194
	$β_{2}$	0.0487	0.3609	0.1326	0.0461	0.359	0.131	0.047	0.3588	0.131	0.0468	0.3588	0.1309
	$β_{3}$	-0.0339	0.2484	0.0629	-0.0297	0.2454	0.0611	-0.0308	0.2457	0.0613	-0.0308	0.2452	0.0611
M	$β_{1}$	-0.005	0.0808	0.0066	-0.0025	0.0528	0.0028	-0.0029	0.0519	0.0027	-0.0015	0.0487	0.0024
	$β_{2}$	0.0039	0.0825	0.0068	0.0019	0.0561	0.0032	0.0021	0.0561	0.0031	0.0019	0.0522	0.0027
	$β_{3}$	0	0.0851	0.0073	-0.002	0.0541	0.0029	-0.0014	0.0544	0.003	-0.0019	0.0501	0.0025

Table 1-(d). MTN individual effect

		OLS			Within			RE			MD
		bias	SE	MSE	bias	SE	MSE	bias	SE	MSE	bias	SE	MSE
N	$β_{1}$	0.0038	0.0749	0.0056	0.0011	0.0823	0.0068	0.0021	0.0832	0.0069	0.0013	0.0587	0.0034
	$β_{2}$	-9e-04	0.077	0.0059	2e-04	0.0771	0.0059	1e-04	0.0779	0.0061	-8e-04	0.0545	0.003
	$β_{3}$	-9e-04	0.0736	0.0054	-0.0035	0.0818	0.0067	-0.0031	0.08	0.0064	-3e-04	0.0572	0.0033
La	$β_{1}$	-0.0069	0.0972	0.0095	-0.0036	0.0909	0.0083	-0.0041	0.0895	0.008	-0.0027	0.0713	0.0051
	$β_{2}$	0.0067	0.1601	0.0257	0.0057	0.1561	0.0244	0.0067	0.1563	0.0245	0.0046	0.1456	0.0212
	$β_{3}$	-0.0031	0.1234	0.0152	-0.0031	0.1179	0.0139	-0.0028	0.1173	0.0138	-0.0056	0.1049	0.011
Lo	$β_{1}$	-0.0141	0.1369	0.0189	-0.0105	0.1256	0.0159	-0.0096	0.1253	0.0158	-0.0098	0.1118	0.0126
	$β_{2}$	0.0292	0.3004	0.0911	0.0303	0.2951	0.088	0.0298	0.2952	0.088	0.0292	0.2895	0.0847
	$β_{3}$	-0.021	0.2136	0.0461	-0.0149	0.206	0.0427	-0.0149	0.2063	0.0428	-0.0165	0.1974	0.0392
M	$β_{1}$	0.0023	0.0966	0.0093	-0.0037	0.0811	0.0066	-0.0017	0.0801	0.0064	-0.002	0.0575	0.0033
	$β_{2}$	1e-04	0.0947	0.009	2e-04	0.0808	0.0065	0	0.0791	0.0063	3e-04	0.0578	0.0033
	$β_{3}$	-0.0011	0.0913	0.0083	0.0025	0.0793	0.0063	0.0017	0.0784	0.0061	4e-04	0.0571	0.0033

Equations

y_{it}

\varepsilon_{it}

\mbox{\boldmath$y$}_{i}

\mbox{\boldmath$y$}

\mbox{\boldmath$y$}_{i}={\mathbf{X}}_{i}{\mbox{\boldmath$\beta$}}+\mbox{\boldmath$\varepsilon$}_{i},\quad\quad\quad\quad i=1,2,...,n

\mbox{\boldmath$y$}={\mathbf{X}}{\mbox{\boldmath$\beta$}}+\mbox{\boldmath$\varepsilon$}

\mbox{\boldmath$\Omega$}=\left[\begin{array}{cccc}{\cal E}_{1}&0&\cdots&0\\ 0&{\cal E}_{2}&\cdots&0\\ \vdots&\vdots&\ddots&\vdots\\ 0&0&\cdots&{\cal E}_{n}\end{array}\right]

\int g(z_{s},z_{t})\,dH(\mbox{\boldmath$z$})=\int g(z_{s},z_{t})\,dz_{s}\,dz_{t},

U_{k}(\mbox{\boldmath$z$},\mbox{\boldmath$b$})

{\mathbf{U}}(\mbox{\boldmath$z$},\mbox{\boldmath$b$})

{\cal L}(\mbox{\boldmath$b$})

{\cal L}(\widehat{\mbox{\boldmath$\beta$}}):=\inf_{\mbox{\boldmath$b$}}{\cal L}(\mbox{\boldmath$b$}).

\mbox{\boldmath$I$}(\mbox{\boldmath$x$}\leq\mbox{\boldmath$y$})

\mbox{\boldmath$I$}(\mbox{\boldmath$x$}<\mbox{\boldmath$y$})

{\cal L}(\mbox{\boldmath$b$})=\sum_{k=1}^{p}\int\left[\sum_{i=1}^{n}\mbox{\boldmath$d$}_{ik}^{\prime}\left\{\mbox{\boldmath$I$}(\mbox{\boldmath$y$}_{i}-{\mathbf{X}}_{i}\mbox{\boldmath$b$}\leq\mbox{\boldmath$z$})-\mbox{\boldmath$I$}(-\mbox{\boldmath$y$}_{i}+{\mathbf{X}}_{i}\mbox{\boldmath$b$}<\mbox{\boldmath$z$})\right\}\right]^{2}dH(\mbox{\boldmath$z$})

{\mathbf{D}}_{i}:=\left[\begin{array}{ccccc}d_{i11}&\cdots&d_{i1k}&\cdots&d_{i1p}\\ \vdots&\ddots&\vdots&\ddots&\vdots\\ d_{it1}&\cdots&d_{itk}&\cdots&d_{itp}\\ \vdots&\ddots&\vdots&\ddots&\vdots\\ d_{iT1}&\cdots&d_{iTk}&\cdots&d_{iTp}\end{array}\right].

\limsup_{n\to\infty}\left(n\max_{1\leq i\leq n}d_{itk}^{2}\right)<\infty.

{\cal L}(\mbox{\boldmath$b$})=4\sum_{k=1}^{p}\left[\sum_{i=1}^{n}\sum_{t=1}^{T}d_{itk}(y_{it}-\mbox{\boldmath$x$}_{it}^{\prime}\mbox{\boldmath$b$})\right]^{2},

\frac{\partial{\cal L}(\mbox{\boldmath$b$})}{\partial\mbox{\boldmath$b$}}

\widehat{\mbox{\boldmath$\beta$}}=\left(\sum_{k=1}^{p}\left[\left(\sum_{i=1}^{n}\sum_{t=1}^{T}d_{itk}\mbox{\boldmath$x$}_{it}\right)\left(\sum_{i=1}^{n}\sum_{t=1}^{T}d_{itk}\mbox{\boldmath$x$}^{\prime}_{it}\right)\right]\right)^{-1}\left(\sum_{k=1}^{p}\left[\left(\sum_{i=1}^{n}\sum_{t=1}^{T}d_{itk}\mbox{\boldmath$x$}_{it}\right)\left(\sum_{i=1}^{n}\sum_{t=1}^{T}d_{itk}y_{it}\right)\right]\right).

\widetilde{\mbox{\boldmath$x$}}_{k}^{\prime}:=\sum_{i=1}^{n}\sum_{t=1}^{T}d_{itk}\mbox{\boldmath$x$}_{it}^{\prime},

\widetilde{{\mathbf{X}}}=\sum_{i=1}^{n}{\mathbf{D}}_{i}^{\prime}{\mathbf{X}}_{i}={\mathbf{D}}^{\prime}{\mathbf{X}}.

\sum_{k=1}^{p}\left[\left(\sum_{i=1}^{n}\sum_{t=1}^{T}d_{itk}\mbox{\boldmath$x$}_{it}\right)\left(\sum_{i=1}^{n}\sum_{t=1}^{T}d_{itk}\mbox{\boldmath$x$}^{\prime}_{it}\right)\right]

\widetilde{y}_{k}:=\sum_{i=1}^{n}\sum_{t=1}^{T}d_{itk}y_{it},\quad\widetilde{\varepsilon}_{k}:=\sum_{i=1}^{n}\sum_{t=1}^{T}d_{itk}\varepsilon_{it},\quad 1\leq k\leq p.

\sum_{k=1}^{p}\left[\left(\sum_{i=1}^{n}\sum_{t=1}^{T}d_{itk}\mbox{\boldmath$x$}_{it}\right)\left(\sum_{i=1}^{n}\sum_{t=1}^{T}d_{itk}y_{it}\right)\right]

\widehat{\mbox{\boldmath$\beta$}}=(\widetilde{{\mathbf{X}}}^{\prime}\widetilde{{\mathbf{X}}})^{-1}(\widetilde{{\mathbf{X}}}^{\prime}\widetilde{\mbox{\boldmath$y$}})=({\mathbf{X}}^{\prime}{\mathbf{D}}{\mathbf{D}}^{\prime}{\mathbf{X}})^{-1}({\mathbf{X}}^{\prime}{\mathbf{D}}{\mathbf{D}}^{\prime}\mbox{\boldmath$y$}).

\widehat{\mbox{\boldmath$\beta$}}=\mbox{\boldmath$\beta$}+(\widetilde{{\mathbf{X}}}^{\prime}\widetilde{{\mathbf{X}}})^{-1}(\widetilde{{\mathbf{X}}}^{\prime}\widetilde{\mbox{\boldmath$\varepsilon$}}),

\mbox{\boldmath$\Sigma_{\beta}$}=({\mathbf{X}}^{\prime}{\mathbf{D}}{\mathbf{D}}^{\prime}{\mathbf{X}})^{-1}\mbox{\boldmath$\Sigma$}_{{\mathbf{X}}{\mathbf{D}}\Omega}({\mathbf{X}}^{\prime}{\mathbf{D}}{\mathbf{D}}^{\prime}{\mathbf{X}})^{-1}

\mbox{\boldmath$\Sigma$}_{{\mathbf{X}}{\mathbf{D}}\Omega}={\mathbf{X}}^{\prime}{\mathbf{D}}{\mathbf{D}}^{\prime}\mbox{\boldmath$\Omega$}{\mathbf{D}}{\mathbf{D}}^{\prime}{\mathbf{X}}.

\mbox{\boldmath$\Omega$}=\mathbf{Q}\mbox{\boldmath$\Lambda$}\mathbf{Q}^{\prime}

\mbox{\boldmath$d$}_{j}=c_{j}^{-1/2}\mbox{\boldmath$q$}_{j}.

\mbox{\boldmath$\Sigma_{\beta}$}=({\mathbf{X}}^{\prime}{\mathbf{D}}{\mathbf{D}}^{\prime}{\mathbf{X}})^{-1}.

tr({\mathbf{X}}\mbox{\boldmath$\Sigma$}_{{\mathbf{X}}{\mathbf{D}}\Omega}^{-1}{\mathbf{X}}^{\prime})\leq C<\infty,

Taxonomy

Topics: Spatial and Panel Data Analysis · Global trade and economics · Statistical Methods and Inference

Full text

The Minimum Distance Estimation with Multiple Integral in Panel Data

Jiwoong Kim

University of Notre Dame

Abstract

This paper studies the minimum distance estimation problem for the panel data model. We propose minimum distance estimators of the regression parameters of the panel data model and investigate their asymptotic distributions. This paper makes two main contributions. First, the domain of application of the minimum distance estimation method is extended to the panel data model. Second, the proposed estimators are more efficient than other existing ones. Simulation studies compare the performance of the proposed estimators with that of others and demonstrate some superiority of our estimators.

Keywords: Minimum distance estimation; panel data

1 Introduction

Panel data refers to a data set that includes multiple observations of entities (or cross-section units) over time. A classical assumption on the linear regression model with panel data, called the panel regression model hereafter, is that the errors in the model can be decomposed into two components: a time-invariant individual effect and a remainder disturbance that varies with time and entity. These two components are assumed to be independent. The errors in the panel regression model are dependent for the same entity over time, while the errors of different entities are independent regardless of time. When observations are expressed in vector form, the panel regression model resembles the regression model with independent errors (see (2.4)), even though it is not one. Treating the errors in the panel regression model as if they were completely independent and applying the ordinary least squares (OLS) method to estimate the regression parameters yields estimators with higher variances. To redress this issue, various well-known estimators, e.g. the within estimator and the random effect estimator, have been proposed; see Wooldridge (2007) for details. Kim (2016) applied the minimum distance (MD) estimation method to the panel regression model and compared the MD estimators with the above-mentioned estimators; he demonstrated the superiority of the MD estimators over the others. Even though the MD estimation method seems desirable, it has a weakness that makes it difficult to implement and has therefore drawn criticism. A common criticism is that the MD estimation method does not provide a closed-form solution; only a numerical solution for the MD estimator is available. In addition, computing the numerical solution is slow because of the complexity of the objective function, called the distance, used in the MD estimation method.
Kim (2017) proposed a fast algorithm for the MD estimation method and published an R package with which a practitioner can easily compute the MD estimator. He showed that computation time is greatly reduced when his algorithm is employed for the MD estimation problem. However, the MD estimation method is still computationally expensive compared with other methods such as OLS. In this paper, the author proposes a variant of the MD estimation method that provides a closed-form solution for the estimator. As shown later, the proposed MD method resembles the OLS method to some extent; the proposed estimator inherits advantages of both the MD and the OLS estimators. In other words, it retains the efficiency of the MD estimator and is as fast to compute as the OLS estimator.

This paper is organized as follows. Section 2 introduces the panel data model of interest and defines the distance employed in the MD estimation method. In Section 3, the asymptotic distribution of the proposed estimator is investigated. In Section 4, simulation studies compare the proposed estimator with other estimators.

2 The distance function

Consider the panel regression model

$$y_{it}=\mbox{\boldmath$x$}_{it}^{\prime}\mbox{\boldmath$\beta$}+\varepsilon_{it},\qquad\varepsilon_{it}=\gamma_{i}+\nu_{it},\qquad 1\leq i\leq n,\quad 1\leq t\leq T,\qquad(2.1)$$

where $\mbox{\boldmath$ x $}_{it}=(x_{it}^{1},...,x_{it}^{p})^{\prime}\in{\mathbb{R}}^{p}$ are nonrandom design variables, ${\mbox{\boldmath$ \beta $}}=(\beta_{1},...,\beta_{p})^{\prime}\in{\mathbb{R}}^{p}$ is the parameter vector of interest, and $\varepsilon_{it}$ are errors. As a classical assumption, the error term is decomposed into a time-invariant component $\gamma_{i}$ and a component $\nu_{it}$ which varies with time and cross-section. Define

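$$\mbox{\boldmath$y$}_{i}:=(y_{i1},...,y_{iT})^{\prime},\quad{\mathbf{X}}_{i}:=(\mbox{\boldmath$x$}_{i1},...,\mbox{\boldmath$x$}_{iT})^{\prime},\quad\mbox{\boldmath$\varepsilon$}_{i}:=(\varepsilon_{i1},...,\varepsilon_{iT})^{\prime},$$

$$\mbox{\boldmath$y$}:=(\mbox{\boldmath$y$}_{1}^{\prime},...,\mbox{\boldmath$y$}_{n}^{\prime})^{\prime},\quad{\mathbf{X}}:=({\mathbf{X}}_{1}^{\prime},...,{\mathbf{X}}_{n}^{\prime})^{\prime},\quad\mbox{\boldmath$\varepsilon$}:=(\mbox{\boldmath$\varepsilon$}_{1}^{\prime},...,\mbox{\boldmath$\varepsilon$}_{n}^{\prime})^{\prime}.$$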

Note that the model (2.1) can be expressed as

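$$\mbox{\boldmath$y$}_{i}={\mathbf{X}}_{i}{\mbox{\boldmath$\beta$}}+\mbox{\boldmath$\varepsilon$}_{i},\qquad i=1,2,...,n,$$

and

$$\mbox{\boldmath$y$}={\mathbf{X}}{\mbox{\boldmath$\beta$}}+\mbox{\boldmath$\varepsilon$}$$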

in vector and matrix forms, respectively. The errors in the model are assumed to be dependent for the same cross-section but independent over cross-sections, i.e., for all $1\leq s,t\leq T$ , $E(\varepsilon_{it}\varepsilon_{js})\neq 0$ only if $i=j$ . Let ${\cal E}_{i}$ and $\Omega$ denote the covariance matrices of $\mbox{\boldmath$ \varepsilon $}_{i}$ and $\varepsilon$ , respectively. Then, we have

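$$\mbox{\boldmath$\Omega$}=\left[\begin{array}{cccc}{\cal E}_{1}&0&\cdots&0\\ 0&{\cal E}_{2}&\cdots&0\\ \vdots&\vdots&\ddots&\vdots\\ 0&0&\cdots&{\cal E}_{n}\end{array}\right].$$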
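The block-diagonal structure of $\Omega$ can be illustrated with a short numerical sketch. The paper only states that $\gamma_{i}$ and $\nu_{it}$ are independent; the additional assumption below, that $\nu_{it}$ is uncorrelated over $t$ with a common variance, is ours, as are the function names `error_component_cov` and `stacked_cov` and the variance values.

```python
import numpy as np

def error_component_cov(T, sigma_gamma2, sigma_nu2):
    """Covariance of eps_i = gamma_i * 1_T + nu_i.

    Assumes (our assumption, not stated in the paper) that nu_it is
    uncorrelated over t with common variance sigma_nu2.
    """
    return sigma_gamma2 * np.ones((T, T)) + sigma_nu2 * np.eye(T)

def stacked_cov(blocks):
    """Block-diagonal Omega built from per-entity covariances E_1, ..., E_n."""
    sizes = [b.shape[0] for b in blocks]
    omega = np.zeros((sum(sizes), sum(sizes)))
    start = 0
    for b, s in zip(blocks, sizes):
        omega[start:start + s, start:start + s] = b
        start += s
    return omega

# Two entities observed over T = 3 periods (illustrative values only).
E = error_component_cov(T=3, sigma_gamma2=2.0, sigma_nu2=1.0)
Omega = stacked_cov([E, E])
```

Off-diagonal blocks are zero because errors of different entities are independent, while each diagonal block is dense because the shared $\gamma_{i}$ correlates the errors of one entity over time.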

Next, we define an integrating measure which will be used in the distance function. Let $g:{\mathbb{R}}^{2}\rightarrow{\mathbb{R}}$ denote a real function. For real vector $\mbox{\boldmath$ z $}=(z_{1},...,z_{T})\in{\mathbb{R}}^{T}$ , define

$$\int g(z_{s},z_{t})\,dH(\mbox{\boldmath$z$})=\int g(z_{s},z_{t})\,dz_{s}\,dz_{t},\qquad(2.6)$$

where $1\leq s,\,t\leq T$ . Define the distance function for any $d_{itk}\in{\mathbb{R}}$ with $1\leq i\leq n$ , $1\leq t\leq T$ , and $1\leq k\leq p$

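$$U_{k}(\mbox{\boldmath$z$},\mbox{\boldmath$b$}):=\sum_{i=1}^{n}\sum_{t=1}^{T}d_{itk}\left\{I(y_{it}-\mbox{\boldmath$x$}_{it}^{\prime}\mbox{\boldmath$b$}\leq z_{t})-I(-y_{it}+\mbox{\boldmath$x$}_{it}^{\prime}\mbox{\boldmath$b$}<z_{t})\right\},\qquad 1\leq k\leq p,$$

$${\mathbf{U}}(\mbox{\boldmath$z$},\mbox{\boldmath$b$}):=(U_{1}(\mbox{\boldmath$z$},\mbox{\boldmath$b$}),...,U_{p}(\mbox{\boldmath$z$},\mbox{\boldmath$b$}))^{\prime},$$

$${\cal L}(\mbox{\boldmath$b$}):=\int\|{\mathbf{U}}(\mbox{\boldmath$z$},\mbox{\boldmath$b$})\|^{2}\,dH(\mbox{\boldmath$z$}),\qquad(2.7)$$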

where $H$ is as in (2.6). Subsequently, define the MD estimator $\widehat{\mbox{\boldmath$ \beta $}}$ as

$${\cal L}(\widehat{\mbox{\boldmath$\beta$}}):=\inf_{\mbox{\boldmath$b$}}{\cal L}(\mbox{\boldmath$b$}).$$

*Remark 2.1.*

Consider real vectors $\mbox{\boldmath$ x $}=(x_{1},...,x_{T})^{\prime}\in{\mathbb{R}}^{T}$ and $\mbox{\boldmath$ y $}=(y_{1},...,y_{T})^{\prime}\in{\mathbb{R}}^{T}$ . Let

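$$\mbox{\boldmath$I$}(\mbox{\boldmath$x$}\leq\mbox{\boldmath$y$}):=(I(x_{1}\leq y_{1}),...,I(x_{T}\leq y_{T}))^{\prime},\qquad\mbox{\boldmath$I$}(\mbox{\boldmath$x$}<\mbox{\boldmath$y$}):=(I(x_{1}<y_{1}),...,I(x_{T}<y_{T}))^{\prime}.$$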

Let $\mbox{\boldmath$ d $}_{ik}:=(d_{i1k},...,d_{iTk})^{\prime}\in{\mathbb{R}}^{T}$ for $1\leq i\leq n$ and $1\leq k\leq p$ . Then ${\cal L}$ above can be rewritten as

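$${\cal L}(\mbox{\boldmath$b$})=\sum_{k=1}^{p}\int\left[\sum_{i=1}^{n}\mbox{\boldmath$d$}_{ik}^{\prime}\left\{\mbox{\boldmath$I$}(\mbox{\boldmath$y$}_{i}-{\mathbf{X}}_{i}\mbox{\boldmath$b$}\leq\mbox{\boldmath$z$})-\mbox{\boldmath$I$}(-\mbox{\boldmath$y$}_{i}+{\mathbf{X}}_{i}\mbox{\boldmath$b$}<\mbox{\boldmath$z$})\right\}\right]^{2}dH(\mbox{\boldmath$z$}),$$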

which is an analogue of the distance function in Koul (2002).

3 Asymptotic distribution of $\widehat{\mbox{\boldmath$ \beta $}}$

In this section we derive the asymptotic distribution of $\widehat{\mbox{\boldmath$ \beta $}}$ under the current setup. To begin with, define a $T\times p$ real matrix ${\mathbf{D}}_{i}$ the $(t,k)$ th entry of which is $d_{itk}$ in (2.7) with $1\leq i\leq n$ , $1\leq t\leq T$ and $1\leq k\leq p$ , i.e.,

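$${\mathbf{D}}_{i}:=\left[\begin{array}{ccccc}d_{i11}&\cdots&d_{i1k}&\cdots&d_{i1p}\\ \vdots&\ddots&\vdots&\ddots&\vdots\\ d_{it1}&\cdots&d_{itk}&\cdots&d_{itp}\\ \vdots&\ddots&\vdots&\ddots&\vdots\\ d_{iT1}&\cdots&d_{iTk}&\cdots&d_{iTp}\end{array}\right].$$

Next, stack all ${\mathbf{D}}_{i}$ ’s to obtain an $nT\times p$ real matrix, which is denoted by ${\mathbf{D}}$ . To proceed further, the following assumptions are required.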

(A.1)

$\{\mbox{\boldmath$ \varepsilon $}_{i}\}_{i=1}^{n}$ are independent and identically distributed with $E\|\mbox{\boldmath$ \varepsilon $}_{1}\|<\infty$ .

(A.2)

For all $1\leq t\leq T$ and $1\leq k\leq p$ ,

$$\limsup_{n\to\infty}\left(n\max_{1\leq i\leq n}d_{itk}^{2}\right)<\infty.$$

(A.3)

The matrix ${\mathbf{X}}^{\prime}{\mathbf{D}}{\mathbf{D}}^{\prime}{\mathbf{X}}$ is nonsingular.

Note that

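$${\cal L}(\mbox{\boldmath$b$})=4\sum_{k=1}^{p}\left[\sum_{i=1}^{n}\sum_{t=1}^{T}d_{itk}(y_{it}-\mbox{\boldmath$x$}_{it}^{\prime}\mbox{\boldmath$b$})\right]^{2},$$

and hence

$$\frac{\partial{\cal L}(\mbox{\boldmath$b$})}{\partial\mbox{\boldmath$b$}}=-8\sum_{k=1}^{p}\left(\sum_{i=1}^{n}\sum_{t=1}^{T}d_{itk}\mbox{\boldmath$x$}_{it}\right)\left(\sum_{i=1}^{n}\sum_{t=1}^{T}d_{itk}(y_{it}-\mbox{\boldmath$x$}_{it}^{\prime}\mbox{\boldmath$b$})\right).$$

Therefore,

$$\widehat{\mbox{\boldmath$\beta$}}=\left(\sum_{k=1}^{p}\left[\left(\sum_{i=1}^{n}\sum_{t=1}^{T}d_{itk}\mbox{\boldmath$x$}_{it}\right)\left(\sum_{i=1}^{n}\sum_{t=1}^{T}d_{itk}\mbox{\boldmath$x$}^{\prime}_{it}\right)\right]\right)^{-1}\left(\sum_{k=1}^{p}\left[\left(\sum_{i=1}^{n}\sum_{t=1}^{T}d_{itk}\mbox{\boldmath$x$}_{it}\right)\left(\sum_{i=1}^{n}\sum_{t=1}^{T}d_{itk}y_{it}\right)\right]\right).$$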

Let $\widetilde{{\mathbf{X}}}$ denote a $p\times p$ matrix whose $k$ th row vector is

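$$\widetilde{\mbox{\boldmath$x$}}_{k}^{\prime}:=\sum_{i=1}^{n}\sum_{t=1}^{T}d_{itk}\mbox{\boldmath$x$}_{it}^{\prime},$$

which implies

$$\widetilde{{\mathbf{X}}}=\sum_{i=1}^{n}{\mathbf{D}}_{i}^{\prime}{\mathbf{X}}_{i}={\mathbf{D}}^{\prime}{\mathbf{X}}.$$

Thus

$$\sum_{k=1}^{p}\left[\left(\sum_{i=1}^{n}\sum_{t=1}^{T}d_{itk}\mbox{\boldmath$x$}_{it}\right)\left(\sum_{i=1}^{n}\sum_{t=1}^{T}d_{itk}\mbox{\boldmath$x$}^{\prime}_{it}\right)\right]=\widetilde{{\mathbf{X}}}^{\prime}\widetilde{{\mathbf{X}}}={\mathbf{X}}^{\prime}{\mathbf{D}}{\mathbf{D}}^{\prime}{\mathbf{X}}.$$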

Let $\widetilde{\mbox{\boldmath$ y $}}:=(\widetilde{y}_{1},...,\widetilde{y}_{p})^{\prime}\in{\mathbb{R}}^{p}$ and $\widetilde{\mbox{\boldmath$ \varepsilon $}}:=(\widetilde{\varepsilon}_{1},...,\widetilde{\varepsilon}_{p})^{\prime}\in{\mathbb{R}}^{p}$ where

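$$\widetilde{y}_{k}:=\sum_{i=1}^{n}\sum_{t=1}^{T}d_{itk}y_{it},\qquad\widetilde{\varepsilon}_{k}:=\sum_{i=1}^{n}\sum_{t=1}^{T}d_{itk}\varepsilon_{it},\qquad 1\leq k\leq p.$$

Then we have

$$\sum_{k=1}^{p}\left[\left(\sum_{i=1}^{n}\sum_{t=1}^{T}d_{itk}\mbox{\boldmath$x$}_{it}\right)\left(\sum_{i=1}^{n}\sum_{t=1}^{T}d_{itk}y_{it}\right)\right]=\widetilde{{\mathbf{X}}}^{\prime}\widetilde{\mbox{\boldmath$y$}}={\mathbf{X}}^{\prime}{\mathbf{D}}{\mathbf{D}}^{\prime}\mbox{\boldmath$y$},$$

and hence, $\widehat{\mbox{\boldmath$ \beta $}}$ can be written in matrix form:

$$\widehat{\mbox{\boldmath$\beta$}}=(\widetilde{{\mathbf{X}}}^{\prime}\widetilde{{\mathbf{X}}})^{-1}(\widetilde{{\mathbf{X}}}^{\prime}\widetilde{\mbox{\boldmath$y$}})=({\mathbf{X}}^{\prime}{\mathbf{D}}{\mathbf{D}}^{\prime}{\mathbf{X}})^{-1}({\mathbf{X}}^{\prime}{\mathbf{D}}{\mathbf{D}}^{\prime}\mbox{\boldmath$y$}).$$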
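The matrix form of $\widehat{\mbox{\boldmath$ \beta $}}$ can be computed directly. The following is a minimal sketch rather than the author's implementation: the function name `md_closed_form`, the random weights in `D`, and the noiseless responses are illustrative assumptions chosen so that the result is easy to check.

```python
import numpy as np

def md_closed_form(X, y, D):
    """Return (X' D D' X)^{-1} X' D D' y for stacked X (nT x p), y (nT,), D (nT x p)."""
    Xt = D.T @ X          # X-tilde = D'X, a p x p matrix
    yt = D.T @ y          # y-tilde = D'y, a vector of length p
    # Solve the normal equations rather than forming an explicit inverse.
    return np.linalg.solve(Xt.T @ Xt, Xt.T @ yt)

rng = np.random.default_rng(0)
n, T, p = 10, 5, 3
X = rng.uniform(0, 30, size=(n * T, p))
beta = np.array([-2.0, 1.2, 3.3])
y = X @ beta                       # noiseless responses, so the estimator should recover beta
D = rng.normal(size=(n * T, p))    # arbitrary illustrative weights; only (A.3) matters here
beta_hat = md_closed_form(X, y, D)
```

Because $\widetilde{{\mathbf{X}}}$ is a square $p\times p$ matrix, nonsingularity under (A.3) means the formula is algebraically the same as $\widetilde{{\mathbf{X}}}^{-1}\widetilde{\mbox{\boldmath$ y $}}$, i.e. the estimator solves the $p$ equations ${\mathbf{D}}^{\prime}(\mbox{\boldmath$ y $}-{\mathbf{X}}\mbox{\boldmath$ b $})=0$ exactly.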

*Remark 3.1.*

Assume that ${\mathbf{X}}^{\prime}{\mathbf{X}}$ is nonsingular with ${\mathbf{A}}:=({\mathbf{X}}^{\prime}{\mathbf{X}})^{-1/2}$ . Consider ${\mathbf{D}}={\mathbf{X}}{\mathbf{A}}$ . Then ${\mathbf{D}}{\mathbf{D}}^{\prime}={\mathbf{X}}({\mathbf{X}}^{\prime}{\mathbf{X}})^{-1}{\mathbf{X}}^{\prime}$ , and hence, $\widehat{\mbox{\boldmath$ \beta $}}$ is reduced to the OLS estimator.
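The reduction to OLS in Remark 3.1 can be checked numerically. This is a sketch under assumed data: the design, the noise level, and the variable names are illustrative and not taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(1)
nT, p = 50, 3
X = rng.uniform(0, 30, size=(nT, p))
y = X @ np.array([-2.0, 1.2, 3.3]) + rng.normal(scale=5.0, size=nT)

# A = (X'X)^{-1/2} via the eigendecomposition of the symmetric matrix X'X.
w, V = np.linalg.eigh(X.T @ X)
A = V @ np.diag(w ** -0.5) @ V.T
D = X @ A

P = D @ D.T   # should coincide with the projection X (X'X)^{-1} X'
beta_md = np.linalg.solve(X.T @ P @ X, X.T @ P @ y)
beta_ols = np.linalg.lstsq(X, y, rcond=None)[0]
```

Since ${\mathbf{X}}^{\prime}{\mathbf{D}}{\mathbf{D}}^{\prime}{\mathbf{X}}={\mathbf{X}}^{\prime}{\mathbf{X}}$ and ${\mathbf{X}}^{\prime}{\mathbf{D}}{\mathbf{D}}^{\prime}\mbox{\boldmath$ y $}={\mathbf{X}}^{\prime}\mbox{\boldmath$ y $}$ for this choice of ${\mathbf{D}}$, the two coefficient vectors agree up to floating-point error.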

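*Remark 3.2.*

Let $\Sigma_{\beta}$ denote the covariance matrix of $\widehat{\mbox{\boldmath$ \beta $}}$ . Note that

$$\widehat{\mbox{\boldmath$\beta$}}=\mbox{\boldmath$\beta$}+(\widetilde{{\mathbf{X}}}^{\prime}\widetilde{{\mathbf{X}}})^{-1}(\widetilde{{\mathbf{X}}}^{\prime}\widetilde{\mbox{\boldmath$\varepsilon$}}),$$

and hence, $\widehat{\mbox{\boldmath$ \beta $}}$ is unbiased. Consequently,

$$\mbox{\boldmath$\Sigma_{\beta}$}=({\mathbf{X}}^{\prime}{\mathbf{D}}{\mathbf{D}}^{\prime}{\mathbf{X}})^{-1}\mbox{\boldmath$\Sigma$}_{{\mathbf{X}}{\mathbf{D}}\Omega}({\mathbf{X}}^{\prime}{\mathbf{D}}{\mathbf{D}}^{\prime}{\mathbf{X}})^{-1},$$

where

$$\mbox{\boldmath$\Sigma$}_{{\mathbf{X}}{\mathbf{D}}\Omega}={\mathbf{X}}^{\prime}{\mathbf{D}}{\mathbf{D}}^{\prime}\mbox{\boldmath$\Omega$}{\mathbf{D}}{\mathbf{D}}^{\prime}{\mathbf{X}}.$$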

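*Remark 3.3.*

Since $\Omega$ is a positive-definite symmetric matrix, it can be written as

$$\mbox{\boldmath$\Omega$}=\mathbf{Q}\mbox{\boldmath$\Lambda$}\mathbf{Q}^{\prime},$$

where $\Lambda$ is a diagonal matrix whose diagonal entries are the eigenvalues of $\Omega$ and $\mathbf{Q}$ is an orthogonal matrix whose columns are the corresponding eigenvectors of $\Omega$ . Let $c_{i}$ and $\mbox{\boldmath$ q $}_{i}$ for $1\leq i\leq nT$ denote its $i$ th eigenvalue and eigenvector, respectively. Let $\mbox{\boldmath$ d $}_{j}$ for $1\leq j\leq p$ denote the $j$ th column vector of ${\mathbf{D}}$ and set

$$\mbox{\boldmath$d$}_{j}=c_{j}^{-1/2}\mbox{\boldmath$q$}_{j}.$$

Then ${\mathbf{D}}^{\prime}\mbox{\boldmath$ \Omega $}{\mathbf{D}}=\mathbf{I}_{p\times p}$ , so that ${\mathbf{X}}^{\prime}{\mathbf{D}}{\mathbf{D}}^{\prime}\mbox{\boldmath$ \Omega $}{\mathbf{D}}{\mathbf{D}}^{\prime}{\mathbf{X}}={\mathbf{X}}^{\prime}{\mathbf{D}}{\mathbf{D}}^{\prime}{\mathbf{X}}$ , and hence

$$\mbox{\boldmath$\Sigma_{\beta}$}=({\mathbf{X}}^{\prime}{\mathbf{D}}{\mathbf{D}}^{\prime}{\mathbf{X}})^{-1}.$$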
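The simplification in Remark 3.3 can be verified on a small example. The sketch below assumes an error-components covariance for each entity and a random design; both are illustrative choices of ours, as is the decision to use the eigenvectors belonging to the $p$ smallest eigenvalues (the remark only requires $p$ distinct eigenpairs).

```python
import numpy as np

rng = np.random.default_rng(2)
n, T, p = 4, 3, 2
E = 2.0 * np.ones((T, T)) + 1.0 * np.eye(T)   # assumed per-entity covariance
Omega = np.kron(np.eye(n), E)                  # block-diagonal Omega, nT x nT
X = rng.uniform(0, 30, size=(n * T, p))

c, Q = np.linalg.eigh(Omega)                   # eigenvalues c_j and eigenvectors q_j (columns of Q)
D = Q[:, :p] / np.sqrt(c[:p])                  # d_j = c_j^{-1/2} q_j for j = 1, ..., p

M = X.T @ D @ D.T @ X
M_inv = np.linalg.inv(M)
# Sandwich covariance from Remark 3.2, which should collapse to M^{-1}.
sandwich = M_inv @ (X.T @ D @ D.T @ Omega @ D @ D.T @ X) @ M_inv
```

With this choice of ${\mathbf{D}}$ the middle factor of the sandwich equals ${\mathbf{X}}^{\prime}{\mathbf{D}}{\mathbf{D}}^{\prime}{\mathbf{X}}$, so `sandwich` and `M_inv` agree up to rounding.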

Now we are ready to state the main result.

Theorem 3.1.

Assume $\Sigma_{\mbox{\boldmath$ \beta $}}$ is positive definite. In addition, assume that

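$$tr({\mathbf{X}}\mbox{\boldmath$\Sigma$}_{{\mathbf{X}}{\mathbf{D}}\Omega}^{-1}{\mathbf{X}}^{\prime})\leq C<\infty,\qquad(3.1)$$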

where tr( $\cdot$ ) is a trace function. Then

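$$\mbox{\boldmath$\Sigma$}_{\beta}^{-1/2}(\widehat{\mbox{\boldmath$\beta$}}-\mbox{\boldmath$\beta$})\rightarrow_{d}N(\mbox{\boldmath$0$},\textbf{I}_{p\times p}),\qquad(3.2)$$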

where $\textbf{I}_{p\times p}$ is the $p\times p$ identity matrix.

Proof. To prove (3.2), it suffices to show that for any $\mbox{\boldmath$ \lambda $}\in{\mathbb{R}}^{p}$ , $\mbox{\boldmath$ \lambda $}^{\prime}\mbox{\boldmath$ \Sigma $}_{\beta}^{-1/2}(\widehat{\mbox{\boldmath$ \beta $}}-\mbox{\boldmath$ \beta $})$ is asymptotically normally distributed. Let $\mbox{\boldmath$ \lambda $}_{X}:=\widetilde{{\mathbf{X}}}(\widetilde{{\mathbf{X}}}^{\prime}\widetilde{{\mathbf{X}}})^{-1}\mbox{\boldmath$ \Sigma $}_{\beta}^{-1/2}\mbox{\boldmath$ \lambda $}$ . Rewrite

[TABLE]

where $\zeta_{i}=\mbox{\boldmath$ \lambda $}_{X}^{\prime}{\mathbf{D}}_{i}^{\prime}\mbox{\boldmath$ \varepsilon $}_{i}$ . Note that $\{\zeta_{1},\zeta_{2},...,\zeta_{n}\}$ is a sequence of independent random variables. Also,

[TABLE]

Observe that with $\|\mbox{\boldmath$ \lambda $}\|=1$

[TABLE]

where the first equality follows from the definition of the Frobenius norm, the second equality follows from the multiplicative property of the trace function, and the last inequality follows from (3.1). Therefore,

[TABLE]

Observe that (A.1) and the dominated convergence theorem imply that, for all $\epsilon>0$ ,

[TABLE]

where $C_{1},C_{2}<\infty$ . Consequently, (3.2) follows from a direct application of the Lindeberg central limit theorem. ∎

4 Simulation Studies

4.1 Other panel data estimators

In this section, we briefly introduce other estimators of panel regression parameters commonly used in the econometrics literature. For more details on these estimators, see Hsiao (2003) and Wooldridge (2007); this section has its roots in their work. Consider the within model

$$y_{it}-\bar{y}_{i}=(\mbox{\boldmath$x$}_{it}-\bar{\mbox{\boldmath$x$}}_{i})^{\prime}\mbox{\boldmath$\beta$}+(\varepsilon_{it}-\bar{\varepsilon}_{i}),$$

where $\bar{y}_{i}=T^{-1}\sum_{t=1}^{T}y_{it}$ , $\bar{\mbox{\boldmath$ x $}}_{i}=T^{-1}\sum_{t=1}^{T}\mbox{\boldmath$ x $}_{it}$ , and $\bar{\varepsilon}_{i}=T^{-1}\sum_{t=1}^{T}\varepsilon_{it}$ . The within estimator is the ordinary least squares (OLS) estimator obtained from the within model. Note that the time-invariant individual effect $\gamma_{i}$ no longer appears in the within model, because it cancels when the average of the error is subtracted from the original error. Another well-known panel data estimator is the random effect estimator. The random effect estimator is a variant of the feasible generalized least squares estimator; it can be obtained by applying OLS estimation to the following model

$$y_{it}-\widehat{\rho}\,\bar{y}_{i}=(\mbox{\boldmath$x$}_{it}-\widehat{\rho}\,\bar{\mbox{\boldmath$x$}}_{i})^{\prime}\mbox{\boldmath$\beta$}+(\varepsilon_{it}-\widehat{\rho}\,\bar{\varepsilon}_{i}),$$

where $\widehat{\rho}$ is consistent for $\rho:=1-\sigma_{\nu}/\sqrt{\sigma_{\nu}^{2}+T\sigma_{\gamma}^{2}}$ , with $\sigma_{\nu}^{2}$ and $\sigma_{\gamma}^{2}$ denoting the variances of $\nu_{it}$ and $\gamma_{i}$ , respectively. Note that the OLS and within estimators are special cases of the random effect estimator corresponding to $\widehat{\rho}=0$ and $\widehat{\rho}=1$ , respectively. In order to obtain the MD estimators in the next section, we apply the MD method to the within model so that the individual effect can be removed.
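The transformations behind these estimators differ only in how much of the per-entity time average is subtracted. The sketch below makes that explicit; the helper names `quasi_demean` and `panel_ols`, the array layout `(n, T, p)`, and the toy data without remainder disturbance are assumptions made for illustration, and $\widehat{\rho}$ is treated as given rather than estimated.

```python
import numpy as np

def quasi_demean(a, rho):
    """Subtract rho times the per-entity time average; a has shape (n, T) or (n, T, p)."""
    return a - rho * a.mean(axis=1, keepdims=True)

def panel_ols(y, X, rho):
    """OLS on the transformed model y_it - rho*ybar_i = (x_it - rho*xbar_i)' beta + error."""
    n, T, p = X.shape
    yt = quasi_demean(y, rho).reshape(n * T)
    Xt = quasi_demean(X, rho).reshape(n * T, p)
    return np.linalg.lstsq(Xt, yt, rcond=None)[0]

rng = np.random.default_rng(3)
n, T, p = 10, 5, 3
beta = np.array([-2.0, 1.2, 3.3])
X = rng.uniform(0, 30, size=(n, T, p))
gamma = rng.normal(scale=5.0, size=(n, 1))   # time-invariant individual effect
y = X @ beta + gamma                         # toy data: no remainder disturbance

beta_within = panel_ols(y, X, rho=1.0)       # rho = 1 gives the within estimator
beta_pooled = panel_ols(y, X, rho=0.0)       # rho = 0 gives pooled OLS
```

In this toy setting the within transformation removes $\gamma_{i}$ completely, so `beta_within` reproduces `beta`, whereas pooled OLS is still affected by the individual effect.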

4.2 Comparison with other estimators

In this section we present simulation studies corresponding to sixteen pairs of symmetric individual effects and remainder disturbances. Both individual effects ( $\gamma_{i}$ ) and remainder disturbances ( $\nu_{it}$ ) are generated from normal, Laplace, logistic, and mixture of two normal (MTN) distributions. A random variable has a Laplace or logistic distribution if its density function is

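$$f(x)=(2\sigma_{1})^{-1}\exp\left(-|x-\mu_{1}|/\sigma_{1}\right)$$

or

$$f(x)=\sigma_{2}^{-1}\exp\left(-(x-\mu_{2})/\sigma_{2}\right)\left\{1+\exp\left(-(x-\mu_{2})/\sigma_{2}\right)\right\}^{-2},$$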

respectively. When we generate $\gamma_{i}$ or $\nu_{it}$ from either Laplace or logistic distribution, we set $\mu_{1}=\mu_{2}=0$ and $\sigma_{1}=\sigma_{2}=5$ . For normal $\gamma_{i}$ or $\nu_{it}$ , we use $N(0,5^{2})$ ; for MTN, we obtain them from $0.9N(0,2^{2})+0.1N(0,5^{2})$ . For each $1\leq t\leq T$ , $1\leq i\leq n$ , and $1\leq j\leq p$ , we obtain $x_{it}^{j}$ in (2.1) from the uniform distribution on (0,30); we set $\mbox{\boldmath$ \beta $}=(-2,1.2,3.3)^{\prime}$ . Finally, we generate $\{y_{it}:1\leq i\leq n,\,\,1\leq t\leq T\}$ by using the model (2.1).
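The data-generating process described above can be sketched as follows. The function names `draw` and `simulate_panel` are hypothetical, and we assume that the value 5 enters the Laplace and logistic distributions as NumPy's `scale` parameter, which may differ from the parameterization intended in the paper.

```python
import numpy as np

def draw(dist, size, rng):
    """Draw errors from one of the four symmetric distributions used in the study."""
    if dist == "normal":
        return rng.normal(0.0, 5.0, size)
    if dist == "laplace":
        return rng.laplace(0.0, 5.0, size)      # assumes sigma_1 is NumPy's scale
    if dist == "logistic":
        return rng.logistic(0.0, 5.0, size)     # assumes sigma_2 is NumPy's scale
    if dist == "mtn":
        # 0.9 N(0, 2^2) + 0.1 N(0, 5^2): choose the component, then scale a standard normal.
        scale = np.where(rng.random(size) < 0.9, 2.0, 5.0)
        return rng.normal(0.0, 1.0, size) * scale
    raise ValueError(f"unknown distribution: {dist}")

def simulate_panel(n, T, beta, effect, disturbance, rng):
    """Generate (X, y) with X of shape (n, T, p) and y of shape (n, T) from model (2.1)."""
    p = len(beta)
    X = rng.uniform(0.0, 30.0, size=(n, T, p))
    gamma = draw(effect, (n, 1), rng)           # individual effect, constant over t
    nu = draw(disturbance, (n, T), rng)         # remainder disturbance
    y = X @ beta + gamma + nu
    return X, y

rng = np.random.default_rng(0)
X, y = simulate_panel(10, 5, np.array([-2.0, 1.2, 3.3]), "mtn", "laplace", rng)
```

Looping `effect` and `disturbance` over the four distribution names yields the sixteen pairs considered in the study.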

Dhar (1991, 1992) demonstrated the existence of the MD estimators and discussed an algorithm to obtain them in the linear regression model with independent errors. However, his algorithm employs a brute-force search method, which is computationally expensive. Kim (2017) proposed a fast algorithm with the R package KoulMde, which enables practitioners to easily compute the MD estimators; it is available from the Comprehensive R Archive Network (CRAN) at https://cran.r-project.org/web/packages/KoulMde/index.html. To obtain the MD estimators in this simulation study, we use KoulMde. The bias, standard error (SE), and mean squared error (MSE) of the MD estimators and of the other estimators of $\beta$ introduced in the previous section are reported below; for ease of comparison, we evaluate the performance of the estimators in terms of MSE, since those with the least MSE also approximately display the least bias or SE or both.
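For concreteness, the three reported summaries can be computed from replicated estimates as in the sketch below. The paper does not spell out its formulas, so the use of the sample standard deviation (`ddof=1`) for SE is an assumption, as are the function name `summarize` and the made-up replications.

```python
import numpy as np

def summarize(estimates, beta):
    """Bias, SE, and MSE per coefficient from an (R, p) array of replicated estimates."""
    estimates = np.asarray(estimates, dtype=float)
    bias = estimates.mean(axis=0) - beta
    se = estimates.std(axis=0, ddof=1)              # assumed: sample standard deviation
    mse = ((estimates - beta) ** 2).mean(axis=0)    # average squared error around the truth
    return bias, se, mse

beta = np.array([-2.0, 1.2, 3.3])
reps = np.array([[-2.1, 1.2, 3.4],
                 [-1.9, 1.2, 3.2]])                 # two made-up replications
bias, se, mse = summarize(reps, beta)
```

With many replications MSE is approximately the squared bias plus the squared SE, which is why ranking estimators by MSE captures both quantities.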

Table 1-(a) reports findings corresponding to the normal individual effect with normal, Laplace, logistic, and MTN remainder disturbances when $n=10$ and $T=5$ . As reported in the table, the within estimators outperform the other estimators regardless of the remainder disturbances; the random effect estimators display almost the same performance as the within estimators. The MD estimators follow the within and the random effect estimators, and, not surprisingly, the OLS estimators are the worst.

Tables 1-(b), 1-(c), and 1-(d) report the findings corresponding to non-Gaussian individual effects: logistic, Laplace, and MTN. Similar to the cases of independent non-Gaussian errors in the linear regression model illustrated in Koul (2002), the MD estimators display the least SE and, as a result, the least MSE regardless of the remainder disturbances. It is, however, hard to discuss the merits and demerits in terms of bias. For the normal and Laplace disturbances, the within estimators generally show the least bias regardless of individual effects; in the case of logistic and MTN disturbances, the MD estimators generally display the least bias. None of the estimators dominates the others in terms of bias. One notable fact is that the superiority of the MD estimators over the others is especially prominent in the case of the MTN individual effect. Observe that the MSEs of the MD estimators corresponding to normal and MTN remainder disturbances are approximately $50\%$ of those of the within and RE estimators. When $n$ and $T$ are increased, we obtain results similar to the case of $n=10$ and $T=5$ , and hence we do not report them here.

Bibliography

[1] Baltagi, B. H., 2001. Econometric Analysis of Panel Data, second ed. John Wiley & Sons.
[2] Dhar, S. K., 1991. Minimum distance estimation in an additive effects outliers model. Ann. Statist. 19, 205-228.
[3] Dhar, S. K., 1992. Computation of certain minimum $L_{2}$-distance type estimators under the linear model. Comm. Statist. Simulation Comput. 21, 203-220.
[4] Hsiao, C., 2003. Analysis of Panel Data, second ed. Cambridge University Press.
[5] Kim, J., 2017. Preprint arXiv:1702.02707.
[6] Koul, H. L., 1985. Minimum distance estimation in multiple linear regression with unknown error distributions. Statist. Probab. Lett. 3, 1-8.
[7] Koul, H. L., 2002. Weighted empirical process in nonlinear dynamic models. Springer, Berlin, Vol. 166.
[8] Koul, H. L. and De Wet, T., 1983. Minimum distance estimation in a linear regression model. Ann. Statist. 11, 921-932.
