Variable dispersion beta regressions with parametric link functions

Diego Ramos Canterle; Fábio Mariano Bayer

arXiv:1702.00327·stat.ME·February 2, 2017


TL;DR

This paper introduces a new class of beta regression models with parametric link functions for continuous data in (0,1), allowing flexible modeling of mean and dispersion with covariates and detailed inference procedures.

Contribution

It proposes a novel regression framework with parametric link functions for beta-distributed data, including estimation, inference, diagnostics, and practical application.

Findings

01

Link functions include symmetric and asymmetric Aranda-Ordaz types.

02

Maximum likelihood estimation jointly estimates regression and link parameters.

03

Simulation study shows good finite sample performance.

Abstract

This paper presents a new class of regression models for continuous data restricted to the interval $(0, 1)$, such as rates and proportions. The proposed class of models assumes a beta distribution for the variable of interest with regression structures for the mean and dispersion parameters. These structures consider covariates, unknown regression parameters, and parametric link functions. Link functions depend on parameters that model the relationship between the random component and the linear predictors. The symmetric and asymmetric Aranda-Ordaz link functions are considered in detail. Depending on the parameter values, these link functions refer to particular cases of fixed links such as logit and complementary log-log functions. Joint estimation of the regression and link function parameters is performed by maximum likelihood. Closed-form expressions for the score function and…

Tables

Table 1: Monte Carlo simulation results of point estimation evaluation for symmetric Aranda-Ordaz link functions.

Scenario 1
	$β_{0}$	$β_{1}$	$β_{2}$	$γ_{0}$	$γ_{1}$	$γ_{2}$	$λ_{1}$	$λ_{2}$
parameters	$1.500$	$- 1.000$	$- 1.500$	$- 1.700$	$1.000$	$- 2.000$	$0.500$	$0.500$
$n = 100$
mean	$1.502$	$- 1.001$	$- 1.504$	$- 1.786$	$0.780$	$- 1.126$	$0.301$	$0.688$
bias	$0.002$	$- 0.001$	$- 0.004$	$- 0.086$	$- 0.220$	$0.874$	$- 0.199$	$0.188$
RB	$0.151$	$0.059$	$0.251$	$5.012$	$- 22.036$	$- 43.721$	$- 39.765$	$37.686$
SD	$0.102$	$0.185$	$0.105$	$0.190$	$0.331$	$0.553$	$0.308$	$0.206$
MSE	$0.010$	$0.034$	$0.011$	$0.043$	$0.158$	$1.070$	$0.135$	$0.078$
$n = 500$
mean	$1.508$	$- 1.006$	$- 1.509$	$- 1.769$	$0.871$	$- 1.376$	$0.411$	$0.626$
bias	$0.008$	$- 0.006$	$- 0.009$	$- 0.069$	$- 0.129$	$0.626$	$- 0.089$	$0.126$
RB	$0.509$	$0.557$	$0.612$	$4.048$	$- 12.851$	$- 31.308$	$- 17.859$	$25.229$
SD	$0.018$	$0.017$	$0.022$	$0.146$	$0.291$	$0.623$	$0.161$	$0.219$
MSE	$0.000$	$0.000$	$0.001$	$0.026$	$0.101$	$0.780$	$0.034$	$0.064$
Scenario 2
	$β_{0}$	$β_{1}$	$β_{2}$	$γ_{0}$	$γ_{1}$	$γ_{2}$	$λ_{1}$	$λ_{2}$
parameters	$1.500$	$- 2.000$	$1.000$	$- 2.000$	$1.000$	$- 1.000$	$0.250$	$0.850$
$n = 100$
mean	$1.507$	$- 2.008$	$1.000$	$- 2.264$	$0.699$	$- 0.713$	$0.200$	$0.614$
bias	$0.008$	$- 0.008$	$- 0.000$	$- 0.264$	$- 0.301$	$0.287$	$- 0.050$	$- 0.237$
RB	$0.472$	$0.389$	$- 0.027$	$13.190$	$- 30.084$	$- 28.740$	$- 19.878$	$- 27.750$
SD	$0.053$	$0.058$	$0.048$	$0.325$	$0.494$	$0.552$	$0.125$	$0.235$
MSE	$0.003$	$0.003$	$0.002$	$0.175$	$0.334$	$0.387$	$0.018$	$0.111$
$n = 500$
mean	$1.503$	$- 2.004$	$1.002$	$- 2.337$	$0.880$	$- 0.876$	$0.224$	$0.612$
bias	$0.003$	$- 0.004$	$0.002$	$- 0.337$	$- 0.120$	$0.124$	$- 0.025$	$- 0.238$
RB	$0.171$	$0.219$	$0.231$	$16.859$	$- 11.964$	$- 12.367$	$- 10.191$	$- 28.007$
SD	$0.015$	$0.026$	$0.018$	$0.268$	$0.496$	$0.630$	$0.079$	$0.214$
MSE	$0.000$	$0.001$	$0.000$	$0.185$	$0.261$	$0.412$	$0.007$	$0.102$
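
The summary rows of the simulation tables can be related to one another. The sketch below assumes the usual Monte Carlo definitions, which this excerpt does not state explicitly: bias = mean estimate minus true value, RB = 100 × bias / true value (a percentage), and MSE = bias² + SD². The function name is illustrative.

```python
def monte_carlo_summary(mean_estimate, sd_estimate, true_value):
    """Bias, relative bias (%) and MSE from the Monte Carlo mean and SD."""
    bias = mean_estimate - true_value
    relative_bias = 100.0 * bias / true_value
    mse = bias**2 + sd_estimate**2
    return bias, relative_bias, mse

# lambda_1, Scenario 1, n = 100 (Table 1): mean 0.301, SD 0.308, true value 0.500.
bias, rb, mse = monte_carlo_summary(0.301, 0.308, 0.500)
print(round(bias, 3), round(rb, 1), round(mse, 3))
```

Because the inputs are the rounded table entries, the recomputed RB and MSE agree with the tabulated values only up to rounding in the last digit.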

Table 2: Monte Carlo simulation results of point estimation evaluation for asymmetric Aranda-Ordaz link functions.

Scenario 1
	$β_{0}$	$β_{1}$	$β_{2}$	$γ_{0}$	$γ_{1}$	$γ_{2}$	$λ_{1}$	$λ_{2}$
parameters	$1.000$	$6.000$	$- 4.000$	$- 1.000$	$- 5.000$	$3.000$	$5.000$	$10.000$
$n = 100$
mean	$1.019$	$6.029$	$- 4.017$	$0.415$	$- 7.122$	$4.056$	$5.025$	$20.697$
bias	$0.019$	$0.029$	$- 0.017$	$1.415$	$- 2.122$	$1.056$	$0.025$	$10.697$
RB	$1.090$	$0.478$	$0.434$	$- 141.521$	$42.444$	$35.207$	$0.509$	$106.971$
SD	$0.159$	$0.415$	$0.258$	$5.459$	$7.111$	$3.392$	$0.423$	$36.193$
MSE	$0.0256$	$0.173$	$0.067$	$31.805$	$55.068$	$12.618$	$0.180$	$1424.330$
$n = 500$
mean	$1.007$	$6.002$	$- 4.001$	$- 0.918$	$- 5.166$	$3.103$	$5.001$	$10.829$
bias	$0.001$	$0.002$	$- 0.001$	$0.082$	$- 0.166$	$0.103$	$0.001$	$0.829$
RB	$0.069$	$0.025$	$0.025$	$- 8.150$	$3.330$	$3.437$	$0.029$	$8.290$
SD	$0.046$	$0.108$	$0.072$	$0.299$	$0.442$	$0.279$	$0.112$	$2.632$
MSE	$0.002$	$0.012$	$0.005$	$0.096$	$0.223$	$0.089$	$0.012$	$7.616$
Scenario 2
	$β_{0}$	$β_{1}$	$β_{2}$	$γ_{0}$	$γ_{1}$	$γ_{2}$	$λ_{1}$	$λ_{2}$
parameters	$1.000$	$3.000$	$- 4.000$	$- 1.000$	$- 8.000$	$1.000$	$1.000$	$1.000$
$n = 100$
mean	$1.000$	$3.000$	$- 4.000$	$- 0.863$	$- 8.301$	$1.032$	$1.000$	$2.449$
bias	$- 0.000$	$- 0.000$	$0.000$	$0.137$	$- 0.301$	$0.032$	$- 0.000$	$1.449$
RB	$- 0.001$	$- 0.000$	$- 0.000$	$- 13.685$	$3.766$	$3.241$	$- 0.001$	$144.888$
SD	$0.002$	$0.003$	$0.003$	$0.336$	$0.491$	$0.295$	$0.002$	$2.931$
MSE	$0.000$	$0.000$	$0.000$	$0.136$	$0.332$	$0.088$	$0.000$	$10.692$
$n = 500$
mean	$1.000$	$3.000$	$- 4.000$	$- 0.976$	$- 8.059$	$1.010$	$1.000$	$1.270$
bias	$- 0.000$	$- 0.000$	$0.000$	$0.024$	$- 0.059$	$0.010$	$- 0.000$	$0.270$
RB	$- 0.001$	$- 0.000$	$- 0.000$	$- 2.425$	$0.736$	$0.962$	$- 0.001$	$26.986859$
SD	$0.001$	$0.001$	$0.001$	$0.127$	$0.185$	$0.120$	$0.001$	$1.016$
MSE	$0.000$	$0.000$	$0.000$	$0.017$	$0.038$	$0.015$	$0.000$	$1.105$

Table 3: Fitted model for religious belief data.

Mean submodel
	Estimate	Std. error	$z$ stat	$p$-value
Intercept	$25.183$	$7.041$	$3.576$	$0.000$
IQ	$-0.881$	$0.190$	$4.623$	$0.000$
IQ$^{2}$	$0.006$	$0.001$	$4.861$	$0.000$
INCOME	$0.029$	$0.017$	$1.690$	$0.091$
MUSL	$-0.761$	$0.142$	$5.354$	$0.000$
$\log$ OPEN	$0.481$	$0.162$	$2.967$	$0.003$
$\lambda_{1}$	$9.255$	$3.892$
Dispersion submodel
Intercept	$-8.817$	$1.354$	$6.510$	$0.000$
IQ	$0.059$	$0.011$	$5.250$	$0.000$
MUSL	$-1.608$	$0.256$	$6.281$	$0.000$
$\log$ OPEN	$0.548$	$0.213$	$2.580$	$0.010$
$M \times I$	$0.118$	$0.036$	$3.308$	$0.001$
$\lambda_{2}$	$0.853$	$1.605$
$R_{G}^{2} = 0.841$
AIC $= -560.271$

Table 4: A comparison between the proposed fitted model for religious belief data and the model in Cribari-Neto and Souza (2013).

Model	$R_{G}^{2}$	$\ell(\widehat{\bm{\theta}})$	AIC	MSE($y$, $\widehat{\mu}$)
Model with fixed links (Cribari-Neto and Souza, 2013)	$0.760$	$267.489$	$-518.979$	$0.015$
Model with parametric links (proposed)	$0.841$	$293.135$	$-560.271$	$0.013$
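
The AIC column of Table 4 can be cross-checked against the log-likelihood column. The sketch below assumes the standard AIC, that is, the paper's GAIC $= -2\ell(\widehat{\bm{\theta}}) + Pq$ with $P = 2$, and counts $q = 13$ parameters for the proposed model from Table 3 (6 mean coefficients, 5 dispersion coefficients, 2 link parameters). The helper name is illustrative.

```python
def aic(loglik, n_params):
    """Akaike information criterion: -2 * log-likelihood + 2 * number of parameters."""
    return -2.0 * loglik + 2.0 * n_params

# Proposed model: parameter count read off Table 3.
q_parametric = 6 + 5 + 2
print(aic(293.135, q_parametric))

# Parameter count implied by the fixed-link row of Table 4.
q_fixed = (-518.979 + 2.0 * 267.489) / 2.0
print(q_fixed)
```

The first value reproduces the tabulated AIC up to rounding, and the second suggests that the fixed-link model has about eight parameters.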

Equations

$$f(y;\mu,\sigma)$$

$$g_{1}(\mu_{t},\lambda_{1})=\sum_{i=1}^{r}x_{ti}\beta_{i}=\eta_{1t},$$

$$g_{2}(\sigma_{t},\lambda_{2})=\sum_{j=1}^{s}z_{tj}\gamma_{j}=\eta_{2t},$$

$$\mu_{t}=g_{1}^{-1}(\eta_{1t},\lambda_{1}),$$

$$\sigma_{t}=g_{2}^{-1}(\eta_{2t},\lambda_{2}).$$

$$G=\{g(\cdot,\lambda):\lambda\in\Lambda\}.$$

$$\ell(\bm{\theta})=\sum_{t=1}^{n}\ell_{t}(\mu_{t},\sigma_{t}),$$

$$\ell_{t}(\mu_{t},\sigma_{t})$$

$$+\left(\mu_{t}\frac{1-\sigma_{t}^{2}}{\sigma_{t}^{2}}-1\right)\log y_{t}+\left((1-\mu_{t})\frac{1-\sigma_{t}^{2}}{\sigma_{t}^{2}}-1\right)\log(1-y_{t}),$$

$$U_{\bm{\beta}}(\bm{\theta})=\bm{X}^{\top}\bm{\Sigma}\bm{T}(\bm{y}^{*}-\bm{\mu}^{*}),$$

$$U_{\bm{\gamma}}(\bm{\theta})=\bm{Z}^{\top}\bm{H}\bm{a},$$

$$a_{t}=-\frac{2}{\sigma_{t}^{3}}\left[\mu_{t}(y_{t}^{*}-\mu_{t}^{*})+\psi\left(\frac{1-\sigma_{t}^{2}}{\sigma_{t}^{2}}\right)-\psi\left((1-\mu_{t})\frac{1-\sigma_{t}^{2}}{\sigma_{t}^{2}}\right)+\log(1-y_{t})\right].$$

$$U_{\lambda_{1}}(\bm{\theta})$$

$$U_{\lambda_{2}}(\bm{\theta})$$

$$\left\{\begin{array}{l}U_{\bm{\beta}}(\bm{\theta})=0\\ U_{\bm{\gamma}}(\bm{\theta})=0\\ U_{\lambda_{1}}(\bm{\theta})=0\\ U_{\lambda_{2}}(\bm{\theta})=0\end{array}\right.$$

$$\bm{K}=\bm{K}(\bm{\theta})=\begin{pmatrix}K_{(\beta,\beta)}&K_{(\beta,\gamma)}&K_{(\beta,\lambda_{1})}&K_{(\beta,\lambda_{2})}\\ K_{(\gamma,\beta)}&K_{(\gamma,\gamma)}&K_{(\gamma,\lambda_{1})}&K_{(\gamma,\lambda_{2})}\\ K_{(\lambda_{1},\beta)}&K_{(\lambda_{1},\gamma)}&K_{(\lambda_{1},\lambda_{1})}&K_{(\lambda_{1},\lambda_{2})}\\ K_{(\lambda_{2},\beta)}&K_{(\lambda_{2},\gamma)}&K_{(\lambda_{2},\lambda_{1})}&K_{(\lambda_{2},\lambda_{2})}\end{pmatrix},$$

$$w_{t}$$

$$c_{t}$$

$$\nu_{t}$$

$$d_{t}^{*}$$

$$\begin{pmatrix}\widehat{\bm{\beta}}\\ \widehat{\bm{\gamma}}\\ \widehat{\lambda}_{1}\\ \widehat{\lambda}_{2}\end{pmatrix}\sim\mathcal{N}_{q}\left(\begin{pmatrix}\bm{\beta}\\ \bm{\gamma}\\ \lambda_{1}\\ \lambda_{2}\end{pmatrix},\bm{K}^{-1}\right),$$

$$\left[\widehat{\theta}_{m}-\Phi^{-1}(1-\alpha/2)\,\widehat{{\rm se}}(\widehat{\theta}_{m});\ \widehat{\theta}_{m}+\Phi^{-1}(1-\alpha/2)\,\widehat{{\rm se}}(\widehat{\theta}_{m})\right],$$

$$\left[g_{\delta}^{-1}\left(\widehat{\eta}_{\delta t}-\Phi^{-1}(1-\alpha/2)\,\widehat{{\rm se}}(\widehat{\eta}_{\delta t}),\widehat{\lambda}_{\delta}\right);\ g_{\delta}^{-1}\left(\widehat{\eta}_{\delta t}+\Phi^{-1}(1-\alpha/2)\,\widehat{{\rm se}}(\widehat{\eta}_{\delta t}),\widehat{\lambda}_{\delta}\right)\right],$$

$$z=\frac{\widehat{\theta}_{m}-\theta_{m}^{0}}{\widehat{{\rm se}}(\widehat{\theta}_{m})}.$$

$$r_{t}=\frac{y_{t}-\widehat{\mu}_{t}}{\sqrt{\widehat{{\rm Var}}(Y_{t})}},$$

$$r_{t}^{pp}=\frac{y_{t}^{*}-\widehat{\mu}_{t}^{*}}{\sqrt{\widehat{{\rm Var}}(y_{t}^{*})(1-h_{tt})}},$$

$$C_{t}=\frac{h_{tt}}{1-h_{tt}}(r_{t}^{pp})^{2}.$$

$${\rm GAIC}=-2\ell(\widehat{\bm{\theta}})+Pq,$$

$$R_{G}^{2}=1-\left(\frac{L_{\rm null}}{L_{\rm fit}}\right)^{2/n}=1-\exp\left(-\frac{2}{n}\left[\ell(\widehat{\bm{\theta}})-\ell(0)\right]\right),$$

$$\eta=g(\mu,\lambda)=\frac{2\left(\mu^{\lambda}-(1-\mu)^{\lambda}\right)}{\lambda\left(\mu^{\lambda}+(1-\mu)^{\lambda}\right)},$$

$$\mu=g^{-1}(\eta,\lambda)=\frac{\left(\frac{\lambda\eta}{2}+1\right)^{1/\lambda}}{\left(1-\frac{\lambda\eta}{2}\right)^{1/\lambda}+\left(\frac{\lambda\eta}{2}+1\right)^{1/\lambda}}.$$

$$\frac{\partial g_{1}(\mu_{t},\lambda_{1})}{\partial\mu_{t}}=\frac{4\left(\mu_{t}(1-\mu_{t})\right)^{\lambda_{1}-1}}{\left(\mu_{t}^{\lambda_{1}}+(1-\mu_{t})^{\lambda_{1}}\right)^{2}},$$

$$\frac{\partial g_{2}(\sigma_{t},\lambda_{2})}{\partial\sigma_{t}}=\frac{4\left(\sigma_{t}(1-\sigma_{t})\right)^{\lambda_{2}-1}}{\left(\sigma_{t}^{\lambda_{2}}+(1-\sigma_{t})^{\lambda_{2}}\right)^{2}},$$

$$\frac{\partial\mu_{t}}{\partial\eta_{1t}}=\frac{4\left(4-\lambda_{1}^{2}\eta_{1t}^{2}\right)^{\frac{1}{\lambda_{1}}-1}}{\left((2-\lambda_{1}\eta_{1t})^{1/\lambda_{1}}+(\lambda_{1}\eta_{1t}+2)^{1/\lambda_{1}}\right)^{2}},$$

$$\frac{\partial\sigma_{t}}{\partial\eta_{2t}}=\frac{4\left(4-\lambda_{2}^{2}\eta_{2t}^{2}\right)^{\frac{1}{\lambda_{2}}-1}}{\left((2-\lambda_{2}\eta_{2t})^{1/\lambda_{2}}+(\lambda_{2}\eta_{2t}+2)^{1/\lambda_{2}}\right)^{2}},$$


Full text

Variable dispersion beta regressions with parametric link functions

Diego Ramos Canterle, Bacharelado em Estatística and LACESM, Universidade Federal de Santa Maria, Santa Maria, RS, Brazil

Fábio Mariano Bayer, Departamento de Estatística and LACESM, Universidade Federal de Santa Maria, Santa Maria, RS, Brazil

Abstract

This paper presents a new class of regression models for continuous data restricted to the interval $(0,1)$, such as rates and proportions. The proposed class of models assumes a beta distribution for the variable of interest with regression structures for the mean and dispersion parameters. These structures consider covariates, unknown regression parameters, and parametric link functions. Link functions depend on parameters that model the relationship between the random component and the linear predictors. The symmetric and asymmetric Aranda-Ordaz link functions are considered in detail. Depending on the parameter values, these link functions refer to particular cases of fixed links such as logit and complementary log-log functions. Joint estimation of the regression and link function parameters is performed by maximum likelihood. Closed-form expressions for the score function and Fisher’s information matrix are presented. Aspects of large sample inferences are discussed, and some diagnostic measures are proposed. A Monte Carlo simulation study is used to evaluate the finite sample performance of point estimators. Finally, a practical application that employs real data is presented and discussed.

Keywords: Aranda-Ordaz link function, maximum likelihood estimator, parametric link functions, variable dispersion beta regression.

Mathematics Subject Classification (2000): MSC 62J99, MSC 62-07.

1 Introduction

The beta regression model introduced by Ferrari and Cribari-Neto (2004) has broad practicality for modeling variables belonging to the continuous interval $(0,1)$ . In this model, it is assumed that the dependent variable $Y$ has a beta distribution, where the mean of $Y$ is modeled by a regression structure involving unknown parameters, covariates, and a link function. An extension of this model is the beta regression with varying dispersion, which has been discussed by Paolino (2001), Smithson and Verkuilen (2006), Simas et al (2010), Ferrari and Pinheiro (2011) and Bayer and Cribari-Neto (2017). In this broader model, the dispersion parameter of $Y$ is modeled by a regression structure in the same way as the conditional mean. The manner in which the dispersion parameter is modeled has direct implications on the efficiency of the estimators of the mean regression structure parameters (Smyth and Verbyla, 1999; Bayer and Cribari-Neto, 2017). In addition to improving the inferences about the mean structure parameters, many applications are directly interested in modeling the dispersion to identify the sources of data variability (Smyth and Verbyla, 1999).

In the variable dispersion beta regression model, the relationship between the mean and dispersion parameters of the random component $Y$ and their linear predictors is established through link functions. In this model, considering the beta density parameterization with mean $\mu\in(0,1)$ and dispersion $\sigma\in(0,1)$, as in Cribari-Neto and Souza (2012) and Bayer and Cribari-Neto (2017), it is possible to use link functions $g(\cdot)$ such that $g:(0,1)\rightarrow\mathbb{R}$. Typical fixed link functions in these cases include the logit, probit, log-log (loglog), complementary log-log (cloglog), and Cauchy functions (Koenker and Yoon, 2009). The fact that the possible values of $\mu$ and $\sigma$ belong to the same standard unit interval $(0,1)$ means that these link functions can be considered for both the mean and the dispersion structure.

In practice, in addition to the selection of important covariates in the mean and dispersion regression structures, as broadly discussed by Zhao et al (2014) and Bayer and Cribari-Neto (2017), the correct specification of the link functions deserves special attention. An incorrect specification of these functions may distort the inferences of the model parameters (McCullagh and Nelder, 1989, p. 401), leading to misinterpretations and errors in the model predictions. To circumvent the problem of selecting an appropriate link function, a parametric link function can be considered (Guerrero and Johnson, 1982; Scallan et al, 1984; Stukel, 1988; Czado, 1994; Kaiser, 1997; Smith, 2003; Czado and Raftery, 2006; Koenker and Yoon, 2009; Adewale and Xu, 2010; Ramalho et al, 2011; Gomes and Ludermir, 2013; Dehbi et al, 2014; Taneichi et al, 2014; Geraci and Jones, 2015; Dehbi et al, 2016). Such functions involve an unknown parameter that must be estimated. In general, depending on the value of this parameter, some known link functions arise as special cases. The link functions proposed by Aranda-Ordaz (1981) are the parametric type most widely used in cases where the parameters of interest lie in the interval $(0,1)$. Special cases of the Aranda-Ordaz link functions include the logit and cloglog functions.

Some regression models with parametric link functions have been described in the literature. Guerrero and Johnson (1982) used a transformation of the Box-Cox link function in binary response models. Scallan et al (1984) proposed generalized linear models (GLM) (McCullagh and Nelder, 1989) with general parametric link functions by presenting certain estimation aspects and identifying some special cases. Stukel (1988) adjusted the binary response models to consider a two-parameter link function. Czado (1994) developed a two-parameter link function that modifies the two tails of the function. Kaiser (1997) considered the likelihood inferences of link function parameters in GLM. Czado and Raftery (2006) chose the link function in GLM using Bayes factors. Koenker and Yoon (2009) studied the selection of the link function in binary data using the parametric link functions of Gosset and Pregibon. Quantile regression with Aranda-Ordaz link function is considered by Dehbi et al (2016). According to Czado (1997), the maximum likelihood fit in GLM is improved by using parametric link functions in place of canonical link functions.

Regarding the beta regression model, some problems associated with the correct specification of the link function have been investigated. Oliveira (2013) evaluated the performance of the RESET test by checking the misspecification of the link function in the beta regression model, and Pereira and Cribari-Neto (2013) evaluated the RESET test in the inflated beta regression model. Andrade (2007) generalized the seminal model proposed by Ferrari and Cribari-Neto (2004) by considering the Aranda-Ordaz link function for the regression structure of the mean; however, this approach still considered constant dispersion. Nevertheless, there is a lack of studies focusing on the specification of the link function in the dispersion submodel.

Based on the above discussion, we propose a generalization of the variable dispersion beta regression model that considers parametric link functions for the structures of both $\mu$ and $\sigma$. The link function parameters of the mean and dispersion submodels are estimated jointly with the regression parameters by maximum likelihood. Diagnostic measures and tools for model selection are also proposed.

This paper unfolds as follows. Section 2 presents the beta regression model with parametric link functions. In Section 3, we discuss all aspects of maximum likelihood estimation. Section 4 introduces some diagnostic measures to check the goodness-of-fit in the resulting model. Section 5 presents two special cases of the parametric link functions based on the symmetric and asymmetric Aranda-Ordaz (Aranda-Ordaz, 1981) families of link functions. The finite sample performance of the estimators is assessed in Section 6. Section 7 presents and discusses an application to real data on religious disbelief. Our concluding remarks are given in Section 8.

2 The model

The beta regression model proposed by Ferrari and Cribari-Neto (2004) considers a constant precision parameter $\phi$ throughout the observations. Nevertheless, by erroneously assuming a constant $\phi$ , the losses in efficiency for the estimators can be substantial, as discussed by Bayer and Cribari-Neto (2017). In beta regression with varying dispersion, the precision parameter is assumed to be variable throughout the observations and modeled by covariates, unknown parameters, and one link function, in the same way as the mean.

In this work, as in that reported by Cribari-Neto and Souza (2012) and Bayer and Cribari-Neto (2017), a beta density reparameterization is considered. Rather than focusing on the precision parameter $\phi$ , a dispersion parameter $\sigma$ is considered. With such parameterization, the beta density is written as follows:

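$$f(y;\mu,\sigma)=\frac{\Gamma\left(\frac{1-\sigma^{2}}{\sigma^{2}}\right)}{\Gamma\left(\mu\frac{1-\sigma^{2}}{\sigma^{2}}\right)\Gamma\left((1-\mu)\frac{1-\sigma^{2}}{\sigma^{2}}\right)}\,y^{\mu\frac{1-\sigma^{2}}{\sigma^{2}}-1}(1-y)^{(1-\mu)\frac{1-\sigma^{2}}{\sigma^{2}}-1},\quad 0<y<1,\qquad(1)$$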

where $0<\mu<1$, $0<\sigma<1$, and $\Gamma(u)=\int_{0}^{\infty}t^{u-1}e^{-t}\,\mathrm{d}t$ is the gamma function, for $u>0$. The two parameters indexing the density assume values in the standard unit interval $(0,1)$, which enables the same link function to be used in the two regression structures. The expectation and variance of $Y$ are given by $\mathbb{E}(Y)=\mu$ and ${\rm Var}(Y)=V(\mu)\sigma^{2}$, respectively, where $V(\mu)=\mu(1-\mu)$ is the variance function. The proposed model is also useful for response variables restricted to a doubly bounded interval $(a,b)$, where $a$ and $b$ are known scalars with $a<b$. In this case, we would model $(Y-a)/(b-a)$ instead of modeling $Y$ directly (Ferrari and Cribari-Neto, 2004; Smithson and Verkuilen, 2006; Zimprich, 2010).
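
The moment expressions above can be checked numerically. The sketch below assumes that the dispersion corresponds to a precision $\phi=(1-\sigma^{2})/\sigma^{2}$, so that the usual beta shape parameters are $a=\mu\phi$ and $b=(1-\mu)\phi$; this is the mapping implied by ${\rm Var}(Y)=\mu(1-\mu)\sigma^{2}$. The function names are illustrative and not taken from the paper.

```python
import math

def beta_shapes(mu, sigma):
    """Map (mean, dispersion) to the usual beta shape parameters (a, b)."""
    phi = (1.0 - sigma**2) / sigma**2   # precision implied by the dispersion
    return mu * phi, (1.0 - mu) * phi

def beta_logpdf(y, mu, sigma):
    """Log-density of the beta law in the (mu, sigma) parameterization."""
    a, b = beta_shapes(mu, sigma)
    return (math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
            + (a - 1.0) * math.log(y) + (b - 1.0) * math.log(1.0 - y))

mu, sigma = 0.3, 0.2
a, b = beta_shapes(mu, sigma)
mean = a / (a + b)                                  # standard beta mean
var = a * b / ((a + b) ** 2 * (a + b + 1.0))        # standard beta variance
print(mean, var, mu * (1.0 - mu) * sigma**2)        # var should match the last value
```

The standard beta mean and variance formulas recover $\mu$ and $\mu(1-\mu)\sigma^{2}$, which is a quick way to confirm the reparameterization before coding the likelihood.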

Let $Y_{1},\ldots,Y_{n}$ be independent random variables, where each $Y_{t}$ , $t=1,\ldots,n$ , has a density given by (1) with mean $\mu_{t}$ and dispersion $\sigma_{t}$ . The variable dispersion beta regression model with parametric link functions is defined by

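$$g_{1}(\mu_{t},\lambda_{1})=\sum_{i=1}^{r}x_{ti}\beta_{i}=\eta_{1t},\qquad(2)$$

$$g_{2}(\sigma_{t},\lambda_{2})=\sum_{j=1}^{s}z_{tj}\gamma_{j}=\eta_{2t},\qquad(3)$$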

where $\bm{\beta}=(\beta_{1},\ldots,\beta_{r})^{\top}\in\mathbb{R}^{r}$ and $\bm{\gamma}=(\gamma_{1},\ldots,\gamma_{s})^{\top}\in\mathbb{R}^{s}$ are the vectors of unknown regression parameters ( $r+s+2=q<n$ ), $\bm{x}^{\top}_{t}=(x_{t1},\ldots,x_{tr})$ and $\bm{z}^{\top}_{t}=(z_{t1},\ldots,z_{ts})$ represent the $t$ th observations of the explanatory variables, which are assumed to be fixed and known, and $\eta_{1t}=\bm{x}^{\top}_{t}\bm{\beta}$ and $\eta_{2t}=\bm{z}^{\top}_{t}\bm{\gamma}$ are the linear predictors for the mean and dispersion, respectively. Finally, $g_{1}(\cdot,\cdot)$ and $g_{2}(\cdot,\cdot)$ are strictly monotonic in the first argument and twice differentiable in both arguments, such that $g_{\delta}:(0,1)\rightarrow\mathbb{R}$ , for $\delta=1,2$ . The second arguments of $g_{\delta}(\cdot,\cdot)$ , $\lambda_{1}\in\Lambda_{1}$ and $\lambda_{2}\in\Lambda_{2}$ , are the link function parameters. Further, note that

$$\mu_{t}=g_{1}^{-1}(\eta_{1t},\lambda_{1}),\qquad\sigma_{t}=g_{2}^{-1}(\eta_{2t},\lambda_{2}).$$

The parameters $\lambda_{1}$ and $\lambda_{2}$ are shape parameters that generally influence the symmetry and heaviness of tails of the fitted curves for $\mu$ and $\sigma$ (Stukel, 1988).

Unlike models that consider fixed link functions, the proposed model captures different relationships between the linear predictors $\eta_{\delta t}$ , $\delta=1,2$ , and their respective parameters $\mu_{t}$ and $\sigma_{t}$ . Depending on the parametric value $\lambda$ for a given function $g(\cdot,\lambda)$ , there is a particular family of link functions given by

$$G=\{g(\cdot,\lambda):\lambda\in\Lambda\}.$$

Different link function families can be considered. When the parameters of interest are in the continuous interval $(0,1)$, such as $\mu_{t}$ and $\sigma_{t}$ in the proposed model, possibilities include the symmetric and asymmetric link functions proposed by Aranda-Ordaz (1981), the Box-Cox transformation link function (Guerrero and Johnson, 1982), the Gosset link function (Koenker and Yoon, 2009), the Pregibon link function (Pregibon, 1980), and the generalized logit function considered by Ramalho et al (2011). In particular, the Pregibon link function has two parameters and is not considered in this work. The Gosset link function does not accommodate a possible asymmetric relationship between the random component and the linear predictors. For these reasons, in addition to general results that hold for any one-parameter link, this work presents results for the symmetric and asymmetric Aranda-Ordaz link functions.
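
As an illustration of a one-parameter family, the symmetric Aranda-Ordaz link listed among the paper's equations is $\eta=g(\mu,\lambda)=2(\mu^{\lambda}-(1-\mu)^{\lambda})/[\lambda(\mu^{\lambda}+(1-\mu)^{\lambda})]$, with inverse $\mu=(1+\lambda\eta/2)^{1/\lambda}/[(1-\lambda\eta/2)^{1/\lambda}+(1+\lambda\eta/2)^{1/\lambda}]$. The sketch below codes both and checks the round trip; the function names are illustrative, and the restriction $|\lambda\eta|<2$ is an assumption needed for the inverse to be well defined.

```python
import math

def ao_sym_link(mu, lam):
    """Symmetric Aranda-Ordaz link: maps mu in (0, 1) to eta."""
    a, b = mu**lam, (1.0 - mu)**lam
    return 2.0 * (a - b) / (lam * (a + b))

def ao_sym_inverse(eta, lam):
    """Inverse link; requires |lam * eta| < 2 so both bases stay positive."""
    up = (1.0 + lam * eta / 2.0) ** (1.0 / lam)
    down = (1.0 - lam * eta / 2.0) ** (1.0 / lam)
    return up / (up + down)

def logit(mu):
    return math.log(mu / (1.0 - mu))

mu = 0.7
for lam in (0.5, 1.0, 2.0):
    eta = ao_sym_link(mu, lam)
    print(lam, eta, ao_sym_inverse(eta, lam))   # the round trip recovers mu

# For lam close to zero the link is numerically close to the logit.
print(ao_sym_link(mu, 1e-6), logit(mu))
```

The last line illustrates how a fixed link arises as a limiting case of the parametric family.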

3 Likelihood inference

The maximum likelihood estimation of the parametric vector $\bm{\theta}=(\bm{\beta}^{\top}\!,\bm{\gamma}^{\top}\!,\lambda_{1},\lambda_{2})^{\top}$ is given by maximizing the logarithm of the likelihood function. Given a sample size $n$ and considering the form of the density in (1), the log-likelihood is given by

$$\ell(\bm{\theta})=\sum_{t=1}^{n}\ell_{t}(\mu_{t},\sigma_{t}),\qquad(4)$$

where

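$$\ell_{t}(\mu_{t},\sigma_{t})=\log\Gamma\left(\frac{1-\sigma_{t}^{2}}{\sigma_{t}^{2}}\right)-\log\Gamma\left(\mu_{t}\frac{1-\sigma_{t}^{2}}{\sigma_{t}^{2}}\right)-\log\Gamma\left((1-\mu_{t})\frac{1-\sigma_{t}^{2}}{\sigma_{t}^{2}}\right)+\left(\mu_{t}\frac{1-\sigma_{t}^{2}}{\sigma_{t}^{2}}-1\right)\log y_{t}+\left((1-\mu_{t})\frac{1-\sigma_{t}^{2}}{\sigma_{t}^{2}}-1\right)\log(1-y_{t}),$$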

in which $\mu_{t}$ and $\sigma_{t}$ are given by the regression structures in (2) and (3), respectively.
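
A minimal sketch of how the log-likelihood could be evaluated, assuming that $\ell_{t}$ is the logarithm of the beta density in the $(\mu,\sigma)$ parameterization and using the symmetric Aranda-Ordaz inverse link for both submodels. The data, coefficient values, and function names are hypothetical and serve only to exercise the code.

```python
import math

def inv_link(eta, lam):
    """Inverse symmetric Aranda-Ordaz link (valid for |lam * eta| < 2)."""
    up = (1.0 + lam * eta / 2.0) ** (1.0 / lam)
    down = (1.0 - lam * eta / 2.0) ** (1.0 / lam)
    return up / (up + down)

def loglik_t(y, mu, sigma):
    """Contribution of one observation to the log-likelihood."""
    phi = (1.0 - sigma**2) / sigma**2
    return (math.lgamma(phi) - math.lgamma(mu * phi)
            - math.lgamma((1.0 - mu) * phi)
            + (mu * phi - 1.0) * math.log(y)
            + ((1.0 - mu) * phi - 1.0) * math.log(1.0 - y))

def loglik(theta, y, X, Z):
    """theta = (beta, gamma, lam1, lam2); X and Z are lists of covariate rows."""
    beta, gamma, lam1, lam2 = theta
    total = 0.0
    for yt, xt, zt in zip(y, X, Z):
        eta1 = sum(b * x for b, x in zip(beta, xt))    # mean linear predictor
        eta2 = sum(g * z for g, z in zip(gamma, zt))   # dispersion linear predictor
        total += loglik_t(yt, inv_link(eta1, lam1), inv_link(eta2, lam2))
    return total

# Toy data (hypothetical values).
y = [0.21, 0.48, 0.67, 0.80]
X = [[1.0, -0.8], [1.0, -0.1], [1.0, 0.4], [1.0, 0.9]]
Z = [[1.0, 0.2], [1.0, 0.5], [1.0, 0.1], [1.0, 0.7]]
theta = ([0.1, 1.2], [-1.0, 0.5], 0.5, 0.5)
print(loglik(theta, y, X, Z))
```

A simple sanity check is that $\mu=1/2$ and $\sigma^{2}=1/3$ give the uniform density on $(0,1)$, for which every contribution $\ell_{t}$ is zero.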

By differentiating the log-likelihood function in (4) with respect to the parametric vector $\bm{\theta}$, we obtain the score vector $U(\bm{\theta})=\left(U_{\bm{\beta}}(\bm{\theta})^{\top},U_{\bm{\gamma}}(\bm{\theta})^{\top},U_{\lambda_{1}}(\bm{\theta}),U_{\lambda_{2}}(\bm{\theta})\right)^{\top}$. The analytical derivations are given in detail in the Appendix. The score function with respect to $\bm{\beta}$ is given by

$$U_{\bm{\beta}}(\bm{\theta})=\bm{X}^{\top}\bm{\Sigma}\bm{T}(\bm{y}^{*}-\bm{\mu}^{*}),$$

where $\bm{X}$ is the $n\times r$ matrix in which the $t$ th row is $\bm{x}_{t}$ , $\bm{\Sigma}\!=\!{\rm diag}\!\left(\!\frac{1-\sigma_{1}^{2}}{\sigma_{1}^{2}},\ldots,\!\frac{1-\sigma_{n}^{2}}{\sigma_{n}^{2}}\!\right)$ , $\bm{T}={\rm diag}\bigg{(}\left[\frac{\partial g_{1}(\mu_{1},\lambda_{1})}{\partial\mu_{1}}\right]^{-1},$ $\ldots,\left[\frac{\partial g_{1}(\mu_{n},\lambda_{1})}{\partial\mu_{n}}\right]^{-1}\bigg{)}$ , $\bm{y}^{*}=(y^{*}_{1},\ldots,y^{*}_{n})^{\top}$ , $\bm{\mu}^{*}=(\mu^{*}_{1},\ldots,\mu^{*}_{n})^{\top}$ , with $y_{t}^{*}=\log(y_{t}/(1-y_{t}))$ , $\mu_{t}^{*}=\psi\left(\mu_{t}\frac{1-\sigma_{t}^{2}}{\sigma_{t}^{2}}\right)-\psi\left((1-\mu_{t})\frac{1-\sigma_{t}^{2}}{\sigma_{t}^{2}}\right)$ , and $\psi(\cdot)$ is the digamma function, i.e., $\psi(u)=\frac{d\log\Gamma(u)}{du}$ .

The score function with respect to $\bm{\gamma}$ is given by

$$U_{\bm{\gamma}}(\bm{\theta})=\bm{Z}^{\top}\bm{H}\bm{a},$$

where $\bm{Z}$ is the $n\times s$ matrix whose $t$ th row is $\bm{z}_{t}$ , $\bm{H}={\rm diag}\left(\left[\frac{\partial g_{2}(\sigma_{1},\lambda_{2})}{\partial\sigma_{1}}\right]^{-1},\ldots,\left[\frac{\partial g_{2}(\sigma_{n},\lambda_{2})}{\partial\sigma_{n}}\right]^{-1}\right)$ , $\bm{a}=(a_{1},\ldots,a_{n})^{\top}$ , with

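$$a_{t}=-\frac{2}{\sigma_{t}^{3}}\left[\mu_{t}(y_{t}^{*}-\mu_{t}^{*})+\psi\left(\frac{1-\sigma_{t}^{2}}{\sigma_{t}^{2}}\right)-\psi\left((1-\mu_{t})\frac{1-\sigma_{t}^{2}}{\sigma_{t}^{2}}\right)+\log(1-y_{t})\right].$$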

The score functions with respect to $\lambda_{1}$ and $\lambda_{2}$ are given by

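$$U_{\lambda_{1}}(\bm{\theta})=\bm{\rho}^{\top}\bm{\Sigma}(\bm{y}^{*}-\bm{\mu}^{*})\quad\text{and}\quad U_{\lambda_{2}}(\bm{\theta})=\bm{\varrho}^{\top}\bm{a},$$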

respectively, where $\rho_{t}=\dfrac{\partial\mu_{t}}{\partial\lambda_{1}}$ depends on the parametric link function to be used in the mean submodel and $\varrho_{t}=\dfrac{\partial\sigma_{t}}{\partial\lambda_{2}}$ depends on the link function considered in the dispersion submodel. In Section 5, the quantities $\rho_{t}=\dfrac{\partial\mu_{t}}{\partial\lambda_{1}}$ and $\varrho_{t}=\dfrac{\partial\sigma_{t}}{\partial\lambda_{2}}$ are presented for the symmetric and asymmetric Aranda-Ordaz link functions.
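
The building blocks of the score vector can be verified numerically. The sketch below compares $\partial\ell_{t}/\partial\mu_{t}=\frac{1-\sigma_{t}^{2}}{\sigma_{t}^{2}}(y_{t}^{*}-\mu_{t}^{*})$, the term implied by $U_{\bm{\beta}}$, and $a_{t}=\partial\ell_{t}/\partial\sigma_{t}$ as printed in the paper's equation list with central finite differences of $\ell_{t}$. It assumes that $\ell_{t}$ is the log of the beta density in the $(\mu,\sigma)$ parameterization; the digamma routine is a simple series approximation written for this example, not a library function.

```python
import math

def digamma(x):
    """Digamma via recurrence plus an asymptotic series (adequate for a sketch)."""
    acc = 0.0
    while x < 6.0:
        acc -= 1.0 / x
        x += 1.0
    f = 1.0 / (x * x)
    return acc + math.log(x) - 0.5 / x - f * (1.0 / 12 - f * (1.0 / 120 - f / 252))

def loglik_t(y, mu, sigma):
    phi = (1.0 - sigma**2) / sigma**2
    return (math.lgamma(phi) - math.lgamma(mu * phi) - math.lgamma((1.0 - mu) * phi)
            + (mu * phi - 1.0) * math.log(y)
            + ((1.0 - mu) * phi - 1.0) * math.log(1.0 - y))

def score_terms(y, mu, sigma):
    """Return (d l_t / d mu_t, a_t), where a_t = d l_t / d sigma_t."""
    phi = (1.0 - sigma**2) / sigma**2
    y_star = math.log(y / (1.0 - y))
    mu_star = digamma(mu * phi) - digamma((1.0 - mu) * phi)
    d_mu = phi * (y_star - mu_star)
    a = -2.0 / sigma**3 * (mu * (y_star - mu_star) + digamma(phi)
                           - digamma((1.0 - mu) * phi) + math.log(1.0 - y))
    return d_mu, a

y, mu, sigma, h = 0.35, 0.4, 0.25, 1e-6
d_mu, a = score_terms(y, mu, sigma)
num_mu = (loglik_t(y, mu + h, sigma) - loglik_t(y, mu - h, sigma)) / (2 * h)
num_sigma = (loglik_t(y, mu, sigma + h) - loglik_t(y, mu, sigma - h)) / (2 * h)
print(d_mu, num_mu)      # analytic vs numerical derivative in mu
print(a, num_sigma)      # analytic vs numerical derivative in sigma
```

Multiplying these per-observation terms by the link derivatives ($\bm{T}$, $\bm{H}$, $\rho_{t}$, $\varrho_{t}$) gives the four score components through the chain rule.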

The maximum likelihood estimators (MLEs) for the beta regression model with parametric link functions are obtained by solving the following nonlinear system:

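$$\left\{\begin{array}{l}U_{\bm{\beta}}(\bm{\theta})=0\\ U_{\bm{\gamma}}(\bm{\theta})=0\\ U_{\lambda_{1}}(\bm{\theta})=0\\ U_{\lambda_{2}}(\bm{\theta})=0\end{array}\right..\qquad(9)$$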

Solving Equation (9) requires the use of nonlinear optimization algorithms. In this work, the quasi-Newton BFGS method (Press et al, 1992) was used for the computational implementations.
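
A minimal sketch of joint estimation by BFGS on simulated data follows. It is not the authors' implementation: it assumes NumPy and SciPy are available, uses numerical gradients, and adopts the asymmetric Aranda-Ordaz inverse link in its standard form $\mu=1-(1+\lambda e^{\eta})^{-1/\lambda}$, chosen here because it is defined for every real $\eta$ when $\lambda>0$. Optimizing over $\log\lambda$ and clipping $\mu$ and $\sigma$ away from 0 and 1 are implementation choices made for numerical stability, and all data are hypothetical.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.special import gammaln

def inv_link(eta, lam):
    """Inverse asymmetric Aranda-Ordaz link (standard form, lam > 0)."""
    return 1.0 - (1.0 + lam * np.exp(eta)) ** (-1.0 / lam)

def neg_loglik(theta, y, X, Z):
    """Negative log-likelihood; theta = (beta, gamma, log lam1, log lam2)."""
    r, s = X.shape[1], Z.shape[1]
    beta, gamma = theta[:r], theta[r:r + s]
    lam1, lam2 = np.exp(theta[r + s]), np.exp(theta[r + s + 1])  # keeps lam > 0
    eps = 1e-8
    mu = np.clip(inv_link(X @ beta, lam1), eps, 1 - eps)
    sigma = np.clip(inv_link(Z @ gamma, lam2), eps, 1 - eps)
    phi = (1.0 - sigma**2) / sigma**2
    ll = (gammaln(phi) - gammaln(mu * phi) - gammaln((1 - mu) * phi)
          + (mu * phi - 1) * np.log(y) + ((1 - mu) * phi - 1) * np.log1p(-y))
    val = -np.sum(ll)
    return val if np.isfinite(val) else 1e10

# Simulated data (hypothetical design and coefficients).
rng = np.random.default_rng(0)
n = 200
X = np.column_stack([np.ones(n), rng.uniform(size=n)])
Z = np.column_stack([np.ones(n), rng.uniform(size=n)])
true = np.array([0.5, -1.0, -1.5, 0.5, 0.0, 0.0])   # log(lam) = 0, i.e. logit links
mu = inv_link(X @ true[:2], 1.0)
sigma = inv_link(Z @ true[2:4], 1.0)
phi = (1 - sigma**2) / sigma**2
y = rng.beta(mu * phi, (1 - mu) * phi)

start = np.zeros(6)
fit = minimize(neg_loglik, start, args=(y, X, Z), method="BFGS")
print(fit.x[:4], np.exp(fit.x[4:]), -fit.fun)
```

In practice the closed-form score vector could be passed through the `jac` argument of `minimize` instead of relying on numerical differentiation, and several starting values are advisable because the link parameters can be weakly identified in small samples.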

Fisher’s information matrix, which is useful for large sample inferences, requires the expectations of the second derivatives of the log-likelihood function. Details of the analytical derivation of these quantities are given in the Appendix. The joint information matrix for the parametric vector $\bm{\theta}$ is given by

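$$\bm{K}=\bm{K}(\bm{\theta})=\begin{pmatrix}K_{(\beta,\beta)}&K_{(\beta,\gamma)}&K_{(\beta,\lambda_{1})}&K_{(\beta,\lambda_{2})}\\ K_{(\gamma,\beta)}&K_{(\gamma,\gamma)}&K_{(\gamma,\lambda_{1})}&K_{(\gamma,\lambda_{2})}\\ K_{(\lambda_{1},\beta)}&K_{(\lambda_{1},\gamma)}&K_{(\lambda_{1},\lambda_{1})}&K_{(\lambda_{1},\lambda_{2})}\\ K_{(\lambda_{2},\beta)}&K_{(\lambda_{2},\gamma)}&K_{(\lambda_{2},\lambda_{1})}&K_{(\lambda_{2},\lambda_{2})}\end{pmatrix},\qquad(10)$$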

where $K_{(\beta,\beta)}=\bm{X}^{\top}\bm{\Sigma}\bm{W}\bm{X}$, $K_{(\beta,\gamma)}=K_{(\gamma,\beta)}^{\top}=\bm{X}^{\top}\bm{C}\bm{T}\bm{H}\bm{Z}$, $K_{(\beta,\lambda_{1})}=K_{(\lambda_{1},\beta)}^{\top}=\bm{X}^{\top}\bm{V}\bm{T}\bm{\rho}$, $K_{(\beta,\lambda_{2})}=K_{(\lambda_{2},\beta)}^{\top}=\bm{X}^{\top}\bm{C}\bm{T}\bm{\varrho}$, $K_{(\gamma,\gamma)}=\bm{Z}^{\top}\bm{D}^{*}\bm{H}\bm{H}^{\top}\bm{Z}$, $K_{(\gamma,\lambda_{1})}=K_{(\lambda_{1},\gamma)}^{\top}=\bm{Z}^{\top}\bm{C}\bm{H}\bm{\rho}$, $K_{(\gamma,\lambda_{2})}=K_{(\lambda_{2},\gamma)}^{\top}=\bm{Z}^{\top}\bm{D}^{*}\bm{H}\bm{\varrho}$, $K_{(\lambda_{1},\lambda_{1})}=\bm{\rho}^{\top}\bm{V}\bm{\rho}$, $K_{(\lambda_{1},\lambda_{2})}=K_{(\lambda_{2},\lambda_{1})}^{\top}=\bm{\rho}^{\top}\bm{C}\bm{\varrho}$, and $K_{(\lambda_{2},\lambda_{2})}=\bm{\varrho}^{\top}\bm{D}^{*}\bm{\varrho}$, with $\bm{\rho}=(\rho_{1},\ldots,\rho_{n})^{\top}$, $\bm{\varrho}=(\varrho_{1},\ldots,\varrho_{n})^{\top}$, $\bm{W}={\rm diag}(w_{1},\ldots,w_{n})$, $\bm{C}={\rm diag}(c_{1},\ldots,c_{n})$, $\bm{V}={\rm diag}(\nu_{1},\ldots,\nu_{n})$, and $\bm{D}^{*}={\rm diag}(d^{*}_{1},\ldots,d^{*}_{n})$. Finally,

[TABLE]

where $\psi^{\prime}(\cdot)$ is the trigamma function, i.e., $\psi^{\prime}(u)=\frac{d\psi(u)}{du}$, for $u>0$. According to the concept of orthogonality by Cox and Reid (1987), (10) shows that the model parameters are not orthogonal, because the information matrix is not block diagonal.

3.1 Large sample inference

Under the usual regularity conditions for MLE (Pawitan, 2001), the joint distribution of the MLEs is approximately $q$ -multivariate normal when the sample size is large, i.e.,

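$$\begin{pmatrix}\widehat{\bm{\beta}}\\ \widehat{\bm{\gamma}}\\ \widehat{\lambda}_{1}\\ \widehat{\lambda}_{2}\end{pmatrix}\sim\mathcal{N}_{q}\left(\begin{pmatrix}\bm{\beta}\\ \bm{\gamma}\\ \lambda_{1}\\ \lambda_{2}\end{pmatrix},\bm{K}^{-1}\right),$$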

where $\widehat{\bm{\beta}}$ , $\widehat{\bm{\gamma}}$ , $\widehat{\lambda}_{1}$ , and $\widehat{\lambda}_{2}$ are the MLEs of $\bm{\beta}$ , $\bm{\gamma}$ , $\lambda_{1}$ , and $\lambda_{2}$ , respectively, and $\bm{K}^{-1}$ is the inverse Fisher’s information matrix.

The Wald confidence intervals for the model parameters $\theta_{m}$, $m=1,\ldots,q$, are defined by (Pawitan, 2001; Ferrari and Cribari-Neto, 2004):

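$$\left[\widehat{\theta}_{m}-\Phi^{-1}(1-\alpha/2)\,\widehat{{\rm se}}(\widehat{\theta}_{m});\ \widehat{\theta}_{m}+\Phi^{-1}(1-\alpha/2)\,\widehat{{\rm se}}(\widehat{\theta}_{m})\right],$$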

where $\widehat{\theta}_{m}$ represents the MLE of $\theta_{m}$, the standard error of $\widehat{\theta}_{m}$ is given by $\widehat{{\rm se}}(\widehat{\theta}_{m})=[{\rm diag}({\rm\widehat{cov}}(\widehat{\bm{\theta}}))]_{m}^{1/2}$, in which ${\rm\widehat{cov}}(\widehat{\bm{\theta}})=\bm{K}^{-1}(\widehat{\bm{\theta}})$ is the asymptotic variance and covariance matrix of $\widehat{\bm{\theta}}$, $\Phi^{-1}$ is the quantile function of the standard normal distribution, and $\alpha$ is the significance level, so that the interval has nominal coverage $1-\alpha$. Similar to Ferrari and Cribari-Neto (2004), for $\mu_{t}$ ($\delta=1$) and $\sigma_{t}$ ($\delta=2$), we have the following confidence intervals:

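$$\left[g_{\delta}^{-1}\left(\widehat{\eta}_{\delta t}-\Phi^{-1}(1-\alpha/2)\,\widehat{{\rm se}}(\widehat{\eta}_{\delta t}),\widehat{\lambda}_{\delta}\right);\ g_{\delta}^{-1}\left(\widehat{\eta}_{\delta t}+\Phi^{-1}(1-\alpha/2)\,\widehat{{\rm se}}(\widehat{\eta}_{\delta t}),\widehat{\lambda}_{\delta}\right)\right],$$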

where the standard errors of $\widehat{\eta}_{\delta t}$ , for $\delta=1,2$ , are estimated by $\widehat{{\rm se}}(\widehat{\eta}_{1t})=(x_{t}\widehat{{\rm cov}}(\widehat{\beta})x_{t}^{\top})^{1/2}$ and $\widehat{{\rm se}}(\widehat{\eta}_{2t})=(z_{t}\widehat{{\rm cov}}(\widehat{\gamma})z_{t}^{\top})^{1/2}$ .
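
The two kinds of interval can be sketched as follows, using only the Python standard library. The first call uses the estimate and standard error of $\lambda_{1}$ reported in Table 3; the second uses a hypothetical linear predictor and, as a simplifying assumption, a logit inverse link in place of the fitted parametric link. Function names are illustrative.

```python
from statistics import NormalDist
import math

def wald_ci(estimate, se, alpha=0.05):
    """Two-sided Wald interval with asymptotic coverage 1 - alpha."""
    q = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    return estimate - q * se, estimate + q * se

def fitted_value_ci(eta_hat, se_eta, inv_link, alpha=0.05):
    """Interval for mu_t or sigma_t: transform the endpoints of the eta interval."""
    lo, hi = wald_ci(eta_hat, se_eta, alpha)
    a, b = inv_link(lo), inv_link(hi)
    return min(a, b), max(a, b)   # ordering is safe for decreasing links too

# lambda_1 from Table 3: estimate 9.255, standard error 3.892.
print(wald_ci(9.255, 3.892))

# Hypothetical linear predictor with a logit inverse link.
expit = lambda eta: 1.0 / (1.0 + math.exp(-eta))
print(fitted_value_ci(0.4, 0.15, expit))
```

Transforming the endpoints keeps the interval for $\mu_{t}$ or $\sigma_{t}$ inside $(0,1)$, which a symmetric interval built directly on the fitted value would not guarantee.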

To test the hypotheses on the parameters, we consider the null hypothesis $\mathcal{H}_{0}:\theta_{m}=\theta_{m}^{0}$ versus $\mathcal{H}_{1}:\theta_{m}\neq\theta_{m}^{0}$ . The Wald test can be considered by using the following statistic (Pawitan, 2001):

$$z=\frac{\widehat{\theta}_{m}-\theta_{m}^{0}}{\widehat{{\rm se}}(\widehat{\theta}_{m})}.$$

Because the $z$ statistic has an asymptotically standard normal distribution under $\mathcal{H}_{0}$ , the test is performed by comparing the calculated $z$ statistic with the usual quantiles of the standard normal distribution.
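The Wald interval and the $z$ test described above can be sketched numerically as follows. This is a minimal illustration, assuming the usual forms $\widehat{\theta}_{m}\pm\Phi^{-1}(1-\alpha/2)\,\widehat{{\rm se}}(\widehat{\theta}_{m})$ and $z=(\widehat{\theta}_{m}-\theta_{m}^{0})/\widehat{{\rm se}}(\widehat{\theta}_{m})$ ; the function names and the numerical inputs are hypothetical and are not taken from the paper.

```python
from statistics import NormalDist


def wald_interval(estimate, std_error, alpha=0.05):
    """Two-sided Wald interval: estimate -/+ Phi^{-1}(1 - alpha/2) * se."""
    z_crit = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    return estimate - z_crit * std_error, estimate + z_crit * std_error


def wald_z_test(estimate, std_error, null_value=0.0):
    """Return the z statistic and its two-sided asymptotic p-value."""
    z = (estimate - null_value) / std_error
    p_value = 2.0 * (1.0 - NormalDist().cdf(abs(z)))
    return z, p_value


# Hypothetical estimate and standard error, for illustration only.
lower, upper = wald_interval(estimate=0.80, std_error=0.25)
z_stat, p_val = wald_z_test(estimate=0.80, std_error=0.25)
```

In practice the standard error would be the square root of the corresponding diagonal element of $\bm{K}^{-1}(\widehat{\bm{\theta}})$ .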

For more general hypotheses, $\mathcal{H}_{0}:\bm{\theta}_{I}=\bm{\theta}_{I}^{0}$ versus $\mathcal{H}_{1}:\bm{\theta}_{I}\neq\bm{\theta}_{I}^{0}$ , where $\bm{\theta}=(\bm{\theta}_{I}^{\top},\bm{\theta}_{N}^{\top})^{\top}$ has dimension $q$ , $\bm{\theta}_{I}$ is the vector of parameters of interest with dimension $\iota$ , and $\bm{\theta}_{N}$ is the vector of nuisance parameters with dimension $q-\iota$ , four test statistics can be considered, namely: the likelihood ratio (LR) (Neyman and Pearson, 1928), Wald (W) (Wald, 1943), score (S) (Rao, 1948), and gradient (G) (Terrell, 2002). Under $\mathcal{H}_{0}$ and the usual conditions of regularity, the four test statistics have the asymptotic chi-squared distribution with $\iota$ degrees of freedom ( $\chi_{\iota}^{2}$ ), where $\iota$ is the number of restrictions imposed by the null hypothesis (Vargas et al, 2014). The test can be performed by comparing the calculated value of the statistic considered, i.e., LR, W, S, or G, with the usual quantile of $\chi_{\iota}^{2}$ .

4 Diagnostics

After estimating the model, it is necessary to check for possible departures from the model assumptions and to detect poorly fitted or aberrant observations. This section introduces some diagnostic measures for assessing the adequacy of the fitted model.

Residuals are an important measure in checking for deviations from the unknown population model, disparate observations, and adjustment quality. Initially, the standardized ordinary residual is proposed. This is given by

$$r_{t}=\frac{y_{t}-\widehat{\mu}_{t}}{\sqrt{{\rm\widehat{Var}}(Y_{t})}},$$

where ${\rm\widehat{Var}}(Y_{t})=\widehat{\mu}_{t}(1-\widehat{\mu}_{t})\widehat{\sigma}_{t}^{2}$ . Additionally, the standardized weighted residual 2 can be used, as proposed by Ferrari et al (2011) for the varying dispersion beta regression model. This is given by

[TABLE]

where ${\rm\widehat{Var}}(y_{t}^{*})=\psi^{\prime}\left(\widehat{\mu}_{t}\frac{1-\widehat{\sigma}_{t}^{2}}{\widehat{\sigma}_{t}^{2}}\right)+\psi^{\prime}\left((1-\widehat{\mu}_{t})\frac{1-\widehat{\sigma}_{t}^{2}}{\widehat{\sigma}_{t}^{2}}\right)$ , and $h_{tt}$ is the $t$th diagonal element of the ‘hat matrix’ $\mathbf{H}=(\widehat{\bm{W}}\widehat{\bm{\Sigma}})^{1/2}\bm{X}(\bm{X}^{\top}\widehat{\bm{\Sigma}}\widehat{\bm{W}}\bm{X})^{-1}\bm{X}^{\top}(\widehat{\bm{\Sigma}}\widehat{\bm{W}})^{1/2}$ . This residual provides an improved approximation of the standard normal distribution when the model is correctly specified and when a model with fixed links is considered (Espinheira et al, 2008a). In prior simulations and analyses, the performance of the $r_{t}^{pp}$ residuals was found to be good in the proposed model considering parametric links. A residual chart is typically used to analyze the residuals against their respective indices. In this chart, the residuals are expected to be randomly distributed around zero, and no more than $5\%$ of the values should fall outside the $[-2,2]$ interval.
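As a small illustration of the residual check just described, the sketch below computes standardized ordinary residuals, assuming the form $(y_{t}-\widehat{\mu}_{t})/\sqrt{\widehat{\mu}_{t}(1-\widehat{\mu}_{t})\widehat{\sigma}_{t}^{2}}$ implied by the variance expression given above, and reports the share of residuals outside $[-2,2]$ . The function names and the toy inputs are hypothetical.

```python
import math


def standardized_ordinary_residuals(y, mu_hat, sigma_hat):
    """(y_t - mu_t) / sqrt(mu_t * (1 - mu_t) * sigma_t^2) for each observation."""
    residuals = []
    for y_t, mu_t, sigma_t in zip(y, mu_hat, sigma_hat):
        variance = mu_t * (1.0 - mu_t) * sigma_t ** 2
        residuals.append((y_t - mu_t) / math.sqrt(variance))
    return residuals


def share_outside(residuals, bound=2.0):
    """Proportion of residuals whose absolute value exceeds the bound."""
    return sum(1 for r in residuals if abs(r) > bound) / len(residuals)


# Toy fitted values, for illustration only (not from the paper).
y_obs = [0.50, 0.10, 0.35, 0.80]
mu_fit = [0.40, 0.30, 0.35, 0.60]
sigma_fit = [0.20, 0.20, 0.25, 0.30]
res = standardized_ordinary_residuals(y_obs, mu_fit, sigma_fit)
frac_out = share_outside(res)
```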

To verify that the distribution assumed for the dependent variable is adequate, we can examine half-normal plots with simulated envelopes by evaluating the quality of the fitted model (Atkinson, 1981). The simulated envelope can be built as follows (Atkinson, 1985; Ferrari and Cribari-Neto, 2004):

(i) fit the model and generate a simulated sample set of $n$ independent observations using the fitted model as if it were the true model;

(ii) fit the model from the generated sample, calculate the absolute values of the residuals and arrange them in order;

(iii) repeat steps (i) and (ii) $k$ times;

(iv) consider the $n$ sets of the $k$ order statistics; for each set, calculate the quantile $\alpha/2$ , the mean, and the quantile $1-\alpha/2$ ;

(v) plot these values and the ordered residuals of the original sample set against the $\Phi^{-1}((t+n+1/2)/(2n+10/8))$ scores.

No more than $\alpha\times 100\%$ of the observations are expected to occur outside the envelope bands. A very large proportion of points lying outside the bands suggests that the model is inadequate.
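Steps (i) to (v) can be sketched as follows. To keep the example self-contained, it uses a deliberately simplified stand-in for the proposed model: a beta model with constant mean and dispersion fitted by the method of moments, together with standardized ordinary residuals. These choices, the helper names, and the toy data are assumptions made for illustration; the actual procedure would refit the full regression model by maximum likelihood in each replication.

```python
import random
from statistics import NormalDist, fmean, pvariance


def fit_constant_beta(y):
    """Moment estimates of (mu, sigma^2), using Var(Y) = mu * (1 - mu) * sigma^2."""
    mu = fmean(y)
    sigma2 = pvariance(y, mu) / (mu * (1.0 - mu))
    return mu, sigma2


def abs_residuals(y, mu, sigma2):
    """Ordered absolute standardized ordinary residuals."""
    sd = (mu * (1.0 - mu) * sigma2) ** 0.5
    return sorted(abs(v - mu) / sd for v in y)


def simulate(n, mu, sigma2, rng):
    """Draw n beta variates with mean mu and dispersion sigma^2."""
    phi = (1.0 - sigma2) / sigma2
    return [rng.betavariate(mu * phi, (1.0 - mu) * phi) for _ in range(n)]


def empirical_quantile(sorted_values, prob):
    """Simple order-statistic quantile of an already sorted list."""
    last = len(sorted_values) - 1
    return sorted_values[min(last, max(0, round(prob * last)))]


def simulated_envelope(y, k=99, alpha=0.05, seed=1):
    rng = random.Random(seed)
    n = len(y)
    mu, sigma2 = fit_constant_beta(y)              # step (i): fit
    observed = abs_residuals(y, mu, sigma2)
    replicates = []
    for _ in range(k):                             # step (iii): repeat k times
        y_sim = simulate(n, mu, sigma2, rng)       # step (i): simulate
        mu_s, sigma2_s = fit_constant_beta(y_sim)  # step (ii): refit
        replicates.append(abs_residuals(y_sim, mu_s, sigma2_s))
    lower, centre, upper = [], [], []
    for t in range(n):                             # step (iv): bands per order statistic
        column = sorted(rep[t] for rep in replicates)
        lower.append(empirical_quantile(column, alpha / 2.0))
        centre.append(fmean(column))
        upper.append(empirical_quantile(column, 1.0 - alpha / 2.0))
    # step (v): half-normal scores, using the expression quoted in the text
    scores = [NormalDist().inv_cdf((t + n + 0.5) / (2 * n + 10 / 8))
              for t in range(1, n + 1)]
    return scores, observed, lower, centre, upper


# Toy data, for illustration only.
demo_rng = random.Random(7)
y_demo = [demo_rng.betavariate(4.0, 6.0) for _ in range(50)]
scores, observed, lower, centre, upper = simulated_envelope(y_demo, k=99)
outside = sum(1 for r, lo, hi in zip(observed, lower, upper) if r < lo or r > hi)
```

Plotting `observed`, `lower`, `centre`, and `upper` against `scores` gives the half-normal plot; `outside` counts the points that fall beyond the bands.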

The overall influence of each observation on the estimates of the model parameters can be assessed using Cook’s distance (Cook, 1977). In this study, we use the Cook-like distance proposed by Espinheira et al (2008b) for the beta regression model. This distance combines leverage measures and the model residuals, and is defined by

[TABLE]

To check for possible points of influence, it is common to produce a chart of $C_{t}$ against their respective $t$ indices.

Candidate models can be selected using information criteria, such as the generalized Akaike information criterion (GAIC) (Akaike, 1983; Rigby and Stasinopoulos, 2005), which is given by

$${\rm GAIC}=-2\ell(\widehat{\bm{\theta}})+\mathcal{P}\,q,$$

where $\mathcal{P}$ can take different real values. The values $\mathcal{P}=2$ and $\mathcal{P}={\rm log}(n)$ give the Akaike information criterion (AIC) (Akaike, 1974) and the Schwarz information criterion (SIC) (Schwarz, 1978), respectively. These criteria take into account the maximized log-likelihood penalized by the number of parameters in the fitted model. Among competing models, the one with the lowest GAIC value should be chosen.
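A minimal sketch of this criterion follows, assuming the usual form ${\rm GAIC}=-2\ell(\widehat{\bm{\theta}})+\mathcal{P}\,q$ with $q$ the number of estimated parameters. The function name and the numerical inputs are hypothetical.

```python
import math


def gaic(loglik, n_params, penalty=2.0):
    """Generalized AIC: -2 * maximized log-likelihood + penalty * number of parameters."""
    return -2.0 * loglik + penalty * n_params


# Hypothetical maximized log-likelihood, parameter count and sample size.
loglik_hat, q, n = 100.0, 8, 124
aic = gaic(loglik_hat, q, penalty=2.0)          # P = 2 gives the AIC
sic = gaic(loglik_hat, q, penalty=math.log(n))  # P = log(n) gives the SIC
```

Because $\log(n)>2$ whenever $n\geq 8$ , the SIC penalizes additional parameters more heavily than the AIC in all but very small samples.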

To check the model specification, RESET-type tests (Ramsey, 1969) are recommended. McCullagh and Nelder (1989) suggested using a RESET-type test in GLMs, whereas Pereira and Cribari-Neto (2013) and Oliveira (2013) argued that they are suitable for beta regressions. To run the RESET-type test for the proposed model, $\widehat{\bm{\eta}}_{1}^{2}$ should be added as a covariate in both the mean and dispersion submodels. This new model should be fitted with $\lambda_{1}$ and $\lambda_{2}$ fixed at their previously estimated values. The parameters of the artificial covariates $\widehat{\bm{\eta}}_{1}^{2}$ should then be tested according to the null hypothesis $\mathcal{H}_{0}:(\bm{\beta}_{r+1},\bm{\gamma}_{s+1})=(0,0)$ , where $\bm{\beta}_{r+1}$ and $\bm{\gamma}_{s+1}$ are the parameters pertaining to the artificial covariates in the mean and dispersion submodels, respectively. If $\mathcal{H}_{0}$ is not rejected, there is no evidence of misspecification; otherwise, the model is considered to be incorrectly specified. To run the RESET-type test, any one of the four test statistics cited in Subsection 3.1 can be used.

We can also use the LR, W, S, and G statistics to test whether some fixed link function is adequate. Considering the asymmetric Aranda-Ordaz link function, we can test $\mathcal{H}_{0}:(\lambda_{1},\lambda_{2})=(1,1)$ to check whether the logit link function is appropriate for the mean and dispersion submodels. If $\mathcal{H}_{0}$ is not rejected, there is no evidence against the fixed logit links.
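For a null hypothesis that imposes two restrictions, such as the joint test on $(\lambda_{1},\lambda_{2})$ or the RESET-type test above, the reference distribution is $\chi^{2}_{2}$ , whose survival function has the closed form $\exp(-x/2)$ . The sketch below uses this fact to compute an asymptotic LR $p$-value without external libraries. The log-likelihood values are hypothetical placeholders, not results from the paper.

```python
import math


def lr_statistic(loglik_unrestricted, loglik_restricted):
    """Likelihood ratio statistic: twice the gain in maximized log-likelihood."""
    return 2.0 * (loglik_unrestricted - loglik_restricted)


def chi2_sf_2df(x):
    """P(X > x) for a chi-squared variable with 2 degrees of freedom."""
    return math.exp(-x / 2.0) if x > 0.0 else 1.0


# Hypothetical maximized log-likelihoods under the full and restricted models.
lr = lr_statistic(loglik_unrestricted=95.0, loglik_restricted=92.0)
p_value = chi2_sf_2df(lr)
```

For other numbers of restrictions the closed form no longer applies and a general chi-squared routine would be needed.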

As a global measure of the goodness-of-fit, we consider the generalized coefficient of determination (Nagelkerke, 1991). This is given by

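$$R^{2}_{G}=1-\left(\frac{L_{null}}{L_{fit}}\right)^{2/n}=1-\exp\left(-\frac{2}{n}\left[\ell(\widehat{\bm{\theta}})-\ell(0)\right]\right),$$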

where $\ell(0)$ is the maximized log-likelihood of the null model, i.e., under constant mean and dispersion (in this case no regression structures are considered; thus, there are no estimates for $\lambda_{\delta}$ ), $\ell(\widehat{\bm{\theta}})$ is the maximized log-likelihood of the fitted model, $\ell(0)={\rm log}L_{null}$ , and $\ell(\widehat{\bm{\theta}})={\rm log}L_{fit}$ . $R^{2}_{G}$ measures the proportion of the variability of $Y$ that can be explained by the fitted model; this lies in the interval $[0,1]$ . A higher value of $R^{2}_{G}$ indicates a better fit.
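A minimal sketch of this measure follows, assuming the likelihood-ratio form $R^{2}_{G}=1-\exp\{-(2/n)[\ell(\widehat{\bm{\theta}})-\ell(0)]\}$ , which is equivalent to $1-(L_{null}/L_{fit})^{2/n}$ . The function name and the inputs are hypothetical.

```python
import math


def generalized_r2(loglik_fit, loglik_null, n):
    """1 - (L_null / L_fit)^(2/n), computed on the log scale for stability."""
    return 1.0 - math.exp(-2.0 * (loglik_fit - loglik_null) / n)


# Hypothetical maximized log-likelihoods and sample size.
r2 = generalized_r2(loglik_fit=100.0, loglik_null=60.0, n=124)
r2_null = generalized_r2(loglik_fit=60.0, loglik_null=60.0, n=124)
```

Working with log-likelihoods avoids the overflow or underflow that can occur when the likelihoods themselves are exponentiated.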

5 Aranda-Ordaz link functions - two particular cases

As mentioned earlier in this paper, the Aranda-Ordaz link function families (Aranda-Ordaz, 1981) can be used to relate the mean and dispersion parameters to their respective linear predictors. We considered these link functions because they form two one-parameter families of symmetric and asymmetric links that include several well-known links as particular cases (Dehbi et al, 2016). They have also been used in a multitude of regression models (Morgan, 1992; Colosimo et al, 2000; Smith, 2003; Adewale and Xu, 2010; Gomes and Ludermir, 2013; Dehbi et al, 2014; Taneichi et al, 2014; Geraci and Jones, 2015; Dehbi et al, 2016). Because the two parameters $\mu$ and $\sigma$ of the proposed model assume values in the same interval $(0,1)$ , the relationships established immediately below are valid for both of these parameters.

The symmetric Aranda-Ordaz link function is given by:

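$${\rm g}(\mu,\lambda)=\frac{2}{\lambda}\,\frac{\mu^{\lambda}-(1-\mu)^{\lambda}}{\mu^{\lambda}+(1-\mu)^{\lambda}},$$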

where $\lambda\neq 0$ and $\mu\in(0,1)$ . The symmetry refers to the fact that ${\rm g}(\mu,\lambda)=-{\rm g}(1-\mu,\lambda)$ and ${\rm g}(\mu,\lambda)={\rm g}(\mu,-\lambda)$ (Dehbi et al, 2016). This link function family reduces to the linear link function if $\lambda=1$ , to the logit if $\lambda\rightarrow 0$ , close to the probit link if $\lambda=0.39$ , and close to the arc sine link function if $\lambda=0.67$ (Aranda-Ordaz, 1981; Dehbi et al, 2016). Figure 1(a) shows some different forms of the symmetric Aranda-Ordaz link function considering different values of the link function parameter $\lambda$ . For this symmetric link function, the inverse function can be written as follows:

[TABLE]
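The symmetric family and its inverse can be sketched as follows, assuming the standard form ${\rm g}(\mu,\lambda)=\frac{2}{\lambda}\,\frac{\mu^{\lambda}-(1-\mu)^{\lambda}}{\mu^{\lambda}+(1-\mu)^{\lambda}}$ from Aranda-Ordaz (1981). The inverse is obtained by solving this expression for $\mu$ when $|\lambda\eta/2|<1$ ; clipping to 0 or 1 outside that range is an assumption following the usual presentation of this family, and the function names are hypothetical.

```python
import math


def ao_sym_link(mu, lam):
    """Symmetric Aranda-Ordaz link for mu in (0, 1) and lam != 0."""
    lam = abs(lam)  # g(mu, lam) = g(mu, -lam), so only |lam| matters
    if lam == 0.0:
        raise ValueError("lam must be nonzero; the logit is the limit as lam -> 0")
    a, b = mu ** lam, (1.0 - mu) ** lam
    return (2.0 / lam) * (a - b) / (a + b)


def ao_sym_inverse(eta, lam):
    """Inverse of the symmetric link, clipped to 0 or 1 when |lam * eta / 2| >= 1."""
    lam = abs(lam)
    if lam == 0.0:
        raise ValueError("lam must be nonzero")
    u = lam * eta / 2.0
    if u <= -1.0:
        return 0.0
    if u >= 1.0:
        return 1.0
    up = (1.0 + u) ** (1.0 / lam)
    down = (1.0 - u) ** (1.0 / lam)
    return up / (up + down)


# Special cases mentioned in the text, checked at mu = 0.3.
linear_value = ao_sym_link(0.3, 1.0)    # equals 2 * (2 * 0.3 - 1)
near_logit = ao_sym_link(0.3, 1e-4)     # close to log(0.3 / 0.7)
round_trip = ao_sym_inverse(ao_sym_link(0.3, 0.39), 0.39)
```

The kinks at $|\lambda\eta/2|=1$ are where the inverse stops being differentiable.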

In the general formulation of the proposed model presented in Section 3, the score vector and Fisher’s information matrix involve the quantities $\left(\frac{\partial g_{1}(\mu_{t},\lambda_{1})}{\partial\mu_{t}}\right)^{-1}$ , $\left(\frac{\partial g_{2}(\sigma_{t},\lambda_{2})}{\partial\sigma_{t}}\right)^{-1}$ , $\bm{\rho}$ , and $\bm{\varrho}$ , which depend on the considered parametric link functions. Considering the symmetric Aranda-Ordaz link function in both regression structures, we have:

[TABLE]

and

[TABLE]

The asymmetric Aranda-Ordaz link function is given by (Aranda-Ordaz, 1981):

$${\rm g}(\mu,\lambda)=\log\left\{\frac{(1-\mu)^{-\lambda}-1}{\lambda}\right\},$$

where $\lambda>-1/e^{\eta}$ , $\mu\in(0,1)$ , and its inverse can be written as follows:

$$\mu={\rm g}^{-1}(\eta,\lambda)=1-\left(1+\lambda e^{\eta}\right)^{-1/\lambda}.$$

The asymmetric Aranda-Ordaz function is more flexible than the symmetric version and it captures the possible asymmetry between the linear predictors and the parameters $\mu$ and $\sigma$ . In Figure 1(b), this relationship can be seen for different values of the parameter $\lambda$ . The logit and cloglog link functions are special cases for $\lambda=1$ and $\lambda\rightarrow 0$ , respectively. Compared with the usual logit function, $\mu$ or $\sigma$ tends to 1 more quickly as $\eta_{\delta}$ increases when $\lambda<1$ ; for $\lambda>1$ , $\mu$ or $\sigma$ tends to 1 more slowly as $\eta_{\delta}$ increases. It is notable that a link function with a lower parameter value results in a greater variation in $\mu$ and/or $\sigma$ in relation to $\eta_{\delta}$ . In contrast, very high values for the link function parameter might indicate that the parameters $\mu$ and/or $\sigma$ are not variable and should be estimated without independent variables, i.e., as constants.
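The special cases and the speed of approach to 1 can be checked numerically. The sketch below assumes the standard asymmetric form ${\rm g}(\mu,\lambda)=\log\{[(1-\mu)^{-\lambda}-1]/\lambda\}$ with inverse $1-(1+\lambda e^{\eta})^{-1/\lambda}$ (Aranda-Ordaz, 1981). The function names are hypothetical, and `expm1` and `log1p` are used only for numerical accuracy near $\lambda=0$ .

```python
import math


def ao_asym_link(mu, lam):
    """Asymmetric Aranda-Ordaz link: log(((1 - mu)^(-lam) - 1) / lam), lam != 0."""
    if lam == 0.0:
        raise ValueError("lam must be nonzero; the cloglog is the limit as lam -> 0")
    return math.log(math.expm1(-lam * math.log1p(-mu)) / lam)


def ao_asym_inverse(eta, lam):
    """Inverse link: 1 - (1 + lam * exp(eta))^(-1/lam), requiring lam > -exp(-eta)."""
    base = lam * math.exp(eta)
    if lam == 0.0 or base <= -1.0:
        raise ValueError("need lam != 0 and 1 + lam * exp(eta) > 0")
    return -math.expm1(-math.log1p(base) / lam)


logit_case = ao_asym_link(0.3, 1.0)       # equals log(0.3 / 0.7)
cloglog_case = ao_asym_link(0.3, 1e-8)    # close to log(-log(1 - 0.3))
round_trip = ao_asym_inverse(ao_asym_link(0.3, 2.5), 2.5)

# At the same linear predictor, a smaller lam gives a value closer to 1.
mu_small_lam = ao_asym_inverse(1.0, 0.5)
mu_logit = ao_asym_inverse(1.0, 1.0)
mu_large_lam = ao_asym_inverse(1.0, 5.0)
```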

Considering the asymmetric Aranda-Ordaz link function, the quantities needed for the score vector and Fisher’s information matrix are given by:

[TABLE]

and

[TABLE]

From these quantities, we can obtain the score vector and Fisher’s information matrix given in Section 3. These quantities assume that $\mu$ depends on $\lambda_{1}$ and $\sigma$ depends on $\lambda_{2}$ .

6 Numerical evaluation

To assess the finite sample performance of the point estimators, this section provides a numerical evaluation using Monte Carlo simulations. This assessment considers the mean, bias, relative bias (RB), standard deviation (SD), and mean squared error (MSE) of the point estimates. We used $R=50,000$ Monte Carlo replications in each scenario, and considered sample sizes of $n=100$ and $n=500$ . For each Monte Carlo replication, $n$ instances of the random variable $Y_{t}$ were generated with the density function in (1), where the mean and dispersion parameters are given by $\mu_{t}=g_{1}^{-1}(\eta_{1t},\lambda_{1})$ and $\sigma_{t}=g_{2}^{-1}(\eta_{2t},\lambda_{2})$ , respectively. As discussed in Section 5, we considered two families of Aranda-Ordaz link functions, namely: symmetric and asymmetric. The values of $\bm{\beta}$ , $\bm{\gamma}$ , $\lambda_{1}$ , and $\lambda_{2}$ are listed in Tables 1 and 2, respectively, along with the numerical results.

The covariates for the mean and dispersion submodels were generated from the uniform distribution $(0,1)$ , and were considered to be constant for all Monte Carlo replications. Computational implementations were conducted using the R language (R Development Core Team, 2014). An R function for fitting the proposed model with asymmetric Aranda-Ordaz link function, along with the diagnostic measures, is available at http://www.ufsm.br/bayer/betareglink.zip.
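The data-generating step of one Monte Carlo replication can be sketched as below. This is only an illustration in Python (the authors' implementation is in R): it assumes the asymmetric inverse link $1-(1+\lambda e^{\eta})^{-1/\lambda}$ and a beta density with shape parameters $\mu_{t}(1-\sigma_{t}^{2})/\sigma_{t}^{2}$ and $(1-\mu_{t})(1-\sigma_{t}^{2})/\sigma_{t}^{2}$ , which is consistent with ${\rm Var}(Y_{t})=\mu_{t}(1-\mu_{t})\sigma_{t}^{2}$ . The coefficient values and function names are hypothetical and do not correspond to the scenarios in Tables 1 and 2.

```python
import math
import random


def ao_asym_inverse(eta, lam):
    """Inverse asymmetric Aranda-Ordaz link: 1 - (1 + lam * exp(eta))^(-1/lam)."""
    return -math.expm1(-math.log1p(lam * math.exp(eta)) / lam)


def simulate_response(x, z, beta, gamma, lam1, lam2, rng):
    """Draw one sample y_1, ..., y_n given covariates and parameter values."""
    y = []
    for x_t, z_t in zip(x, z):
        mu = ao_asym_inverse(beta[0] + beta[1] * x_t, lam1)       # mean submodel
        sigma = ao_asym_inverse(gamma[0] + gamma[1] * z_t, lam2)  # dispersion submodel
        phi = (1.0 - sigma ** 2) / sigma ** 2                     # implied precision
        y.append(rng.betavariate(mu * phi, (1.0 - mu) * phi))
    return y


rng = random.Random(2017)
n = 100
# Covariates drawn once from U(0, 1) and then held fixed across replications.
x = [rng.random() for _ in range(n)]
z = [rng.random() for _ in range(n)]
# Hypothetical parameter values, for illustration only.
y = simulate_response(x, z, beta=(-1.0, 2.0), gamma=(-1.5, 1.0),
                      lam1=0.5, lam2=2.0, rng=rng)
```

A full replication would then refit the model to `y` by maximum likelihood and store the estimates.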

In general, according to Tables 1 and 2, the parameter estimates related to the mean submodel are not biased, unlike those for the dispersion submodel. This bias in the dispersion parameter estimators has been verified in other variations of the beta regression model that consider fixed links (Ospina et al, 2006; Simas et al, 2010; Ospina and Ferrari, 2012). Considering the symmetric family of Aranda-Ordaz link functions, Table 1 shows that the estimators for the dispersion submodel parameters are more biased than when we consider the asymmetric family (Table 2). We also note that the estimator of $\lambda_{2}$ was biased even in moderate sample sizes. These results can be explained by numerical problems in the log-likelihood maximization. The symmetric Aranda-Ordaz link function is numerically more unstable than the asymmetric one, due to the fact that it fails to be differentiable at some points for some values of $\lambda$ (cf. Figure 1(a)).

For the asymmetric family (Table 2), it can be observed that the bias in the dispersion structure estimators is concentrated at the intercept and for higher values of $\lambda_{2}$ . The estimator of the link function parameter in the dispersion submodel also produced a considerable RB in small samples. For example, in Scenario 1, with $n=100$ and $\lambda_{2}=10$ , ${\rm RB}=-141.521\%$ for the intercept of the dispersion submodel. As for $\lambda_{2}=2$ , considering $n=100$ , ${\rm RB}=106.971\%$ was observed for $\widehat{\lambda}_{2}$ . This bias decreases considerably as the sample size increases; for $n=500$ , the RB values for the same estimators are reduced to $-8.150\%$ and $8.290\%$ , respectively. In all cases, the MSE values tend rapidly toward zero as the sample size increases, as expected from the consistency of the MLEs.
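The summary measures reported in the simulation tables can be computed from the replicated estimates as sketched below. The sketch assumes the usual definitions: bias as the mean estimate minus the true value, RB as $100\times{\rm bias}/{\rm true\ value}$ , and MSE as the mean squared deviation from the true value; the sample standard deviation is used for SD. The function name and the toy estimates are hypothetical.

```python
from statistics import fmean, stdev


def monte_carlo_summary(estimates, true_value):
    """Mean, bias, relative bias (%), SD and MSE of replicated point estimates."""
    mean_est = fmean(estimates)
    bias = mean_est - true_value
    return {
        "mean": mean_est,
        "bias": bias,
        "rb_percent": 100.0 * bias / true_value,
        "sd": stdev(estimates),
        "mse": fmean([(e - true_value) ** 2 for e in estimates]),
    }


# Four toy replications of an estimator whose true value is 1.0.
summary = monte_carlo_summary([1.1, 0.9, 1.0, 1.2], true_value=1.0)
```

Because RB divides by the true value, it is not defined for parameters whose true value is zero.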

The simulation results indicate that the MLE in the proposed model performs well. The bias in the dispersion submodel parameter estimators is in accordance with previous results (Ospina et al, 2006; Andrade, 2007; Simas et al, 2010; Ospina and Ferrari, 2012). However, when the symmetric link was considered, the numerical maximization of the log-likelihood function presented some drawbacks. In addition, the asymmetric family of Aranda-Ordaz link functions is more flexible than the symmetric version, because it considers the possible asymmetry between the random component and the linear predictors. We therefore recommend the asymmetric family for empirical applications.

It is noteworthy that adequate link functions must be selected when using the usual models with fixed link functions (logit, probit, etc.) in actual data applications, in addition to the selection of the covariates. This model selection procedure can be time-consuming and inconclusive. When considering the proposed model, the selection of link functions is no longer a practical problem. Furthermore, the possible relationships between the parameters of interest, $\mu$ and $\sigma$ , and their respective linear predictors, become more flexible.

7 Application

In this section, the proposed model is employed with actual data to demonstrate its practical applicability. For parametric link functions we choose the asymmetric Aranda-Ordaz family, because it is much more flexible than the symmetric function and its computational implementation is more stable. We considered the data used by Cribari-Neto and Souza (2013) about religious belief in 124 countries. The proportion of nonbelievers in each country is the dependent variable, $Y$ . The covariates considered are the average intelligence quotient of the population in each country ( $IQ$ ), $IQ$ squared ( $IQ^{2}$ ), a dummy variable that equals 1 if the percentage of Muslims is greater than $50\%$ and 0 otherwise ( $MUSL$ ), the per capita income adjusted by the purchasing power parity in 2008 in thousands of dollars ( $INCOME$ ), the logarithm of the ratio between the sum of imports and exports and the Gross Domestic Product in 2008 ( ${\rm log}OPEN$ ), and the interaction between $MUSL$ and $INCOME$ ( $M\times I$ ).

After some adjustments and diagnostic analyses, the model presented in Table 3 was selected. The RESET-type test considering the LR statistic suggests this model was correctly specified ( $p\text{-value}=0.153$ ). It can also be verified that all covariates were significant at the nominal $10\%$ level. Comparatively, using the usual logit link function for the mean and dispersion, with the fitted model covariates given in Table 3, the RESET-type test indicated that the model was not correctly specified ( $p\text{-value}=0.008$ ) at the usual nominal levels. We also tested the hypothesis $\mathcal{H}_{0}:(\lambda_{1},\lambda_{2})=(1,1)$ using the LR statistic. With $p\text{-value}=0.024$ , we reject the hypothesis that the logit is the correct link function in both submodels.

Figure 2 presents a diagnostic analysis of the fitted model. The residual analysis in Figures 2(a) and 2(c), and the observed values ( $y_{t}$ ) versus the predicted values ( $\widehat{\mu}_{t}$ ) in Figure 2(b), indicate that the model was adequately fitted. The Cook-like distance shown in Figure 2(d) highlights four observations ( $C_{t}>0.5$ ), namely: 17, 77, 97, and 118, corresponding to Burkina Faso, Mozambique, Sierra Leone, and the United States of America (USA), respectively. In Burkina Faso and Sierra Leone, just $0.5\%$ of the population are atheists, which is the smallest percentage of nonbelievers. Mozambique and Sierra Leone present the smallest average $IQ$ among the considered countries. In addition, Mozambique has a large proportion of atheists compared to other countries with similar $IQ$ . Finally, the USA has very high $IQ$ and $INCOME$ values, as well as small $OPEN$ values compared with countries that present a similar percentage of nonbelievers (just $10.5\%$ ). Although the influence measures described by Cribari-Neto and Souza (2013) did not highlight the USA, the authors did discuss this atypical religious characteristic for a country with high $IQ$ .

Conclusions regarding the mean submodel parameter estimates (Table 3) corroborate those of Cribari-Neto and Souza (2013). The variables $IQ$ and $MUSL$ have a negative influence on the mean submodel, whereas $IQ^{2}$ , $INCOME$ and ${\rm log}OPEN$ have a positive influence. In the dispersion submodel, the variable $MUSL$ has a negative influence, whereas the variables $IQ$ , ${\rm log}OPEN$ , and $M\times I$ have a positive influence. In particular, the per capita income adjusted by the purchasing power parity ( $INCOME$ ) is positively associated with religious disbelief. To assess the impact of $IQ$ on the mean proportion of nonbelievers, the following measure of impact was considered (Cribari-Neto and Souza, 2013):

[TABLE]

This measures the average impact on the proportion of nonbelievers resulting from changes in the $IQ$ covariate when the other covariates remain constant. Figure 3(a) shows the impact of variations in $IQ$ on the average percentage of nonbelievers, with the other covariates set to their mean values. The impact is not constant and varies according to $IQ$ . Up to $IQ=100$ , the impact first increases before decreasing. Figure 3(b) shows the relationship between the estimated mean proportion of nonbelievers and intelligence. This chart suggests that higher values of $IQ$ are related to larger proportions of nonbelievers, with greater impact for $IQ$ values above 85.

In order to compare our proposed model fitted to the religious belief data with the model in Cribari-Neto and Souza (2013), we selected some goodness-of-fit measures. The generalized coefficient of determination ( $R^{2}_{G}$ ), the maximized log-likelihood function ( $\ell(\widehat{\bm{\theta}})$ ), the Akaike information criterion (AIC) and the mean square error (MSE) between the observed ( $y$ ) and predicted ( $\widehat{\mu}$ ) values of the two fitted models are given in Table 4. We note that our proposed model outperforms the model with fixed link functions in all measures. In particular, regarding $R^{2}_{G}$ , our fitted model explains about $8\%$ more of the variability of $y$ than the model with fixed links.

It is worth noting that the proposed model considers the dispersion parameter $\sigma$ , unlike the model used by Cribari-Neto and Souza (2013), which considered the precision parameter $\phi$ . Note that Cribari-Neto and Souza (2013) selected the loglog link function for the mean and the log link function for the precision.

8 Conclusion

In this paper, we have proposed a beta regression model with parametric link functions that is useful for modeling variables contained in the interval $(0,1)$ , such as rates and proportions. The score vector and Fisher’s information matrix were derived analytically, and aspects of large sample inference were presented. Diagnostic measures that allow researchers to identify influential points, outlier observations, or shortcomings of the fitted model were also proposed. A simulation study highlighted the accurate finite sample performance of the point estimators. An application to actual data was presented and discussed to demonstrate the practical usefulness of the proposed model. Moreover, the use of parametric link functions enables problems arising from the incorrect specification of link functions to be circumvented, thereby facilitating the construction of an adequate model. Finally, all of the evidence from this study suggests that the proposed model is both useful and adequate for modeling rate and proportion variables.

Acknowledgements

This research was partially supported by Conselho Nacional de Desenvolvimento Científico e Tecnológico (CNPq), Brazil. The final publication is available at Springer via http://dx.doi.org/10.1007/s00362-017-0885-9.

Appendix

In this appendix we obtain the score function and the Fisher’s information matrix for all parameters ( $\bm{\beta}$ , $\bm{\gamma}$ , $\lambda_{1}$ , $\lambda_{2}$ ).

The elements of the score vector are given by:

[TABLE]

for $i=1,\ldots,r$ and $j=1,\ldots,s$ , where $\dfrac{\partial\ell_{t}(\mu_{t},\sigma_{t})}{\partial\mu_{t}}=\dfrac{1-\sigma^{2}_{t}}{\sigma^{2}_{t}}(y^{*}_{t}-\mu^{*}_{t})$ , $\dfrac{\partial\mu_{t}}{\partial\eta_{1t}}=\left[\dfrac{\partial g_{1}(\mu_{t},\lambda_{1})}{\partial\mu_{t}}\right]^{-1}$ , $\dfrac{\partial\eta_{1t}}{\partial\beta_{i}}=x_{ti}$ , $\dfrac{\partial\ell_{t}(\mu_{t},\sigma_{t})}{\partial\sigma_{t}}=a_{t}$ , $\dfrac{\partial\sigma_{t}}{\partial\eta_{2t}}=\left[\dfrac{\partial g_{2}(\sigma_{t},\lambda_{2})}{\partial\sigma_{t}}\right]^{-1}$ and $\dfrac{\partial\eta_{2t}}{\partial\gamma_{j}}=z_{tj}$ .

The second order derivatives of the log-likelihood function are given by:

[TABLE]

where $\dfrac{\partial}{\partial\lambda_{2}}\left(\dfrac{\partial\mu_{t}}{\partial\eta_{1t}}\right)=0$ ,

[TABLE]

Taking the expected value of the second order derivatives given above, since $\mathbb{E}\left(\dfrac{\partial\ell_{t}(\mu_{t},\sigma_{t})}{\partial\mu_{t}}\right)=0$ , we have:

[TABLE]

Since

[TABLE]

we arrive at the conclusion that

[TABLE]

In relation to $\beta_{i}$ and $\lambda_{1}$ , we have:

[TABLE]

The expected value of the second order derivative with respect to $\beta_{i}$ and $\lambda_{2}$ is given by:

[TABLE]

Since $\mathbb{E}\left(\dfrac{\partial\ell_{t}(\mu_{t},\sigma_{t})}{\partial\sigma_{t}}\right)=0$ , we have

[TABLE]

where

[TABLE]

With respect to $\gamma_{j}$ and $\lambda_{1}$ , we have:

[TABLE]

For $\gamma_{j}$ and $\lambda_{2}$ , we have:

[TABLE]

Finally, we have:

[TABLE]

and

[TABLE]

In matrix form, we have:

[TABLE]

Bibliography


Adewale AJ, Xu X (2010) Robust designs for generalized linear models with possible overdispersion and misspecified link functions. Computational Statistics & Data Analysis 54(4):875–890
Akaike H (1974) A new look at the statistical model identification. IEEE Transactions on Automatic Control 19(6):716–726
Akaike H (1983) Information measures and model selection. Bulletin of the International Statistical Institute 50:277–290
Andrade ACG (2007) Efeitos da especificação incorreta da função de ligação no modelo de regressão beta. Master’s thesis, Universidade Federal de São Paulo
Aranda-Ordaz FJ (1981) On two families of transformations to additivity for binary response data. Biometrika 68(2):357–363
Atkinson A (1981) Two graphical displays for outlying and influential observations in regression. Biometrika 68(1):13–20
Atkinson AC (1985) Plots, Transformations and Regression: An Introduction to Graphical Methods of Diagnostic Regression Analysis. New York: Oxford University Press
Bayer FM, Cribari-Neto F (2017) Model selection criteria in beta regression with varying dispersion. Communications in Statistics - Simulation and Computation 46(1):729–746