Power Lindley distribution and software metrics
Mohammed Khalleefah, Sofiya Ostrovska, Mehmet Turan

TL;DR
This paper explores the properties of the power Lindley distribution, focusing on its moment-(in)determinacy, and demonstrates its application in modeling software metrics data.
Contribution
It provides new theoretical results on the power Lindley distribution's moment-(in)determinacy and applies it to real software metrics data.
Findings
Distribution's moment-(in)determinacy varies with parameters
Application to software metrics data demonstrates practical utility
New theoretical insights into the distribution's properties
Abstract
The Lindley distribution and its numerous generalizations are widely used in statistical and engineering practice. Recently, a power transformation of the Lindley distribution, called the power Lindley distribution, was introduced by M. E. Ghitany et al., who initiated the investigation of its properties and possible applications. In this article, as a continuation of the preceding research, new results on the power Lindley distribution are presented. The focus of this work is on the moment-(in)determinacy of the distribution for various values of the parameters. Afterwards, certain applications are provided to describe data sets of software metrics.

Table 1. DIT metric: values and their frequencies (in percent).

| Value | Frequency (%) |
|---|---|
| 0 | 35.45 |
| 1 | 54.27 |
| 2 | 7.94 |
| 3 | 1.50 |
| 4 | 0.77 |
| 5 | 0.07 |

Table 2. DIT metric: fitted distributions.

| Distribution | First parameter | Second parameter | Error | Mean | Median | |
|---|---|---|---|---|---|---|
| Power Lindley | 1.1913 | 1.6979 | 0.0065 | 0.7923 | 0.6475 | 0.0150 |
| Weibull | 1.3969 | 1.0044 | 0.0741 | 0.9158 | 0.7726 | 0.1385 |

Table 3. NOC metric: values and their frequencies (in percent).

| Value | Frequency (%) | Value | Frequency (%) | Value | Frequency (%) |
|---|---|---|---|---|---|
| 0 | 92.21 | 7 | 0.09 | 14 | 0.04 |
| 1 | 3.73 | 8 | 0.06 | 15 | 0.04 |
| 2 | 1.99 | 9 | 0.11 | 17 | 0.02 |
| 3 | 0.64 | 10 | 0.09 | 18 | 0.02 |
| 4 | 0.32 | 11 | 0.02 | 19 | 0.04 |
| 5 | 0.21 | 12 | 0.11 | 29 | 0.02 |
| 6 | 0.19 | 13 | 0.04 | | |

Table 4. NOC metric: fitted distributions.

| Distribution | First parameter | Second parameter | Error | Mean | Median | |
|---|---|---|---|---|---|---|
| Power Lindley | 0.2750 | 3.6502 | 0.0001 | 0.2265 | 0.0053 | 0.1174 |
| Weibull | 0.9499 | 1.0104 | 0.1136 | 1.0341 | 0.6869 | 0.9249 |
*Atilim University, Department of Mathematics, Incek 06836, Ankara, Turkey
e-mail: [email protected], [email protected], [email protected]
Tel: +90 312 586 8211, Fax: +90 312 586 8091*
Keywords: power Lindley distribution, moment problem, Stieltjes class, software metrics
2010 MSC: 62P30, 60E05
1 Introduction
Nowadays, new families of probability distributions are being proposed by a large number of authors with the aim of providing appropriate tools to study the tendencies in the behavior of data sets emerging in financial mathematics, medical research, computer science, engineering, and other disciplines. See, for example, [2, 12, 14]. Using a variety of criteria and approaches, researchers are seeking distributions that best match experimental data.
The Lindley distribution was introduced in 1958 by D. V. Lindley [17]. Yet it continues to attract attention in mathematics and its applications, giving rise to new extensions and modifications. See, for example, [5, 6, 9, 10, 11]. The Lindley distribution with parameter $\beta>0$ is defined by the probability density function (PDF) of the form:
$$f(x)=\frac{\beta^{2}}{\beta+1}\,(1+x)\,e^{-\beta x},\qquad x>0. \tag{1.1}$$
Formula (1.1) shows that the Lindley distribution is a two-component mixture of the exponential and two-stage Erlang distributions with the mixing proportion $\beta/(\beta+1)$. Distributions of this form arise in reliability theory, for example, in the study of imperfect fault coverage with the probability of the replacement failure. A comprehensive study of the Lindley distribution and its applications in the framework of reliability theory is performed in [9]. It can be observed that the Lindley distribution, like the gamma distribution, belongs to the family of Kummer distributions. The latter was first introduced in 1993 by Armero and Bayarri for conducting a statistical analysis of systems. See [3, 4]. The study of the Kummer distribution was followed up in [21] by K. W. Ng and S. Kotz, who obtained new results on the subject and expanded the assortment of the Kummer-type distributions. The current paper deals with the properties and applications of the power Lindley distribution, which represents the class of -Kummer distributions introduced in [23]. The power Lindley distribution was put forward in 2013 by Ghitany et al. as follows.
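Definition 1.1.
[12] The power Lindley distribution with parameters $\alpha>0$ and $\beta>0$ is defined by its PDF:
$$f(x)=\frac{\alpha\beta^{2}}{\beta+1}\,\bigl(1+x^{\alpha}\bigr)\,x^{\alpha-1}e^{-\beta x^{\alpha}},\qquad x>0. \tag{1.2}$$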
We write $X\sim PL(\alpha,\beta)$ to indicate that a random variable $X$ possesses a power Lindley distribution with parameters $\alpha$ and $\beta$. Evidently, when $\alpha=1$, one recovers a Lindley distribution with PDF (1.1). Observe that $Y$ has a Lindley distribution with parameter $\beta$ if and only if $Y^{1/\alpha}\sim PL(\alpha,\beta)$. That is, the power Lindley distribution occurs naturally as a power transformation of a random variable following the Lindley distribution. Along with that, the power Lindley distribution can also be viewed as a particular case of the -Kummer distribution, whose PDF is given by:
[TABLE]
See [23, Definition 2]. Here, is Euler’s gamma-function and is Kummer’s function of the second kind. For their definitions and properties, one may refer to [1, formulae 6.1.1 and 13.1.3]. Obviously, if and only if it has -Kummer distribution with and the parameters and
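The power transformation described above can be illustrated with a short Python sketch. It draws a Lindley variable from its exponential/Erlang mixture and raises it to the power $1/\alpha$. The sketch assumes the PDF of [12] in the form $f(x)=\frac{\alpha\beta^{2}}{\beta+1}(1+x^{\alpha})x^{\alpha-1}e^{-\beta x^{\alpha}}$, $x>0$. The symbols $\alpha$, $\beta$, the function names, and the parameter values are chosen here for illustration and are not taken from the paper.

```python
import math
import random


def power_lindley_pdf(x, alpha, beta):
    """Assumed PL(alpha, beta) density on x > 0 (form attributed to [12])."""
    if x <= 0:
        return 0.0
    xa = x ** alpha
    return alpha * beta**2 / (beta + 1) * (1 + xa) * x ** (alpha - 1) * math.exp(-beta * xa)


def power_lindley_sf(x, alpha, beta):
    """Survival function P(X > x) corresponding to the density above."""
    if x <= 0:
        return 1.0
    xa = x ** alpha
    return (1 + beta / (beta + 1) * xa) * math.exp(-beta * xa)


def sample_lindley(beta, rng):
    """Lindley(beta) as a mixture: Exp(beta) w.p. beta/(beta+1), else Erlang(2, beta)."""
    if rng.random() < beta / (beta + 1):
        return rng.expovariate(beta)
    return rng.gammavariate(2.0, 1.0 / beta)  # shape 2, scale 1/beta


def sample_power_lindley(alpha, beta, rng):
    """If Y is Lindley(beta), then Y**(1/alpha) is PL(alpha, beta)."""
    return sample_lindley(beta, rng) ** (1.0 / alpha)


if __name__ == "__main__":
    rng = random.Random(0)
    alpha, beta = 0.5, 1.0  # illustrative values only
    draws = [sample_power_lindley(alpha, beta, rng) for _ in range(50_000)]
    empirical = sum(d > 1.0 for d in draws) / len(draws)
    print("empirical P(X > 1):", empirical)
    print("closed-form S(1):  ", power_lindley_sf(1.0, alpha, beta))
```

Comparing the empirical tail frequency with the closed-form survival function gives a quick sanity check that the mixture-plus-power construction and the assumed density describe the same law.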
This paper aims to pursue the study of the power Lindley distribution initiated in [12]. Specifically, the moment-(in)determinacy for different values of the parameters will be determined. It should be noted that the moment-(in)determinacy of a probability distribution is an important factor not only in probability theory, but also in applied areas, see [19, 25, 29]. Moreover, the increasing role of heavy-tailed distributions in financial, engineering and computer science research ([8, 27, 29]) puts additional weight on this subject. In this connection, exemplary Stieltjes classes for power Lindley distributions will be provided in the event of moment-indeterminacy. Finally, some applications will be given to data sets of software metrics.
2 Main results
It is known ([9, p. 497]) that the characteristic function of the Lindley distribution is expressed by:
$$\varphi(t)=\frac{\beta^{2}(\beta+1-it)}{(\beta+1)(\beta-it)^{2}},$$
and hence it is analytic for $|t|<\beta$, implying that the Lindley distribution is moment-determinate. The situation with the power Lindley distribution is less straightforward, since, for $\alpha<1$, the characteristic function of the $PL(\alpha,\beta)$ distribution is not analytic at 0. Theorem 2.4 presents a necessary and sufficient condition for the moment-(in)determinacy of the power Lindley distribution.
To begin with, some analytical properties of the characteristic functions of the power Lindley distribution are stated in the next claim.
Theorem 2.1.
The characteristic function of a power Lindley distribution $PL(\alpha,\beta)$ is entire of order $\alpha/(\alpha-1)$ when $\alpha>1$, analytic on the interval $(-\beta,\beta)$ when $\alpha=1$, and is not analytic at 0 otherwise.
Proof.
The conditions for the analyticity of the characteristic function can be expressed in terms of the tail function, which for the power Lindley distribution coincides with its survival function $S(x)$. According to [12, formula (3)]:
$$S(x)=\Bigl(1+\frac{\beta}{\beta+1}\,x^{\alpha}\Bigr)e^{-\beta x^{\alpha}},\qquad x>0.$$
By [18, formula (2.2.3)], the characteristic function of the distribution is analytic in the disc $|t|<R$ if and only if its tail function $T$ satisfies
$$T(x)=O\bigl(e^{-rx}\bigr),\quad x\to+\infty,\quad\text{for all } r\in(0,R). \tag{2.1}$$
Clearly, for $\alpha>1$ condition (2.1) holds for all $R>0$, whence in this case the characteristic function is entire, while for $\alpha=1$ estimate (2.1) is true only when $R\leq\beta$. As for $\alpha<1$, condition (2.1) is violated whatever $R>0$ is and, therefore, the characteristic function is not analytic at 0. In the case of the entire characteristic function, its order and type can be calculated by Theorem 2.4.4 of [18], yielding $\alpha/(\alpha-1)$ and $(\alpha-1)/\bigl(\alpha(\alpha\beta)^{1/(\alpha-1)}\bigr)$, respectively. ∎
Corollary 2.2.
The outcomes of Theorem 2.1 can be restated in the following way. The moment generating function of the power Lindley distribution with parameters $\alpha$ and $\beta$:
- exists for all real numbers if $\alpha>1$;
- exists on the interval $(-\beta,\beta)$ if $\alpha=1$;
- does not exist if $\alpha<1$.
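The three regimes above can be seen numerically from the decay rate of the tail. The sketch below is a hedged illustration: it assumes the survival function of [12] in the form $S(x)=\bigl(1+\frac{\beta}{\beta+1}x^{\alpha}\bigr)e^{-\beta x^{\alpha}}$, and the names `log_sf` and `tail_rate` are introduced here only for this purpose. The quantity $-\ln S(x)/x$ grows without bound when $\alpha>1$, approaches $\beta$ when $\alpha=1$, and tends to 0 when $\alpha<1$. This matches a moment generating function that is finite for all $t$, finite only for $t<\beta$, or infinite for every $t>0$, respectively.

```python
import math


def log_sf(x, alpha, beta):
    """Logarithm of the assumed PL survival function, computed without underflow."""
    xa = x ** alpha
    return math.log1p(beta / (beta + 1) * xa) - beta * xa


def tail_rate(x, alpha, beta):
    """-log S(x) / x: the exponential decay rate of the tail observed at the point x."""
    return -log_sf(x, alpha, beta) / x


if __name__ == "__main__":
    beta = 2.0  # illustrative value only
    for alpha in (2.0, 1.0, 0.5):
        rates = [tail_rate(x, alpha, beta) for x in (1e1, 1e2, 1e3, 1e4)]
        print("alpha =", alpha, [round(r, 4) for r in rates])
```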
Corollary 2.3.
If $\alpha\geq 1$, then the distribution $PL(\alpha,\beta)$ is moment-determinate.
This follows immediately from Cramér’s condition for the moment-determinacy [16, Theorem 1]. The case $\alpha<1$ needs additional investigation. Notice that in this case the distribution becomes heavy-tailed. While each light-tailed distribution is uniquely determined by its moments, for heavy-tailed distributions the uniqueness may fail. Heavy-tailed distributions, many of which are not uniquely determined by their moments, are instrumental in stock market modeling and engineering [29]. For this reason, the non-uniqueness of distributions with respect to moments deserves careful investigation. The respective outcomes on the moment-(in)determinacy of the power Lindley distribution are summarized in the next assertion.
Theorem 2.4.
The power Lindley distribution $PL(\alpha,\beta)$ is moment-indeterminate if and only if $\alpha<1/2$.
Proof.
In essence, the proof is based on estimates for the rate of growth of the moments. The needed facts are presented in the review [16]. In the context of this proof, the letter $C$, with or without subscripts, denotes a positive constant whose exact value is immaterial.
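If $X\sim PL(\alpha,\beta)$, then the moments of $X$ have been calculated in [12] as follows:
$$m_{k}:=E\bigl[X^{k}\bigr]=\frac{k\,\Gamma(k/\alpha)\,[\alpha(\beta+1)+k]}{\alpha^{2}\beta^{k/\alpha}(\beta+1)},\qquad k=1,2,\ldots \tag{2.2}$$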
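Hence, if $\alpha\geq 1/2$, then the moments $m_{k}=E[X^{k}]$ satisfy
$$\frac{m_{k+1}}{m_{k}}=O\bigl((k+1)^{2}\bigr),\qquad k\to\infty,$$
and by the condition (s1) [16, Theorem 2], the distribution is moment-determinate.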
To examine the case $\alpha<1/2$, we write using (2.2):
[TABLE]
Applying Stirling’s formula, one has:
[TABLE]
Since writing one obtains:
[TABLE]
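To show that the distribution is moment-indeterminate, the estimate (2.3) has to be supplemented by checking whether the density (1.2) satisfies Lin’s condition, that is, to show that Lin’s function $L(x)=-xf'(x)/f(x)$ is monotone increasing for $x\geq x_{0}$ and that $\lim_{x\to\infty}L(x)=\infty$. Direct calculations yield:
$$L(x)=\alpha\beta x^{\alpha}-\frac{\alpha x^{\alpha}}{1+x^{\alpha}}+1-\alpha\;\to\;\infty\quad\text{as } x\to\infty.$$
In addition,
$$L'(x)=\alpha^{2}x^{\alpha-1}\Bigl(\beta-\frac{1}{(1+x^{\alpha})^{2}}\Bigr),$$
implying that $L'(x)>0$ for $x$ large enough. Thus, by [16, Theorem 7], when $\alpha<1/2$ the distribution $PL(\alpha,\beta)$ is moment-indeterminate. The proof is complete.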
∎
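The dichotomy at $\alpha=1/2$ can be observed numerically from the growth of the moments. The sketch below is an illustration under stated assumptions, not a reproduction of the proof. It takes the closed form $E[X^{k}]=k\,\Gamma(k/\alpha)[\alpha(\beta+1)+k]/(\alpha^{2}\beta^{k/\alpha}(\beta+1))$ as the moment formula of [12]. It also reads condition (s1) of [16] as the ratio bound $m_{k+1}/m_{k}=O((k+1)^{2})$. Both are our reading, and the function names are hypothetical.

```python
import math


def log_moment(k, alpha, beta):
    """log E[X^k] for PL(alpha, beta), k >= 1, using the closed form attributed to [12]."""
    return (math.log(k) + math.lgamma(k / alpha) + math.log(alpha * (beta + 1) + k)
            - 2 * math.log(alpha) - (k / alpha) * math.log(beta) - math.log(beta + 1))


def moment(k, alpha, beta):
    """E[X^k]; for large k prefer log_moment, since the value may overflow a float."""
    return math.exp(log_moment(k, alpha, beta))


def normalized_ratio(k, alpha, beta):
    """m_{k+1} / (m_k * (k+1)^2), computed in logs to avoid overflow."""
    log_ratio = log_moment(k + 1, alpha, beta) - log_moment(k, alpha, beta)
    return math.exp(log_ratio) / (k + 1) ** 2


if __name__ == "__main__":
    beta = 1.0  # illustrative value only
    for alpha in (1.0, 0.5, 0.4):
        values = [normalized_ratio(k, alpha, beta) for k in (10, 100, 1000)]
        print("alpha =", alpha, [round(v, 4) for v in values])
```

For $\alpha\geq 1/2$ the normalized ratio stays bounded, whereas for $\alpha<1/2$ it keeps growing with $k$. Fast moment growth alone does not prove indeterminacy, which is why the proof also checks Lin's condition.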
Remark 2.1.
Alternatively, the moment-(in)determinacy of a power Lindley distribution can be derived from [23, Theorem 7], where a more complicated approach was used.
When a probability distribution is moment-indeterminate, the problem arises of exhibiting different distributions with the same moments of all orders. In this paper, this will be done by presenting Stieltjes classes for the density (1.2), which are infinite families of PDFs having the same moments of all orders. Although the Stieltjes classes per se can be traced to the works of P. L. Chebyshev, T. Stieltjes, and C. Heyde [26, 28], the name itself is quite recent. To pay tribute to the contribution of Stieltjes to the moment problem, J. Stoyanov [28] in 2004 suggested the name ‘Stieltjes classes’, thus triggering their systematic study, which is still in progress. See, for example, [16, 22, 24, 25] and references therein.
For the convenience of readers, we supply the necessary definitions below.
Definition 2.1.
Let $f$ be a PDF of a random variable with finite moments of all orders, and let $h$ be an integrable function on $\mathbb{R}$ such that $|h(x)|\leq 1$. If, for all $k=0,1,2,\ldots$,
$$\int_{\mathbb{R}}x^{k}f(x)h(x)\,dx=0,$$
then $h$ is called a perturbation function of the density $f$.
Definition 2.2.
Let $f$ be a PDF and $h$ be a perturbation function of $f$. The set
$$S(f,h)=\bigl\{f_{\varepsilon}=f\,(1+\varepsilon h):\ \varepsilon\in[-1,1]\bigr\}$$
is said to be a Stieltjes class for $f$ based on $h$.
Obviously, $S(f,h)$ is an infinite family of densities all having the same sequence of moments as $f$. Observe that, for a density function $f$, there are different Stieltjes classes based on various perturbation functions $h$. The next statement provides exemplary perturbation functions for (1.2).
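To make the notion of a Stieltjes class concrete, the following Python sketch checks numerically that a whole family $f(1+\varepsilon h)$ shares its moments with $f$. Everything in it is an assumption made for illustration. The density is taken in the form $f(x)=\frac{\alpha\beta^{2}}{\beta+1}(1+x^{\alpha})x^{\alpha-1}e^{-\beta x^{\alpha}}$. The function $h(x)=\sin\bigl(\beta\tan(\pi\alpha)x^{\alpha}-\pi\alpha\bigr)/(1+x^{\alpha})$ is a candidate perturbation constructed here for $\alpha<1/2$ from the classical identity $\int_{0}^{\infty}u^{\mu-1}e^{-\beta u}\sin(\delta u+\theta)\,du=\Gamma(\mu)(\beta^{2}+\delta^{2})^{-\mu/2}\sin(\mu\arctan(\delta/\beta)+\theta)$. It is not claimed to coincide with any perturbation stated in the paper.

```python
import math


def simpson(g, a, b, n):
    """Composite Simpson rule with n (even) subintervals."""
    h = (b - a) / n
    total = g(a) + g(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * g(a + i * h)
    return total * h / 3


def perturbed_moment(k, alpha, beta, eps, upper=120.0, n=24_000):
    """Integral of x^k * f(x) * (1 + eps*h(x)) over x > 0, after substituting u = x**alpha.

    f is the assumed PL(alpha, beta) density; h is the candidate perturbation
    sin(beta*tan(pi*alpha)*u - pi*alpha) / (1 + u), valid only for alpha < 1/2.
    """
    delta = beta * math.tan(math.pi * alpha)
    c = beta**2 / (beta + 1)

    def integrand(u):
        h = math.sin(delta * u - math.pi * alpha) / (1 + u)
        return c * u ** (k / alpha) * (1 + u) * math.exp(-beta * u) * (1 + eps * h)

    return simpson(integrand, 0.0, upper, n)


if __name__ == "__main__":
    alpha, beta = 0.25, 1.0  # illustrative values with alpha < 1/2
    for k in range(4):
        row = [perturbed_moment(k, alpha, beta, eps) for eps in (-1.0, 0.0, 1.0)]
        print("k =", k, ["%.6e" % v for v in row])
```

Each printed row should show three numerically indistinguishable values: the $k$-th moment does not depend on $\varepsilon$, which is exactly the defining property of a Stieltjes class. The truncation point `upper` and the grid size `n` are tuned for the illustrative parameters and would need adjusting for other choices.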
Theorem 2.5.
The following functions are perturbations for PDF (1.2) in the case $\alpha<1/2$:
- (i)
- (ii) , where
- (iii)
where for and constants are chosen in such a way that
Proof.
Since all functions satisfy , what is left is to show that
[TABLE]
The expressions (i) - (iii) are derived with the help of [22, Example 3.2]. Here, we only have to check equalities (2.4). For this purpose, the identities below ([13, formulae 3.944, 9 and 10]) will be used:
[TABLE]
and
[TABLE]
Denote:
[TABLE]
Then, the substitution yields
[TABLE]
Setting and one derives from (2.5)
[TABLE]
Observe that (2.5) is applicable because and by the condition on
Likewise, to justify (ii), we write:
[TABLE]
This is an integral of the form (2.5), where and and hence as in the previous case.
Finally, in the case (iii), integral can be split as
[TABLE]
The same substitution leads to:
[TABLE]
Applying formulae (2.5) and (2.6) with and one derives that Similarly, with and we obtain that
∎
Corollary 2.6.
Let $f$ be a PDF for the $PL(\alpha,\beta)$ distribution with $\alpha<1/2$. Then, the following sets are Stieltjes classes for $f$:
[TABLE]
3 Application to software metrics
Software metrics are objective measurements of software products used to assess the quality of the products. A variety of software metrics have been proposed, related to different parameters such as size (of the software as a whole or of its classes and methods), complexity (of the software system, classes, and methods), and internal and external quality characteristics of a software system. See, for example, [15, 20]. Correspondingly, an ample amount of data on the values of software metrics has been collected and, as a result, statistical analysis of such data has come into demand within engineering studies. See, for example, [8, 20] and [27], where one can find an extensive list of references. In some problems related to software metrics, such as creating catalogues of threshold values, it is important to find probability distributions which best fit the empirical data. In the literature, the two-parameter Weibull distribution has been indicated as a useful instrument for this purpose, while new distributions are being offered by statisticians aiming to provide better tools for specific practical problems.
In this section, we apply the power Lindley distribution to data sets provided to the authors as a courtesy by M. Stojkovski [27], who collected the data related to 17 unique categories and, in each category, calculated the values of the following 5 metrics:
- CBO (Coupling Between Objects)
- DIT (Depth of Inheritance Tree)
- NOC (Number Of Children)
- NOM (Number Of Methods)
- RFC (Response For Class)
In this article, the data related to DIT and NOC metrics are used. These metrics were introduced and investigated by Chidamber and Kemerer [7] in order to measure complexity and coupling. The other data sets available in [27] can be analyzed likewise.
In the next two examples, the MATLAB software was used and the method of least squares was applied to fit the power Lindley density.
Example 3.1 (DIT system metric).
DIT represents the maximum length of the path, as a number of graph edges, from a node to the root of the inheritance tree. It is known that the greater the DIT value is, the higher the complexity of a design becomes. The data collected in [27] are summarized in Table 1.
Using the method of least squares, these data were approximated by the power Lindley density with $\alpha=1.1913$ and $\beta=1.6979$. For comparison, we used the fitted Weibull distribution found in [27] with the help of the EasyFit software. The error of approximation in each case was also obtained. Table 2 summarizes the results and Figure 1 shows the data along with the fitted curves.
Example 3.2 (NOC system metric).
NOC represents the number of immediate subclasses of a class in the hierarchy, measuring the number of subclasses inheriting the methods of the parent class. It is known that when NOC rises, so does re-use. The highlights of the data collected in [27] appear in Table 3.
It can be observed that the behavior of this data set is essentially different from that of DIT. The data set possesses a strongly right-skewed pattern, where the frequency of 0 dominates all of the other frequencies.
Like before, the method of least squares was applied and the outcomes along with the fitted Weibull distribution found in [27] by means of the EasyFit software are placed in Table 4 and Figure 2.
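The Mean and Median entries reported for the power Lindley fits can be cross-checked with a few lines of Python. The sketch below rests on assumptions: the two fitted parameters are read as $(\alpha,\beta)$ in the order in which they appear in the tables, and the mean is the $k=1$ case of the moment formula attributed to [12]. The median is obtained by solving $S(x)=1/2$ with the survival function $S(x)=\bigl(1+\frac{\beta}{\beta+1}x^{\alpha}\bigr)e^{-\beta x^{\alpha}}$. It does not reproduce the least-squares fit itself, whose exact setup is not described in enough detail here.

```python
import math


def pl_mean(alpha, beta):
    """Mean of PL(alpha, beta): the k = 1 case of the moment formula attributed to [12]."""
    return (math.gamma(1 / alpha) * (alpha * (beta + 1) + 1)
            / (alpha**2 * beta ** (1 / alpha) * (beta + 1)))


def pl_sf(x, alpha, beta):
    """Assumed survival function of PL(alpha, beta) for x >= 0."""
    xa = x ** alpha
    return (1 + beta / (beta + 1) * xa) * math.exp(-beta * xa)


def pl_median(alpha, beta):
    """Solve S(x) = 1/2 by bisection (S is continuous and decreasing on x > 0)."""
    lo, hi = 0.0, 1.0
    while pl_sf(hi, alpha, beta) > 0.5:  # expand until the root is bracketed
        hi *= 2.0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if pl_sf(mid, alpha, beta) > 0.5:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


if __name__ == "__main__":
    fits = (("DIT", 1.1913, 1.6979), ("NOC", 0.2750, 3.6502))
    for label, alpha, beta in fits:
        print(label, "mean:", pl_mean(alpha, beta), "median:", pl_median(alpha, beta))
```

Under these assumptions the computed values should agree with the tabulated Mean and Median columns up to rounding of the reported parameters, which also supports reading the parameter columns in the order $(\alpha,\beta)$.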
4 Conclusion
This work is a continuation of the study on the power Lindley distribution, initiated by M. E. Ghitany et al. in [12]. The goal of the current research is to obtain new results on the distribution and provide some novel applications. Since the power Lindley distribution becomes heavy-tailed when $\alpha<1$ (and, consequently, does not possess a moment-generating function), the examination of its moment-(in)determinacy in this case has to be carried out. This is precisely the main result of this paper, stating that the distribution $PL(\alpha,\beta)$ is moment-indeterminate if and only if $\alpha<1/2$. Several Stieltjes classes have been constructed for this case.
Furthermore, this paper has discussed certain applications dealing with real data sets pertinent to the values of software metrics. Software metrics are currently a hot topic in software engineering, as they address quality standards followed by software developers. The two-parameter Weibull distribution is commonly used to fit experimental data sets of software metrics. In this research, using the data collected in [27] for the DIT and NOC metrics, it is shown that, for certain data sets, the power Lindley distribution provides a better description of the data than the Weibull distribution, not only in the light-tailed but also in the heavy-tailed case. It should be pointed out that both distributions are two-parameter and, therefore, similar in terms of model complexity. As for future work, it is planned to perform a similar data analysis for other software metrics and to find new threshold values in collaboration with respective specialists.
Acknowledgements
The authors express their sincere gratitude to Dr. Deepti Mishra (NTNU) for consulting them on the software metrics and to Mr. Mile Stojkovski for providing the collected data sets along with relevant references. Also, our thanks go to Mr. P. Danesh from the Atilim University Academic Writing and Advisory Center for his help in the presentation of the manuscript.
- [1] M. Abramowitz and I. A. Stegun, Handbook of mathematical functions with formulas, graphs, and mathematical tables, Dover Publications, New York, 1972.
- [2] A. Al-Babtain, A. A. Fattah, A-H. N. Ahmed and F. Merovci, The Kumaraswamy-transmuted exponentiated modified Weibull distribution, Communications in Statistics - Simulation and Computation 46 (5) (2017), 3812–3832.
- [3] C. Armero and M. J. Bayarri, A Bayesian analysis of a queueing system with unlimited service, Technical Report # 93-50 (1993), Department of Statistics, Purdue University.
- [4] C. Armero and M. J. Bayarri, A Bayesian analysis of a queueing system with unlimited service, J. Stat. Plan. Inf. 58 (1997), 241–261.
- [5] T. Arslan, S. Acitas and B. Senoglu, Generalized Lindley and Power Lindley distributions for modeling the wind speed data, Energy Conversion and Management 152 (15) (2017), 300–311.
- [6] H. S. Bakouch, B. M. Al-Zahrani, A. A. Al-Shomrani, V. A. A. Marchi and F. Louzada, An extended Lindley distribution, J. Korean Stat. Soc. 41 (2012), 75–85.
- [7] S. R. Chidamber and C. F. Kemerer, A metrics suite for object oriented design, IEEE Trans. Software Eng. 20 (6) (1994), 476–493.
- [8] K. A. M. Ferreira, M. A. S. Bigonha, R. S. Bigonha, L. F. O. Mendes and H. C. Almeida, Identifying thresholds for object-oriented software metrics, J. Systems and Software 85 (2012), 244–257.
