Continuous chain-ladder with paid data
Stephan M. Bischofberger, Munir Hiabu, Alex Isakson

TL;DR
This paper develops a continuous-time, non-parametric framework for predicting outstanding liabilities using hazard functions and kernel smoothing, demonstrating consistency and improved estimation methods.
Contribution
It introduces a continuous-time chain-ladder model with histogram and kernel estimators, extending traditional methods with new smoothing techniques and theoretical consistency results.
Findings
The proposed methods are consistent under increasing claim data and decreasing aggregation levels.
Kernel-based estimators outperform traditional development factors in simulations.
Real-data application confirms the effectiveness of the new estimators.
Abstract
We introduce a continuous-time framework for the prediction of outstanding liabilities, in which chain-ladder development factors arise as a histogram estimator of a cost-weighted hazard function running in reversed development time. We use this formulation to show that under our assumptions on the individual data chain-ladder is consistent. Consistency is understood in the sense that both the number of observed claims grows to infinity and the level of aggregation tends to zero. We propose alternatives to chain-ladder development factors by replacing the histogram estimator with kernel smoothers and by estimating a cost-weighted density instead of a cost-weighted hazard. Finally, we provide a real-data example and a simulation study confirming the strengths of the proposed alternatives.
Table 1: Estimated outstanding claim payments per future year (top) and per accident year (bottom). CL: chain-ladder; LL: local linear; LC: local constant.

| Future year | 2014 | 2015 | 2016 | 2017 | 2018 | 2019 | 2020 | 2021 | 2022 | 2023 | Total |
|---|---|---|---|---|---|---|---|---|---|---|---|
| CL | 3972072 | 2997371 | 2241199 | 1613228 | 1157272 | 699522 | 401116 | 190240 | 49779 | 0 | 13321800 |
| LL | 4606662 | 2676251 | 1944796 | 1396172 | 940474 | 562684 | 296735 | 120538 | 19231 | 10 | 12563553 |
| LC | 4706341 | 2734479 | 2014402 | 1473544 | 1020410 | 639236 | 357675 | 172538 | 49401 | 3420 | 13171446 |

| Accident year | 2004 | 2005 | 2006 | 2007 | 2008 | 2009 | 2010 | 2011 | 2012 | 2013 | Total |
|---|---|---|---|---|---|---|---|---|---|---|---|
| CL | 0 | 31977 | 216901 | 496947 | 830549 | 1503785 | 1778076 | 2221387 | 2387308 | 3854869 | 13321800 |
| LL | 18 | 28133 | 179632 | 440042 | 823458 | 1323252 | 1734902 | 2055000 | 2364939 | 3614176 | 12563553 |
| LC | 5144 | 62617 | 248154 | 499977 | 906861 | 1403358 | 1792929 | 2100946 | 2372382 | 3779076 | 13171446 |
| Scenario | | | |
|---|---|---|---|
| 1 | decreasing beta | truncated mixed normal | moderately decreasing |
| 2 | decreasing beta | truncated mixed normal | heavily decreasing |
| 3 | decreasing beta | boundary challenge | moderately decreasing |
| 4 | decreasing beta | boundary challenge | heavily decreasing |
| 5 | mixture of betas | truncated mixed normal | moderately decreasing |
| 6 | mixture of betas | truncated mixed normal | heavily decreasing |
| 7 | mixture of betas | boundary challenge | moderately decreasing |
| 8 | mixture of betas | boundary challenge | heavily decreasing |
| Scenario | Sample size | LL median | LL mean (s.d.) | LC median | LC mean (s.d.) | CL median | CL mean (s.d.) |
|---|---|---|---|---|---|---|---|
| 1 | 100 | 0.2937 | 0.3897 (0.4169) | 0.2821 | 0.4008 (0.4046) | 0.3525 | 0.5465 (0.4169) |
| | 1000 | 0.0999 | 0.1198 (0.0911) | 0.0945 | 0.1170 (0.0952) | 0.1230 | 0.1459 (0.0911) |
| | 10000 | 0.0376 | 0.0439 (0.0326) | 0.0353 | 0.0411 (0.0303) | 0.0425 | 0.0501 (0.0326) |
| | 1e+05 | 0.0253 | 0.0259 (0.0155) | 0.0242 | 0.0251 (0.0146) | 0.0162 | 0.0186 (0.0155) |
| 2 | 100 | 0.3140 | 0.3892 (0.3650) | 0.2473 | 0.3743 (0.4307) | 0.3626 | 0.5370 (0.3650) |
| | 1000 | 0.1094 | 0.1317 (0.1032) | 0.1164 | 0.1328 (0.0956) | 0.1266 | 0.1514 (0.1032) |
| | 10000 | 0.0464 | 0.0541 (0.0380) | 0.0656 | 0.0694 (0.0434) | 0.0537 | 0.0583 (0.0380) |
| | 1e+05 | 0.0302 | 0.0308 (0.0168) | 0.0429 | 0.0431 (0.0171) | 0.0256 | 0.0267 (0.0168) |
| 3 | 100 | 0.3350 | 0.3871 (0.3078) | 0.2773 | 0.3209 (0.2608) | 0.4443 | 0.6169 (0.3078) |
| | 1000 | 0.1129 | 0.1309 (0.0979) | 0.1147 | 0.1317 (0.0943) | 0.1397 | 0.1581 (0.0979) |
| | 10000 | 0.0510 | 0.0589 (0.0407) | 0.1056 | 0.1055 (0.0510) | 0.0962 | 0.0969 (0.0407) |
| | 1e+05 | 0.0392 | 0.0393 (0.0192) | 0.1000 | 0.1001 (0.0189) | 0.0824 | 0.0821 (0.0192) |
| 4 | 100 | 0.3241 | 0.3552 (0.3022) | 0.4216 | 0.4199 (0.2195) | 0.4687 | 0.6987 (0.3022) |
| | 1000 | 0.1198 | 0.1393 (0.1025) | 0.2368 | 0.2424 (0.1193) | 0.1979 | 0.2078 (0.1025) |
| | 10000 | 0.0721 | 0.0776 (0.0497) | 0.1941 | 0.1939 (0.0537) | 0.1669 | 0.1664 (0.0497) |
| | 1e+05 | 0.0475 | 0.0478 (0.0203) | 0.1650 | 0.1644 (0.0196) | 0.1445 | 0.1443 (0.0203) |
| 5 | 100 | 0.2871 | 4.189e-01 (0.8021) | 0.1877 | 2.831e-01 (0.3990) | 0.3198 | 1.663e+11 (0.8021) |
| | 1000 | 0.1255 | 0.1399 (0.0952) | 0.1161 | 0.1286 (0.0869) | 0.1512 | 0.2437 (0.0952) |
| | 10000 | 0.0490 | 0.0575 (0.0437) | 0.0660 | 0.0724 (0.0465) | 0.0763 | 0.0841 (0.0437) |
| | 1e+05 | 0.0270 | 0.0293 (0.0195) | 0.0479 | 0.0479 (0.0210) | 0.0288 | 0.0319 (0.0195) |
| 6 | 100 | 0.3169 | 3.943e-01 (0.4260) | 0.2303 | 2.800e-01 (0.2365) | 0.3452 | 1.028e+10 (0.4260) |
| | 1000 | 0.1214 | 0.1385 (0.1017) | 0.1228 | 0.1384 (0.0949) | 0.1502 | 0.2113 (0.1017) |
| | 10000 | 0.0461 | 0.0539 (0.0408) | 0.0636 | 0.0694 (0.0454) | 0.0680 | 0.0758 (0.0408) |
| | 1e+05 | 0.0295 | 0.0320 (0.0194) | 0.0522 | 0.0519 (0.0199) | 0.0301 | 0.0334 (0.0194) |
| 7 | 100 | 0.5853 | 7.398e-01 (1.0259) | 0.5016 | 5.832e-01 (0.6756) | 0.6462 | 6.066e+11 (1.0259) |
| | 1000 | 0.2089 | 0.2954 (0.5423) | 0.2858 | 0.3258 (0.3315) | 0.3892 | 0.4983 (0.5423) |
| | 10000 | 0.0981 | 0.1134 (0.1183) | 0.2109 | 0.2157 (0.1070) | 0.1802 | 0.1846 (0.1183) |
| | 1e+05 | 0.0648 | 0.0675 (0.0414) | 0.1699 | 0.1702 (0.0568) | 0.1489 | 0.1478 (0.0414) |
| 8 | 100 | 0.4721 | 5.814e-01 (1.0504) | 0.4595 | 4.445e-01 (0.4085) | 0.6020 | 1.307e+11 (1.0504) |
| | 1000 | 0.1661 | 0.2012 (0.1496) | 0.2717 | 0.2816 (0.1633) | 0.3565 | 0.4672 (0.1496) |
| | 10000 | 0.1119 | 0.1217 (0.0822) | 0.2317 | 0.2303 (0.0908) | 0.2104 | 0.2059 (0.0822) |
| | 1e+05 | 0.0934 | 0.0923 (0.0423) | 0.1962 | 0.1948 (0.0487) | 0.1819 | 0.1796 (0.0423) |
| Brand | Probability | Basis price | Model | Probability | Price factor |
|---|---|---|---|---|---|
| Brand 1 | 0.45 | $600 | 0 | 0.05 | 1 |
| Brand 2 | 0.30 | $550 | 1 | 0.10 | 1.15 |
| Brand 3 | 0.15 | $300 | 2 | 0.35 | |
| Brand 4 | 0.10 | $150 | 3 | 0.50 |
| Incident | Yearly hazard rate | | |
|---|---|---|---|
| Breakage | 0.15 | 2 | 5 |
| Oxidation | 0.05 | 5 | 3 |
| Theft | 5 | 0.5 |
Continuous chain-ladder with paid data
Stephan M. Bischofberger (corresponding author; e-mail: [email protected]; address: Cass Business School, 106 Bunhill Row, London, EC1Y 8TZ, United Kingdom)
Cass Business School, City, University of London, United Kingdom
Munir Hiabu
School of Mathematics and Statistics, University of Sydney, Australia
Alex Isakson
Cass Business School, City, University of London, United Kingdom
Keywords: chain-ladder method; general insurance; granular reserving; nonparametric estimation; survival analysis.
1 Introduction
The classical run-off triangle used for the prediction of outstanding liabilities can be explained as a two-way ANOVA arrangement, where data are organized on a two-dimensional plane of (cohort, age), with cohort being the accident date or underwriting date of a claim and age the time from that date to a payment. Dating back at least to the beginning of the last century, the chain-ladder method is still the industry standard for estimating the future cost of outstanding liabilities from these run-off triangles. However, as a deterministic algorithm, chain-ladder specifies neither the assumptions it is based on nor the uncertainty of the estimation.
Stochastic models around the chain-ladder method are the Mack Model (Mack, 1993) and multiplicative models in Kremer (1982); Verrall (1991); Renshaw & Verrall (1998) and Kuang et al. (2009) among many others. A comprehensive review is given in England & Verrall (2002). The drawback of these papers is that they do not discuss how the data arises as aggregation from individual data. This is needed when one wants to understand the underlying assumptions of the model. Taylor (1986) coined the term macro-models to describe these previous models and defined models that begin on an individual level as micro-models. Macro models have assumptions which are hard to justify once a data generating process with individual payments is considered. The assumptions of the most widely used Mack model can hardly be justified if one considers that the cells in the classical run-off triangle are aggregations of individual payments. Under Mack’s assumptions, not a single future payment can be independent of the past. This is because the conditional expectation of the next cell within a row of the run-off triangle is a multiple of all previous observations in the same row. The other big class of models is those of Kremer (1982); Verrall (1991); Renshaw & Verrall (1998) and Kuang et al. (2009). They assume that the expected claim amount in one cell is the product of a row factor and a column factor — representing underwriting/accident date and payment delay, respectively. This multiplicative structure implies that there is no interaction effect between rows and columns working on the expected claim amount. In Hiabu (2017) it has been shown that this non-interaction assumption generally does not hold because the cells in the run-off triangle are aggregated as parallelograms as illustrated in Figure 1. These parallelograms will generally introduce interdependencies, which violate the multiplicative structure assumption leading to an interaction effect. 
Hence, assuming a multiplicative structure produces a bias that grows with the level of aggregation. Therefore, consistency of payment predictions can only hold in a continuous framework, as adopted in this paper, where the level of aggregation is understood to converge to zero as the number of observations increases.
Recent literature connects the chain-ladder method and its data to counting process theory in survival analysis. Hiabu et al. (2016) introduced a statistical model including the data generating process which is built on the continuous model of Martínez-Miranda et al. (2013). The sampling technique of the chain-ladder method is different from other sampling techniques used in classical (bio-)statistical literature. Individuals or policies are only followed if a failure, i.e., a claim, occurs. This has the advantage that less data is required than in classical survival data. Truncation occurs when cohort plus age is greater than the date of data collection. However, Martínez-Miranda et al. (2013) and Hiabu et al. (2016) only considered claim counts and ignored the associated payments.
In this paper, we introduce a micro-model in continuous time in which chain-ladder development factors, applied on a paid triangle, are a histogram estimator of a cost-weighted hazard function running in reversed development time. We establish new assumptions under which consistency of the development factors is achieved. Consistency is understood in the sense that both the number of observed claims grows to infinity and the level of aggregation tends to zero. Finally, we improve on chain-ladder estimation by replacing the histogram estimator with kernel smoothers and by estimating a cost-weighted density instead of a cost-weighted hazard.
There is also a growing literature on micro-models for estimating outstanding liabilities in non-life insurance that is not based on the chain-ladder idea. Arjas (1989) and Norberg (1993) formulated models in a classical bio-statistical setup with a non-homogeneous marked Poisson process. A strong case study in this setting has been developed in Antonio & Plat (2014) and the models have been further studied in Huang et al. (2015) and Huang et al. (2016). These models are more complex than chain-ladder models. They model each delay component in the claims process separately and require full inference on the marked point process, for instance the distribution of the mark/cost. They also require additional information about the exposure, i.e., information about the number of policies underwritten. The assumptions of this paper can be used to decide whether this additional complexity is beneficial noting that additional complexity introduces bias and is only advisable if a significantly better fit can be obtained. Additional complexity might also be necessary if claims with different accident dates, e.g., due to calendar time effects, are not independent, as investigated in Shi et al. (2012); Merz et al. (2013); Lee et al. (2015); Badescu et al. (2016); Avanzi et al. (2016); Lee et al. (2017) and Crevecoeur et al. (2019). If more complexity is justified, the estimator presented in this paper can be used as a building block in those and other more complex models. However, this is beyond the scope of this present paper.
This paper is structured as follows. Section 2 describes the mathematical model and Section 3 links chain-ladder development factors to that framework by identifying them as a histogram estimator of a hazard function. Section 4 proposes improvements on chain-ladder development factors by replacing the histogram with kernel smoothers and by estimating a density function instead of a hazard function. We provide a data application and simulation studies in Sections 5 and 6. All proofs can be found in the appendix.
2 Mathematical framework
We start by putting the unique sampling scheme of chain-ladder into a micro-structure framework. We observe counting processes , , for claims and call development time. Each counting process starts with value zero at the underwriting date underlying its claim. It jumps, with jump-size one, whenever a payment is made. Additionally to every jump, we observe a mark indicating the size of the payment made. The number of counting processes, , varies over calendar-time: We follow retrospectively only those claims for which at least one payment has been observed, i.e., we do not follow every claim in the policy book. In this paper, we make the following assumptions.
- [M1] All claims are independent.
- [M2] Every claim consists of only one payment.
Assumptions [M1] and [M2] are rather strong but are made to simplify the mathematical derivations yielding a first and clean step towards a better understanding of chain-ladder on a micro-structure level. Possible ways to relax these assumptions are weak dependency (instead of [M1]) and a Markov process structure where every jump triggers a new state (instead of [M2]). This, however, is beyond the scope of the present paper.
The jump-time in development direction corresponding to the payment for claim is denoted by . Thus, we get
[TABLE]
where denotes the indicator function. As pointed out in Hiabu et al. (2016), statistical inference on the counting process is not directly feasible. We only follow a claim once we have observed at least one payment. Therefore, by design it holds
[TABLE]
where is the underwriting date or accident date of claim . Hence, by not following every policy, we are exposed to a right-truncation problem instead of a right-censoring problem. In the sequel, for notational convenience, we parameterize the dates such that which yields .
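The sampling scheme described above can be illustrated with a small simulation. This is a minimal sketch, not the authors' code: the function name `simulate_truncated_claims`, the uniform accident dates, and the exponential payment delays are hypothetical choices made only for illustration. It shows that claims whose payment falls after the data collection date never enter the sample, so the observed delays are systematically shorter than the delays of all claims.

```python
import random


def simulate_truncated_claims(n_claims, horizon, seed=0):
    """Return (all_claims, observed) as lists of (delay, accident_date) pairs.

    ASSUMPTION (illustration only): accident dates are uniform on
    [0, horizon] and payment delays are exponential with mean horizon / 2.
    """
    rng = random.Random(seed)
    all_claims = []
    for _ in range(n_claims):
        u = rng.uniform(0.0, horizon)        # accident / underwriting date
        t = rng.expovariate(2.0 / horizon)   # payment delay (development time)
        all_claims.append((t, u))
    # A claim is followed only if its payment happens before data collection,
    # i.e. delay + accident date <= horizon (right truncation).
    observed = [(t, u) for (t, u) in all_claims if t + u <= horizon]
    return all_claims, observed


all_claims, observed = simulate_truncated_claims(20000, horizon=10.0)
mean_all = sum(t for t, _ in all_claims) / len(all_claims)
mean_obs = sum(t for t, _ in observed) / len(observed)
print(f"observed {len(observed)} of {len(all_claims)} claims")
print(f"mean delay: all claims {mean_all:.2f}, observed claims {mean_obs:.2f}")
```

A naive average over the observed delays therefore understates the typical delay, which is why the truncation has to be handled explicitly.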
A solution to the right-truncation problem is to reverse the time of the counting process leading to a tractable left-truncation problem (Ware & DeMets, 1976). To this end we consider the counting processes
[TABLE]
each with respect to the filtration
[TABLE]
satisfying the usual conditions (Andersen et al., 1993, p. 60), and where is the set of all zero probability events. It is well known (Andersen et al., 1993, Theorem II.4.2), that the intensity process of is
[TABLE]
where
[TABLE]
which is a product of a deterministic function and a predictable function. This structure is called Aalen’s multiplicative intensity model (Aalen, 1978), and enables nonparametric estimation and inference on the deterministic factor , which is done in Hiabu (2017).
Let denote the payment size of claim and consider the process . Ignoring for now the necessary regularity conditions, it is straightforward to see that
[TABLE]
which asymptotically satisfies Aalen’s multiplicative intensity structure, if converges to 1. This convergence will be verified below and it is sufficient to apply the well developed techniques for counting processes, which we do in this paper. In the next section we will show that chain-ladder development factors (which are defined for instance in Taylor (1986)) are a nonparametric histogram estimator of .
When the goal is to predict outstanding liabilities, one is interested in the untruncated versions of the truncated observations. We will indicate these variables by suppressing the subscript, i.e., the three-dimensional random variable has the same distribution as , for every , if conditioned on the event . We make the following assumptions on the untruncated objects.
- [M3] The random variables and have, respectively, strictly positive continuous density functions with support and with support , , each with respect to the Lebesgue measure. Moreover, the continuous joint density of with respect to the Lebesgue measure exists and
- [CLM1] The random variables and are independent.
- [CLM2] There exist functions such that .
Assumption [M3] ensures that the intensity is well defined. Note that [CLM1] is a statement about the untruncated objects and does not imply that and are independent, noting that . The second part of Assumption [CLM2] means that there is no interaction effect between development time and underwriting date on the claim amount.
To align the density of with our setting, we define the cost-weighted density of as
[TABLE]
Conditional expectations to point events with probability zero here and below are well defined through the continuous density . Analogously, we define the cost-weighted density in reversed time as and . Moreover, the underlying hazard rate in reversed time can be derived from the above definition as
[TABLE]
Proposition 2.1
Given [M1]–[M3], for , it holds
[TABLE]
If additionally Assumptions [CLM1] and [CLM2] hold, then
[TABLE]
Proof 2.2.
See Appendix A.1.
Proposition 2.1 explains why we gave the last two assumptions a ‘CLM’-prefix. The convergence in (1) ensures that Aalen’s multiplicative intensity model is approximately satisfied and chain-ladder development factors are histogram estimates of . But it is only via equation (2) that chain-ladder development factors also approximate . For the latter to be true, we assume [CLM1] and [CLM2]. Under these two additional conditions, the chain-ladder algorithm predicts the right object and leads to a sensible quantification of the outstanding liabilities.
We close this section with some further remarks.
Remark 2.3 (Remark 1).
(Exposure). As in traditional chain-ladder and in contrast to classical survival data, because all failures (claims) are observed, no additional information is needed about the number of individuals under risk (i.e., exposure in the form of the number of underwritten policies) in order to estimate future claim amounts. The unique sampling leads to a right-truncation which is solved by reversing the time. This is different from the approaches described in Arjas (1989) and Norberg (1993).
Remark 2.4 (Remark 2).
(Cost-weighted density). The density is the continuous analogue to the column parameter of chain-ladder, which is often called and which is considered in Kremer (1982); Verrall (1991); Renshaw & Verrall (1998) and Kuang et al. (2009). Moreover, integrating to one, is indeed a density function.
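The integrate-to-one property follows from the tower property of conditional expectation. The short derivation below is a sketch under one assumption: that the marginal cost-weighted density has the form $\widetilde{f}_T(t)=E[Z]^{-1}E[Z \mid T=t]\,f_T(t)$, i.e., the one-dimensional analogue of the joint cost-weighted density $\widetilde{f}_{T,U}$ stated in Remark 3.

```latex
\int \widetilde{f}_T(t)\,\mathrm{d}t
  = \frac{1}{E[Z]} \int E[Z \mid T=t]\, f_T(t)\,\mathrm{d}t
  = \frac{E\bigl[E[Z \mid T]\bigr]}{E[Z]}
  = \frac{E[Z]}{E[Z]}
  = 1 .
```

Non-negativity holds as long as payments are non-negative, so under that assumption $\widetilde{f}_T$ is a proper density.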
Remark 2.5 (Remark 3).
(Predicting outstanding liabilities). An estimator of the cost-weighted hazard, , in conjunction with a chain-ladder algorithm can be used to predict outstanding liabilities. Alternatively, as proposed in this paper, one can employ estimators of the cost-weighted densities . If the maximum development time of a claim is , then the expected outstanding liabilities for claims underwritten in , , is given as
[TABLE]
where $\widetilde{f}_{T,U}(t,u)=E[Z]^{-1}E[Z \mid T=t,U=u]\,f_{T,U}(t,u)$ is the cost-weighted density of $(T,U)$. The total amount of payments until today is given by and the fraction in gives the expected ratio between outstanding payments and past payments. Note that under Assumptions [CLM1] and [CLM2], the cost-weighted joint density factorizes into . In Section 4 we propose estimators for . Due to symmetry, the component can be estimated by swapping the roles of and . Outstanding liabilities are estimated by replacing and in with their estimates. Developing estimation theory for is rather involved because of the non-trivial integrals and the ratio-structure in (3). We only consider a simulation study for the prediction performance of in Section 6.
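The ratio structure described in this remark can be sketched numerically. The code below is an illustration under stated assumptions, not the paper's estimator: it assumes the factorized cost-weighted density (product of a delay component and an accident-date component), takes both supports to be `[0, horizon]`, and treats the region where delay plus accident date exceeds `horizon` as the future. The function name `outstanding_liabilities` and the midpoint-rule integration are hypothetical choices.

```python
def outstanding_liabilities(f_t, f_u, past_payments, horizon, m=200):
    """Past payments times (future mass / past mass) of a product density.

    f_t, f_u: cost-weighted densities of delay and accident date
              (ASSUMPTION: both supported on [0, horizon]).
    The double integral is approximated by a midpoint rule on an m x m grid;
    the common cell area cancels in the ratio.
    """
    h = horizon / m
    grid = [(k + 0.5) * h for k in range(m)]
    ft = [f_t(t) for t in grid]
    fu = [f_u(u) for u in grid]
    past = 0.0
    future = 0.0
    for i in range(m):
        for j in range(m):
            mass = ft[i] * fu[j]
            # Midpoint (i + 0.5) h + (j + 0.5) h <= horizon  <=>  i + j + 1 <= m.
            if i + j + 1 <= m:
                past += mass
            else:
                future += mass
    return past_payments * future / past


# Uniform delay and accident date: past and future regions carry equal mass.
reserve_uniform = outstanding_liabilities(
    lambda t: 1.0, lambda u: 1.0, past_payments=1000.0, horizon=1.0)
# Decreasing delay density 2(1 - t): most payments happen early.
reserve_decreasing = outstanding_liabilities(
    lambda t: 2.0 * (1.0 - t), lambda u: 1.0, past_payments=1000.0, horizon=1.0)
print(round(reserve_uniform, 1), round(reserve_decreasing, 1))
```

In the second example the exact ratio of future to past mass is one half, so the reserve is roughly half of the past payments; the grid approximation carries a small discretization error along the diagonal.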
Remark 2.6 (Remark 4).
(Assumptions). While the model is built around the observation of independent claims with one single payment each (Assumptions [M1] and [M2]), allowing for claim clusters and multiple payments per claim is feasible and would only require some further assumptions.
Assumption [CLM1] is analogous to the usual multiplicativity assumption found for example in Kremer (1982); Verrall (1991); Renshaw & Verrall (1998) and Kuang et al. (2009). The difference is that [CLM1] refers to claim counts and not to claim amounts. However, [CLM1] and [CLM2] together imply the multiplicativity of aggregated expected claim amounts as assumed in the literature, if one ignores the potential bias arising from aggregation. Chain-ladder development factors can be biased if the cost-weighted development delay with density is neither exponentially distributed nor uniformly distributed within each development period (Hiabu, 2017).
3 Chain-ladder development factors
We now discuss how hazard rates can be estimated in the framework of Section 2. In the setting of Proposition 2.1, the intensity of at is asymptotically equal to . We use this fact to construct a least squares criterion to estimate . Given a smoothing parameter, , and a weight function , we look for estimators that minimize
[TABLE]
where the expression is understood as being zero whenever is zero. The term is a vertical shift subtracted to make the integral well-defined. Since does not depend on , the estimator is defined by a local weighted least squares criterion. We understand the integral with respect to as a Stieltjes integral.
Let be an equidistant partition of the interval with bin-width and some integer . For , we set
[TABLE]
The first order condition minimizing (4) under the weighting leads to the histogram estimator
[TABLE]
Analogously to Hiabu (2017), it can be shown that, up to lower order terms, chain-ladder development factors equal . Therefore, the following proposition can also be interpreted as a central limit theorem for development factors. We make the following assumptions.
- [S1]
The bandwidth satisfies and for .
- [S2]
The density is two times continuously differentiable.
- [S3]
The function is continuously differentiable.
We define the following quantity.
[TABLE]
Proposition 3.1.
Under Assumptions [M1]–[M3], [CLM1], [CLM2], and [S1]–[S3], for , , it holds
[TABLE]
in distribution, where
[TABLE]
Proof 3.2.
See Appendix A.5.
Hence, apart from the usual regularity conditions, under [CLM1] and [CLM2], consistency of the development factors is achieved if both the number of observations, , goes to infinity and the level of aggregation, , tends to zero.
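For reference, the classical development factors on an aggregated paid triangle can be computed as in the following sketch. It implements the textbook chain-ladder algorithm on cumulative data, not the continuous estimator of this paper; the function name `chain_ladder` and the toy triangle are hypothetical.

```python
def chain_ladder(triangle):
    """Classical chain-ladder on a cumulative run-off triangle.

    triangle[i][j] is the cumulative paid amount of accident period i after
    development period j; row i contains len(triangle) - i entries.
    Returns (development factors, reserve per accident period).
    """
    n = len(triangle)
    factors = []
    for j in range(n - 1):
        # Use only the rows in which both column j and column j + 1 are observed.
        rows = [row for row in triangle if len(row) > j + 1]
        factors.append(sum(r[j + 1] for r in rows) / sum(r[j] for r in rows))
    reserves = []
    for row in triangle:
        ultimate = row[-1]
        # Project the latest observed value through the remaining periods.
        for j in range(len(row) - 1, n - 1):
            ultimate *= factors[j]
        reserves.append(ultimate - row[-1])
    return factors, reserves


toy_triangle = [
    [100.0, 150.0, 165.0],
    [200.0, 300.0],
    [300.0],
]
factors, reserves = chain_ladder(toy_triangle)
print(factors, reserves, sum(reserves))
```

Each factor is a ratio of column sums over a fixed bin width, which is what makes the histogram interpretation natural: shrinking the bins corresponds to a finer triangle.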
In the next section, we propose two improvements to chain-ladder development factors. Firstly, we replace the histogram weighting, , with local polynomial kernel smoothers leading to a reduced bias. Secondly, we will work with the density function instead of the hazard function because we expect estimation of the density to be more robust. This is because the hazard function, due to the bounded support, usually increases heavily at the right boundary whereas the shape of the density is less explosive. A simulation study in Bischofberger et al. (2019) confirms this heuristic.
4 Local polynomial density estimation
In this section we introduce two nonparametric estimators of the one-dimensional cost-weighted density : the local constant estimator and the local linear estimator. The idea of local polynomial fitting is quite old and might originate from early time series analysis (Macaulay, 1931). It has been adapted to the regression case in Stone (1977) and Cleveland (1979). A general overview of local polynomial fitting can be found in Fan & Gijbels (1996).
Note that can be estimated analogously by inverting the roles of and and adapting the definitions of , etc. The joint cost-weighted density is then estimated by in line with Remark 3.
We first define the cost-weighted Kaplan-Meier product-limit estimator of the survival function $\widetilde{S}_{T}^{R}(t)=\int_{t}^{\infty}\widetilde{f}^{R}_{T}(s)\,\mathrm{d}s=\{E[Z \mid T^{R}\geq t]/E[Z]\}\int_{t}^{\infty}f^{R}_{T}(s)\,\mathrm{d}s$ as
[TABLE]
where $\widehat{\widetilde{A}}{}^{R}(t)=\sum_{i=1}^{n}\int_{0}^{t}Z_{i}\bigl\{\sum_{j=1}^{n}Z_{j}Y_{j}^{R}(s)\bigr\}^{-1}\,\mathrm{d}N^{R}_{i}(s)$ is motivated by the Aalen estimator, estimating $\widetilde{A}^{R}(t)=\int_{0}^{t}E[Z \mid T^{R}=s]\{E[Z \mid T^{R}\geq s]\}^{-1}\alpha^{R}(s)\,\mathrm{d}s$. Here the product can be understood as a simple finite product because of the finite number of jump points of , as explained in Andersen et al. (1993, p. 89). Let denote a polynomial of degree . For , we define the local polynomial estimator of degree , of as the minimizer in the equation
[TABLE]
For a kernel and bandwidth , we set as usual. The expression is needed to make (6) well defined. Since does not depend on , is defined by a local weighted least squares criterion.
In the sequel we will only consider the cases , i.e., the local constant and local linear case. While a higher degree in conjunction with higher order kernels improves the asymptotic properties, finite sample studies show that improvements are only visible with unrealistically big sample sizes. In the local constant case of (6) we derive the first order condition
[TABLE]
and conclude the local constant estimator
[TABLE]
The final estimator in non-reversed time is then simply defined as
[TABLE]
We add the following assumption.
- [S4]
The kernel is symmetric, has bounded support and has finite second moment, and it holds .
Other kernels can also be used but they will require a more complex estimator. Moreover, we introduce the following notation. For every kernel and , let
[TABLE]
Proposition 4.1.
Under Assumptions [M1]–[M3], [CLM1], [CLM2], and [S1]–[S4], for , , it holds
[TABLE]
in distribution, where
[TABLE]
Proof 4.2.
See Appendix A.3.
For the local linear case, we introduce the following quantities. For , set
[TABLE]
The first order condition for then reads
[TABLE]
Hence, the solution is given by
[TABLE]
where
[TABLE]
If is a second-order kernel, then
[TABLE]
so that can be interpreted as a second-order kernel with respect to the measure , which is defined via .
The local linear estimator in non-reversed time is defined as
[TABLE]
Proposition 4.3.
Under Assumptions [M1]–[M3], [CLM1], [CLM2], and [S1]–[S4], for , , it holds
[TABLE]
in distribution, where
[TABLE]
for .
Proof 4.4.
See Appendix A.4.
One alternative to estimate the cost-weighted density is to use a semiparametric asymmetric kernel density estimator which better accounts for the tail (Gustafsson et al., 2009). We chose not to do so in this paper, since a nonparametric estimation technique is more in the spirit of the chain-ladder technique as explained in the previous section.
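The building blocks of this section can be sketched in a few lines. The code below is a simplified illustration, not the estimator defined above: it computes the jumps of the cost-weighted Aalen estimator and the associated product-limit estimator in reversed time, and then smooths "survival times hazard increment" with an Epanechnikov kernel. Boundary corrections and the local linear weights are omitted. The at-risk indicator `u[j] <= s <= t_rev[j]`, the use of the left limit of the survival estimate, the absence of ties, and all function names are assumptions made for this sketch.

```python
def epanechnikov(x):
    """Epanechnikov kernel with support [-1, 1]."""
    return 0.75 * (1.0 - x * x) if abs(x) <= 1.0 else 0.0


def cost_weighted_increments(t_rev, u, z):
    """Return a list of (s_i, dA_i, S(s_i-)) sorted by reversed time s_i.

    dA_i = z_i / sum_j z_j Y_j(s_i) is the jump of the cost-weighted Aalen
    estimator at s_i, and S is the product-limit estimator built from these
    jumps. ASSUMPTION: claim j is at risk at s if u[j] <= s <= t_rev[j]
    (left truncation at u[j]), and there are no tied jump times.
    """
    order = sorted(range(len(t_rev)), key=lambda i: t_rev[i])
    out = []
    surv = 1.0
    for i in order:
        s = t_rev[i]
        at_risk = sum(z[j] for j in range(len(z)) if u[j] <= s <= t_rev[j])
        d_a = z[i] / at_risk
        out.append((s, d_a, surv))   # surv is the value just before s
        surv *= 1.0 - d_a
    return out


def local_constant_density(t, t_rev, u, z, b):
    """Kernel-smoothed cost-weighted density in reversed time at point t."""
    return sum(epanechnikov((t - s) / b) / b * surv * d_a
               for s, d_a, surv in cost_weighted_increments(t_rev, u, z))


t_rev = [1.0, 2.0, 3.0, 4.0]   # reversed payment delays
u = [0.0, 0.0, 0.0, 0.0]       # no effective truncation in this toy example
z = [1.0, 1.0, 1.0, 5.0]       # claim costs
masses = [surv * d_a for _, d_a, surv in cost_weighted_increments(t_rev, u, z)]
density_at_3_5 = local_constant_density(3.5, t_rev, u, z, b=1.0)
print(masses, density_at_3_5)
```

Without truncation, the mass attached to each claim reduces to its share of the total cost, and with equal costs the sketch collapses to an ordinary kernel density estimator. This is a useful sanity check on the weighting.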
5 Data Application: Estimating outstanding liabilities
We apply our estimator to a motor insurance data set from Cyprus which was collected between 2004 and 2013. The data contain 51,216 closed claims , consisting of their payment delay until the final payment , their accident dates , and the total claim amount . First, we estimate the marginal cost-weighted densities and of and , respectively, and in particular we forecast the outstanding claim amount consisting of all claims for accidents that have already occurred but have not been paid yet (see Remark 3 in Section 2). Afterwards, we illustrate our model assumptions [CLM1] and [CLM2] on the data set.
5.1 Estimation and forecasting
For the estimation of outstanding liabilities, we calculate the components and using the Epanechnikov kernel $K(x)=\tfrac{3}{4}(1-x^{2})$ for $|x|\leq 1$ (and zero otherwise). For data-driven bandwidth selection, we use cross-validation (Rudemo, 1982; Hall, 1983; Bowman, 1984). The score function is motivated by the minimization problem which led to the local polynomial estimators introduced in Section 4. For the estimation of we want to minimize in for . Since is unknown, we select the bandwidth as the minimizer of
[TABLE]
in instead (Nielsen et al., 2009). The “leave-one-out” terms are given as
[TABLE]
While this bandwidth selection works well for the estimators of the weighted density of the accident date , we get unrealistic estimates for . We decided to adjust the bandwidth manually and calculated for a small bandwidth days for delays shorter than 1.5 years (= 548 days) and we used a large bandwidth days to estimate for days. The optimal bandwidths for by cross-validation are and days, respectively. We remark that a full investigation of local bandwidth selection is beyond the scope of this paper.
The results are given in Figure 2. Since most claims were paid off after 1.5 years, our density estimators for are almost zero for years. Big outliers in that area are oversmoothed, which reflects the possibility of large payments with high delays better than a small number of sharp local maxima of the density at the positions of the outliers and a density of 0 elsewhere.
For cost-weighted density estimators and , we estimate the reserve by
[TABLE]
The reserve estimate is motivated by the representation of the reserve in equation (3) in Remark 3. Estimates for outstanding claim payments per future year, per accident year, and in total are given in Table 1. We compare the estimators with local bandwidth correction with the results obtained through the classical chain-ladder method with quarterly aggregated data. Whereas all three total reserve forecasts are very similar, one can see differences for very short and very long delays. Furthermore, it is striking that both smoothed estimators forecast a non-zero claim amount for 2023 whereas chain-ladder estimates it to be 0. The difference between chain-ladder and smoothed density estimators for very short and mainly for long delays has already been observed for non-cost-weighted estimators in Hiabu et al. (2016). Moreover, as explained in Hiabu (2017), chain-ladder tends to overestimate the total reserve whereas the estimate from the local linear estimator is asymptotically unbiased. The local constant estimator is known to suffer from bias at boundaries, i.e., it performs worse than the local linear one for very short and very long delays (Fan & Gijbels, 1996; Wand & Jones, 1994). The undersmoothed density estimators with bandwidths obtained from cross-validation yield similar estimates for the reserve (13,030,459 in the local linear case and 13,268,768 in the local constant case), although the shape of the density estimates is unrealistically rough for larger delays.
We are aware that these forecasts are just point estimates for the reserve. We investigate variation in the forecast under a controlled setting in the simulation study in the next chapter.
5.2 Illustration of assumptions
For Assumption [CLM1], the independence between and could not be assured by an independence test based on Conditional Kendall’s tau for truncated data (Austin & Betensky, 2014; Martin & Betensky, 2005). To get more insight we aim to visualize the underlying dependency. We aggregate the data into three-month bins . Then we introduce a triangle with aggregated observations , , , and calculate the development factors , for development quarter and accident quarter date . The values of for the first six development quarters are given in Figure 3a. Under the assumption of independence between and (Assumption [CLM1]), the function is independent of the accident date and hence each plot should show points scattered around a horizontal line yielding a flat regression line.
The $p$-values for the linear regression slope parameters were only significant at the 5% level in the first two quarters. For comparison, Figure 3b shows the development factors on independently simulated variables for which linear trends were insignificant at the 10% level in every quarter. Except for the first plot in Figure 3a, none of the plots and linear fits from Figure 3a are visually distinguishable from any plot in Figure 3b. Clearly, this is not a sound method to prove independence, but it illustrates that the dependence in our data might come from the first quarter only. Advisable extensions of our model that handle possible dependence in our data are, e.g., seasonal effects as considered in Lee et al. (2015) or operational time (Lee et al., 2017). We do not consider these approaches in this paper.
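The visual check described above can be sketched in a few lines. The following is a minimal illustration, assuming that the development factor of an accident quarter is the usual individual link ratio (cumulative amount after the next development quarter divided by the cumulative amount after the current one); the function names and the toy triangle are hypothetical and not taken from the data of this paper.

```python
def individual_factors(cum):
    """Group individual development factors by development quarter.

    cum[i][j] is the cumulative paid amount of accident quarter i after
    development quarter j; rows get shorter for later accident quarters.
    Returns {j: [(i, cum[i][j + 1] / cum[i][j]), ...]}.
    """
    factors = {}
    for i, row in enumerate(cum):
        for j in range(len(row) - 1):
            if row[j] > 0:  # a factor is undefined without earlier payments
                factors.setdefault(j, []).append((i, row[j + 1] / row[j]))
    return factors


def ols_slope(points):
    """Least-squares slope of y on x for a list of (x, y) pairs."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    return sxy / sxx


# Toy triangle (made-up numbers): every accident quarter follows the same
# cumulative pattern, so the factors do not depend on the accident quarter.
pattern = [1.0, 2.0, 3.0, 3.3]
volumes = [100.0, 120.0, 90.0, 110.0]
triangle = [
    [v * p for p in pattern[: len(pattern) - i]] for i, v in enumerate(volumes)
]

# One regression slope per development quarter with at least two factors.
slopes = {
    j: ols_slope(pts)
    for j, pts in individual_factors(triangle).items()
    if len(pts) >= 2
}
```

In this toy triangle all slopes are zero up to rounding, which corresponds to the flat regression lines expected under independence. A clear non-zero slope in some development quarter would hint at a dependence on the accident date, although, as noted above, this is an illustration rather than a test.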
For Assumption [CLM2], we have to verify that the expected cost conditioned on and is multiplicatively separable, i.e., that there exist functions , such that . Similarly to the above, we use a visual approach on aggregated data to illustrate this setting in the first six quarters of the accident years. Assumption [CLM2] is satisfied if for all observations with in quarter it holds for a quarter dependent constant , and a mean-zero error . Figure 4 shows the claim cost given the delay for claims in the first six quarters. Under Assumption [CLM2], the points in each plot should be generated by the same regression function after normalizing with the accident date quarter dependent factor . We use a linear interpolation to compare the structure of the observations. All but the third plot show very similar development of claim costs. In the third plot, the claim costs increase much faster due to some outliers that are not visible in the plot. For comparison, we generate 50,000 observations from Scenario 5 in Section 6 where with , .
We conclude that while the data does not fully follow our assumptions, it is suitable enough for the illustration purpose of this paper.
6 Simulation study
This chapter shows the performance of our new estimators on simulated data. Our first finding is that the local linear estimator outperforms the local constant one at boundaries (Section 6.1). Secondly, when estimating the reserve, our local constant estimator is best for small sample sizes and the local linear one for large sample sizes whereas the performance of chain-ladder varies (Section 6.2). Last, in a micro model in Section 6.3, we see that reliable monthly forecasts can only be obtained with our density estimators and not with chain-ladder.
6.1 Weighted density estimation
We perform a simulation study to show the performance of our estimators for a selection of distributions if the optimal bandwidth is chosen. We simulate truncated observations , on . With the true weighted densities and being known, we calculate the local constant and local linear estimators , , with the best bandwidth with respect to the integrated squared error
[TABLE]
for . We choose eight different settings for the distributions of , , and . The choice of the distributions is motivated by empirical distributions on the one hand and challenging estimation settings for the distribution of and are added on the other hand. The observations of and are simulated independently and truncated on . The probability density functions for and are shown in Figure 5a–d and the values of given one choice of are illustrated as histograms in Figure 5e and f. For simulated conditional claim costs given and , we take gamma distributions with shape parameter and different scale parameters and . Note that Assumption [CLM2] holds because of the identity of the gamma distribution.
We take all combinations of these distributions and label the eight scenarios as given in Table 2. For each scenario 1000 random samples of sizes 100, 1000, 10,000 and 100,000 are generated.
We investigate 32 cases arising from eight different scenarios and four different sample sizes. The exact results are omitted here. We only give our main conclusion and focus on the reserve estimate in more detail in the next section. The local constant and the local linear estimators perform similarly in terms of empirical mean integrated squared error (eMISE) with the local linear estimators being more stable. In 26 out of 32 cases, the eMISE of is more than 25% lower than the of . In 4 cases it even improves the eMISE by more than 75%. On the other hand, there are only two cases where the local linear estimator leads to an increase in the eMISE by 0.7% and 6.7%, respectively. In the other covariate, both density estimators perform equally well. However, the local linear estimator performs better in scenarios with the boundary challenge distribution for . This reflects the aforementioned weakness of the local constant kernel density estimator close to boundary regions. The difference is biggest in Scenarios 7 and 8 where the local linear estimator is able to make up for the lack of observations in the corner.
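The boundary effect can be reproduced in a much simpler setting than the one studied here. The sketch below is a simplified illustration, not the cost-weighted estimators of this paper: it assumes an unweighted, untruncated density on $[0,1]$, an Epanechnikov kernel, and an idealised equispaced "sample" from the uniform distribution. All function names are hypothetical.

```python
def epanechnikov(u):
    """Second-order kernel with support (-1, 1)."""
    return 0.75 * (1.0 - u * u) if abs(u) < 1.0 else 0.0


def kernel_moments(x, h, grid=2000):
    """a_j(x) = integral over [0, 1] of K_h(y - x) (y - x)^j dy for j = 0, 1, 2.

    Computed with a midpoint rule; K_h(u) = K(u / h) / h.
    """
    a = [0.0, 0.0, 0.0]
    step = 1.0 / grid
    for k in range(grid):
        d = (k + 0.5) * step - x
        w = epanechnikov(d / h) / h * step
        a[0] += w
        a[1] += w * d
        a[2] += w * d * d
    return a


def local_constant(x, data, h):
    """Standard kernel density estimate at x (no boundary correction)."""
    return sum(epanechnikov((xi - x) / h) for xi in data) / (len(data) * h)


def local_linear(x, data, h):
    """Local linear density estimate at x on the support [0, 1]."""
    a0, a1, a2 = kernel_moments(x, h)
    s = sum(
        epanechnikov((xi - x) / h) * (a2 - a1 * (xi - x)) for xi in data
    )
    return s / (len(data) * h * (a0 * a2 - a1 * a1))


# Idealised uniform sample on [0, 1]; the true density equals 1 everywhere.
sample = [(k + 0.5) / 2000 for k in range(2000)]
h = 0.2

lc_boundary = local_constant(0.0, sample, h)  # roughly half the true value
ll_boundary = local_linear(0.0, sample, h)    # close to the true value 1
lc_interior = local_constant(0.5, sample, h)  # close to 1 away from the boundary
```

At the boundary point the local constant estimate only sees half of the kernel mass and therefore returns about 0.5, while the local linear estimate stays close to 1. In the interior both agree.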
6.2 Estimates for simulated outstanding liabilities
Next, we compare the reserve estimates
[TABLE]
with the true outstanding claim amount defined in equation (3).
Table 3 contains the mean, standard deviation, and median of the squared relative errors of the reserve estimates
[TABLE]
The results are compared to the estimation through chain-ladder applied on the triangle arising from the aggregation of the simulated observations of and into 20 bins each. This aggregation is comparable to quarterly aggregation on real data.
First, we want to note that there was a complete breakdown of the chain-ladder algorithm for too small numbers of observations which resulted in an invalid estimate in our implementation. Moreover, in most cases of the simulation study, our local polynomial density estimators outperform chain-ladder. The reserve estimates from the local linear estimators were strikingly better in the boundary challenge Scenarios 3, 4, 7 and 8 for numbers of observations larger than . For the local constant reserve estimate was best in six out of eight scenarios. In boundary challenge scenarios, chain-ladder not only led to invalid results for small sample sizes but it also resulted in extreme outliers. An illustration of the results is given in Figure 6. It shows a scenario in which the local linear density estimator is the only one that estimates the altitude of the maximum in the joint density almost correctly.
We conclude that the local linear estimator performs best for and that the local constant one does for smaller sample sizes. Detailed results can be found in Table 3.
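For readers who want to retrace the chain-ladder benchmark, the following is a minimal sketch of the classical algorithm on an aggregated run-off triangle. It assumes that the squared relative error is $\{(\widehat{R}-R)/R\}^{2}$ for an estimate $\widehat{R}$ of the true reserve $R$, and that an invalid estimate corresponds to a zero denominator in a development factor; both are assumptions made for this illustration, as are the function names and the toy numbers.

```python
def chain_ladder_reserve(inc):
    """Classical chain-ladder reserve from an incremental triangle.

    inc[i][j] is the amount paid for accident period i in development
    period j; row i holds the m - i observed entries of an m x m triangle.
    """
    m = len(inc)
    cum = [[sum(row[: j + 1]) for j in range(len(row))] for row in inc]

    # Development factor j -> j + 1 from all rows observed in both periods.
    factors = []
    for j in range(m - 1):
        rows = range(m - j - 1)
        denominator = sum(cum[i][j] for i in rows)
        if denominator == 0:
            raise ValueError(f"no payments observed in development period {j}")
        factors.append(sum(cum[i][j + 1] for i in rows) / denominator)

    # Project the latest cumulative amount of each row to the last period.
    reserve = 0.0
    for i in range(1, m):
        latest = cum[i][-1]
        ultimate = latest
        for j in range(m - i - 1, m - 1):
            ultimate *= factors[j]
        reserve += ultimate - latest
    return reserve


def squared_relative_error(estimate, truth):
    return ((estimate - truth) / truth) ** 2


# Toy example with an exactly multiplicative structure volume_i * share_j.
volumes = [100.0, 110.0, 120.0, 130.0]
shares = [0.5, 0.3, 0.15, 0.05]
m = len(volumes)
incremental = [[volumes[i] * shares[j] for j in range(m - i)] for i in range(m)]
true_reserve = sum(
    volumes[i] * sum(shares[m - i:]) for i in range(1, m)
)

estimate = chain_ladder_reserve(incremental)
error = squared_relative_error(estimate, true_reserve)
```

With an exactly multiplicative triangle, chain-ladder recovers the true reserve (94.5 in this toy example) up to rounding. With few observations spread over many bins, whole columns of the triangle can be empty, in which case the sketch raises an error instead of returning a number.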
6.3 Simulation of a micro model
In this section, we investigate the performance of our estimators and chain-ladder on simulated data arising from a micro model. We simulate different steps in the underwriting and payment process separately and for different types of policies in one line of business and estimate outstanding payments under different circumstances. We then compare estimates of the reserve with actual future payments.
To create our data set, we follow the “central scenario” simulation in Baudry & Robert (2019). We generate mobile phone insurance policies that were underwritten over two years and estimate the outstanding liabilities at different times throughout the underwriting period and some months after the last policy was sold. We assume that the insurance provider covers damage in the three events breakage, oxidation, and theft. For this purpose the three policy types “breakage”, “breakage and oxidation”, and “breakage, oxidation, and theft” are underwritten with probabilities 0.25, 0.45, and 0.30. Moreover, there are four different mobile phone brands with four different models each specifying the price of the phone. The frequencies of brands and models and their basis prices and model price factors are given in Table 4.
Following Baudry & Robert (2019), we simulate insurance policies that are underwritten independently between the first day of 2016 and the last day of 2017. The number of underwritten policies per day follows a Poisson point process with constant intensity , independently of policy type, phone brand, or model. Each policy covers exactly the period of the next 360 days after the underwriting day. For claims, we simulate the three incidents through a competing risk model with the constant hazards in Table 5. All events are recorded daily and we identify a year with the grid . After an incident has happened at time , the reporting time is generated from the reporting delay hazard
[TABLE]
for , . Hence, we assume a maximum reporting delay of one year. Denoting the reporting day by , the payment day is then generated from the payment delay hazard
[TABLE]
with , and where and . Hence, all claims are settled within 10 to 50 days. Note that both delays are independent of the incident or underwriting day and and, thus, Assumptions [M3] and [CLM1] are satisfied. Moreover, reporting delay and payment delay are both independent of phone brand, model, type of policy, and type of incident. Last, we assume that the whole claim is settled in a single payment (Assumption [M2]) which is a random proportion of the phone price following a beta distribution with parameters given in Table 5.
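The claim-generating part of such a micro model can be sketched as follows. This is a simplified illustration under several assumptions that are not taken from the paper or from Baudry & Robert (2019): the daily hazards, the beta parameters, and the single phone price are made-up placeholders (not the values of Tables 4 and 5), only the perils covered by a policy compete, time is treated as continuous rather than daily, and reporting and payment delays are left out.

```python
import random

# Policy types from the text with their underwriting probabilities.
COVERED = {
    "breakage": ["breakage"],
    "breakage and oxidation": ["breakage", "oxidation"],
    "breakage, oxidation, and theft": ["breakage", "oxidation", "theft"],
}
TYPE_PROBS = [0.25, 0.45, 0.30]
COVER_DAYS = 360

# Hypothetical placeholders, NOT the values from Table 4 or Table 5.
DAILY_HAZARD = {"breakage": 0.0005, "oxidation": 0.0002, "theft": 0.0002}
BETA_PARAMS = {"breakage": (2.0, 5.0), "oxidation": (5.0, 3.0), "theft": (5.0, 0.5)}
PHONE_PRICE = 600.0


def simulate_policy(rng, start_day):
    """Return (policy_type, claim); claim is None if no incident is covered."""
    policy_type = rng.choices(list(COVERED), weights=TYPE_PROBS)[0]
    perils = COVERED[policy_type]

    # Competing risks with constant hazards: the time to the first incident
    # is exponential with the summed hazard ...
    total_hazard = sum(DAILY_HAZARD[p] for p in perils)
    delay = rng.expovariate(total_hazard)
    if delay >= COVER_DAYS:
        return policy_type, None

    # ... and its cause is drawn proportionally to the individual hazards.
    cause = rng.choices(perils, weights=[DAILY_HAZARD[p] for p in perils])[0]

    # Single payment: a beta-distributed proportion of the phone price.
    alpha, beta = BETA_PARAMS[cause]
    claim = {
        "start_day": start_day,
        "delay": delay,
        "cause": cause,
        "cost": PHONE_PRICE * rng.betavariate(alpha, beta),
    }
    return policy_type, claim


rng = random.Random(1)
policies = [simulate_policy(rng, rng.uniform(0.0, 720.0)) for _ in range(5000)]
claims = [(ptype, claim) for ptype, claim in policies if claim is not None]
```

The sketch fixes the number of policies instead of drawing it from a Poisson process, which keeps it within the Python standard library. Adding the reporting and payment delays would require the hazards from the displays above.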
We estimate the outstanding payments at each month from September 2016 to May 2018 with our new estimators and with monthly aggregated chain-ladder and compare them with the simulated future payments. The whole scenario is then repeated 200 times. The average reserves over all 200 simulation runs and their empirical 95% confidence intervals are given in Figure 7a. It shows an increase in payments until February 2017, with new policies being underwritten every day. The payments stabilize afterwards in a balance between new policies and their claims and old policies expiring after 360 days. After December 2017, there is a decrease in payments since no new policies are underwritten anymore after 2017 and remaining policies expire. The medians of the actual outstanding future payments are taken over all 200 simulation runs and labeled as “true” reserves. Moreover, the mean squared error
[TABLE]
for reserve estimate and true reserve and simulation runs is given in Figure 7b. The reserve estimates from our local linear estimators have the lowest bias and variance. Whereas the local constant estimator suffers from bias, chain-ladder suffers heavily from variance (as found in Baudry & Robert (2019)). The latter is due to the monthly aggregation for chain-ladder which, however, is necessary to derive the monthly cash-flow. Our proposed kernel smoothers use larger bandwidths, and thus reduce variance, while still providing a monthly cash-flow.
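The statement about bias and variance rests on the usual decomposition of the mean squared error. A minimal sketch, assuming that the error of a run is the difference between the reserve estimate and the true reserve of that run and that the variance is taken with divisor equal to the number of runs; the numbers are made up for illustration.

```python
def mse_decomposition(estimates, truths):
    """Return (mse, bias, variance) of the errors estimate - truth."""
    errors = [e - t for e, t in zip(estimates, truths)]
    n = len(errors)
    bias = sum(errors) / n
    variance = sum((err - bias) ** 2 for err in errors) / n
    mse = sum(err ** 2 for err in errors) / n
    return mse, bias, variance


# Made-up reserve estimates and true reserves from four simulation runs.
estimates = [105.0, 98.0, 110.0, 101.0]
truths = [100.0, 100.0, 104.0, 99.0]

mse, bias, variance = mse_decomposition(estimates, truths)
# The identity mse == bias ** 2 + variance holds up to rounding.
```

An estimator with small bias but large variance (as reported for monthly chain-ladder) and one with small variance but noticeable bias (as reported for the local constant estimator) can therefore have a similar mean squared error for different reasons.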
To compare this setting with the previous section, the marginal cost-weighted distributions of the delay from incident day until payment day and of the incident day are given in Figure 8a and b, respectively. The real distribution of the data is approximated by the average over the empirical distributions from each of the 200 simulation runs. Figure 8 illustrates how the development factors of the chain-ladder method lead to a histogram instead of a smooth kernel estimator as described in Section 3.
Acknowledgments
The authors would like to thank the editors and two anonymous referees for useful comments and suggestions which helped to improve this research article.
Appendix A Proofs
In the proofs below we will use the symbols $o_P$ and $O_P$, which are the probabilistic counterparts to the Landau symbols $o$ and $O$. A precise definition and explanation can be found in Appendix A of Pollard (2012). Further, we will use the short-hand $\Delta N^{R}_{i}(t)=\lim_{h\downarrow 0}N^{R}_{i}\{(t+h)-\}-N^{R}_{i}(t-)$.
A.1 Proof of Proposition 2.1
For the proof it suffices to show that
[TABLE]
and
[TABLE]
since , , , , are .
For the convergence of we note that and . Both statements follow from a strengthened Glivenko-Cantelli Theorem, since we have ; see (Van der Vaart, 2000, Chapter 19.1) for more details. Next we argue that (9) is equivalent to [CLM2]. We note that
[TABLE]
Now, since and are independent, we get
[TABLE]
Hence equation (9) is equivalent to
[TABLE]
With continuity arguments this holds if and only if is multiplicatively separable in and , i.e., [CLM2] holds.
A.2 Estimation of the weighted survival function
We begin by investigating the asymptotic behaviour of
[TABLE]
instead of . Later with Lemma A.5, we show that the difference between the two terms is uniformly of stochastic order and hence negligible. One can hence carry this result over to a result for the actual estimators and or , respectively.
To start with, we analyze the process , where the integral can be understood pathwise in Lebesgue-Stieltjes sense.
We start by deriving the compensator of :
[TABLE]
Note that the error comes from the Taylor expansion of at for . Since for all and , this error is also uniform of order . Hence,
[TABLE]
is asymptotically a compensator of the uniformly integrable submartingale . We denote the resulting process by which is, up to a lower order term of , a martingale. Since is cadlag with finite variation, the quadratic variation equals the sum of square differences:
[TABLE]
Note that we used , since is continuous. As $[\widetilde{M}^{R}_{i}(t)]=\left(\widehat{\widetilde{A}}{}^{R,*}_{i}\right)^{2}$, by similar arguments as before we can calculate its compensator to derive the predictable variation process
[TABLE]
Proposition 4.1 is based on the following intermediate result.
Lemma A.1.
Under Assumptions [M1]–[M3], [CLM1]–[CLM2] and S1–S4, it holds that
[TABLE]
in distribution in Skorokhod topology sense, where is a zero mean Gaussian martingale with covariance, .
Proof A.2.
This follows from a martingale central limit theorem in Rebolledo (1980) as illustrated in Andersen et al. (1993, p. 83). For the assumptions to be satisfied, we verify that
[TABLE]
where we have used that for . For the Lindeberg condition we observe
[TABLE]
where we used that the jumps happen at the same time with zero probability. The condition follows from the terms in the sum being , since .
Corollary A.3.
Under Assumptions [M1]–[M3], [CLM1]–[CLM2] and S1–S4, it holds that
[TABLE]
Proof A.4.
This follows from Lemma A.1 with Lenglart’s inequality and the functional delta method, since and are functionals of and , respectively.
Indeed, it holds and . The conclusion from Lenglart’s inequality (see Andersen et al. (1993)) is
[TABLE]
for every and every . Hence, the fact that from the proof of Lemma A.1 implies . Therefore, we get (Andersen et al., 1993, p. 86) and the conclusion follows.
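For orientation, Lenglart's inequality is often stated in the following generic form for a local square integrable martingale $M$ with predictable variation process $\langle M\rangle$ on $[0,\tau]$. This is the standard textbook statement with generic symbols, given under that assumption, and not a reconstruction of the display above.

```latex
P\Big(\sup_{t\in[0,\tau]} |M(t)| > \eta\Big)
  \;\le\; \frac{\delta}{\eta^{2}} + P\big(\langle M\rangle(\tau) > \delta\big),
  \qquad \eta>0,\ \delta>0 .
```

In particular, if $\langle M\rangle(\tau)$ converges to zero in probability, then $\sup_{t\in[0,\tau]}|M(t)|$ converges to zero in probability as well, which is the type of conclusion used in this proof.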
A.3 Proof of Proposition 4.1
We first split the estimation error into a stable part and a martingale part via
[TABLE]
where
[TABLE]
We now discuss the asymptotics of and separately, starting with . Defining
[TABLE]
leads to
[TABLE]
where the error is again uniformly bounded (see Proposition 2.1). Hence, since is a martingale, is, up to lower order terms, a martingale as well. With similar arguments as before we get
[TABLE]
Similarly as in the proof of Lemma A.1, we use the martingale central limit theorem from Rebolledo (1980) to show
[TABLE]
in distribution in Skorokhod topology sense, where is a zero mean Gaussian martingale with covariance . The assumptions are satisfied by similar arguments as in Lemma A.1. We only illustrate the derivation of . It holds
[TABLE]
With the uniform convergences of , and , we conclude that
[TABLE]
for , which coincides with .
We continue with the asymptotics for . After expanding and replacing by , which we can do with Corollary A.3, we have that
[TABLE]
From the remark in A.1 we can further use that converges uniformly to and it even holds that . Hence,
[TABLE]
Note that an error term of order is likewise of order by Assumption S1. The proof is concluded by two Taylor expansions in the numerator and one in the denominator and using that is a second order kernel.
The implication for the estimator built from instead of follows with Lemma A.5.
Lemma A.5.
It holds .
Proof A.6.
The proof follows from the fact that and , together with an analogous argumentation as in the last proof.
First, because of the non-negativity of and the boundedness of , it holds
[TABLE]
for , with the notation from Section A.2. From here, a completely analogous argumentation with a martingale $\widetilde{M}^{R,*}_{i}(t)=\Big(\widehat{\widetilde{A}}{}^{R,*}_{i}\Big)^{2}-\Lambda_{i}^{R,*}(t)$ for the compensator
[TABLE]
leads to the central limit theorem
[TABLE]
The same argument with Lenglart’s inequality as in the proof of Corollary A.3 yields the conclusion.
A.4 Proof of Proposition 4.3
We first introduce the notation
[TABLE]
for every kernel and recall the notation
[TABLE]
Since
[TABLE]
one can easily verify that converges locally uniformly almost surely to , where arises from by replacing and with the local versions (Nielsen & Tanggaard, 2001). Furthermore, if is symmetric, then .
From equation (11), Assumption S3 and Corollary A.3, we conclude that it is enough to consider the asymptotic behaviour of
[TABLE]
Analogously to the local constant case, we split the estimation error into a stable and a martingale part
[TABLE]
where
[TABLE]
The asymptotic limit of the bias part, , is now easily derived via a second order Taylor expansion. The martingale part can be concluded with similar arguments as in Appendix A.2.
A.5 Proof of Proposition 3.1
The proof follows along the same lines as the proof of Proposition 4.1 above, just simpler, with the choice
[TABLE]
References
- Aalen, O. O. (1978), ‘Non-parametric inference for a family of counting processes’, The Annals of Statistics 6, 701–726.
- Andersen, P., Borgan, O., Gill, R. & Keiding, N. (1993), Statistical Models Based on Counting Processes, Springer, New York.
- Antonio, K. & Plat, R. (2014), ‘Micro-level stochastic loss reserving for general insurance’, Scandinavian Actuarial Journal 2014, 649–669.
- Arjas, E. (1989), ‘The claims reserving problem in non-life insurance: Some structural ideas’, ASTIN Bulletin 19, 139–152.
- Austin, M. D. & Betensky, R. A. (2014), ‘Eliminating bias due to censoring in Kendall’s tau estimators for quasi-independence of truncation and failure’, Computational Statistics & Data Analysis 73, 16–26.
- Avanzi, B., Wong, B. & Yang, X. (2016), ‘A micro-level claim count model with overdispersion and reporting delays’, Insurance: Mathematics and Economics 71, 1–14.
- Badescu, A. L., Lin, X. S. & Tang, D. (2016), ‘A marked Cox model for the number of IBNR claims: Theory’, Insurance: Mathematics and Economics 69, 29–37.
