Rank-Based Causal Discovery for Post-Nonlinear Models
Grigor Keropyan, David Strieder, Mathias Drton

TL;DR
This paper introduces a novel rank-based method for discovering causal relationships in post-nonlinear models, improving robustness and consistency over existing residual dependency approaches.
Contribution
The paper proposes a new rank-based approach for PNL causal discovery that leverages model invariances and separates function estimation from independence testing.
Findings
Method is consistent in theory.
Performs well in numerical experiments.
Reduces overfitting compared to residual dependency methods.
Abstract
Learning causal relationships from empirical observations is a central task in scientific research. A common method is to employ structural causal models that postulate noisy functional relations among a set of interacting variables. To ensure unique identifiability of causal directions, researchers consider restricted subclasses of structural causal models. Post-nonlinear (PNL) causal models constitute one of the most flexible options for such restricted subclasses, containing in particular the popular additive noise models as a further subclass. However, learning PNL models is not well studied beyond the bivariate case. The existing methods learn non-linear functional relations by minimizing residual dependencies and subsequently test independence from residuals to determine causal orientations. However, these methods can be prone to overfitting and, thus, difficult to tune…
| - | Gaussian noise | ||
|---|---|---|---|
| - | RankG | AbPNL | RESIT |
| 100 | 2.51 1.26 | 3.94 1.09 | 2.4 1.16 |
| 500 | 1.86 1.31 | 3.46 1.08 | 2.41 1.23 |
| 1000 | 1.61 1.15 | 3.14 1.14 | 2.43 1.16 |
| 1500 | 1.66 1.35 | 3.28 1.18 | 2.62 1.36 |
| 2000 | 1.63 1.17 | 3.23 1.12 | 3.15 1.51 |
| - | Gumbel noise | ||
|---|---|---|---|
| - | RankG | AbPNL | RESIT |
| 100 | 2.86 1.36 | 3.98 1.36 | 2.3 1.18 |
| 500 | 2.34 1.45 | 3.07 1.24 | 2.49 1.2 |
| 1000 | 1.8 1.28 | 3.31 1.11 | 2.43 1.28 |
| 1500 | 1.68 1.1 | 3.06 1.03 | 3.04 1.31 |
| 2000 | 1.37 1.14 | 3.17 1.18 | 3.02 1.41 |
| - | Logistic noise | ||
|---|---|---|---|
| - | RankG | AbPNL | RESIT |
| 100 | 2.61 1.35 | 3.89 1.11 | 2.31 1.1 |
| 500 | 2.06 1.34 | 3.32 1.24 | 2.36 1.13 |
| 1000 | 1.85 1.37 | 3.41 1.1 | 2.66 1.24 |
| 1500 | 1.68 1.38 | 3.27 1.2 | 2.92 1.32 |
| 2000 | 1.79 1.18 | 3.29 1.16 | 3.44 1.38 |
| - | Gaussian noise | ||
|---|---|---|---|
| - | RankG | AbPNL | RESIT |
| 100 | 1.21 1.23 | 3.7 1.28 | 2.72 1.2 |
| 500 | 0.21 0.56 | 2.8 1.01 | 2.69 1.12 |
| 1000 | 0.15 0.59 | 3.18 1.08 | 2.73 1.08 |
| 1500 | 0.04 0.24 | 3.06 1.08 | 2.9 1.06 |
| 2000 | 0.05 0.3 | 3.26 1.17 | 3.1 1.29 |
| - | Gumbel noise | ||
|---|---|---|---|
| - | RankG | AbPNL | RESIT |
| 100 | 1.44 1.26 | 3.47 1.13 | 2.86 1.26 |
| 500 | 0.42 0.91 | 2.61 1.1 | 2.62 1.19 |
| 1000 | 0.13 0.39 | 2.88 1.15 | 2.75 1.15 |
| 1500 | 0.15 0.54 | 2.76 1.14 | 2.98 1.2 |
| 2000 | 0.08 0.37 | 2.83 1.21 | 3.29 1.26 |
| - | Logistic noise | ||
|---|---|---|---|
| - | RankG | AbPNL | RESIT |
| 100 | 1.05 1.3 | 3.82 1.15 | 2.5 1.24 |
| 500 | 0.39 0.79 | 2.84 1.02 | 2.46 1 |
| 1000 | 0.14 0.47 | 3.01 1.14 | 2.53 1.11 |
| 1500 | 0 0 | 2.95 1.18 | 2.98 1.05 |
| 2000 | 0 0 | 2.69 1.17 | 3.28 1.21 |
| - | Gaussian noise | ||
|---|---|---|---|
| - | RankG | AbPNL | RESIT |
| 100 | 4.65 2.21 | 6.72 2.13 | 3.89 1.85 |
| 500 | 3.75 2.11 | 5.4 1.88 | 4.1 1.76 |
| 1000 | 3.29 2.03 | 5.46 1.96 | 4.27 1.95 |
| 1500 | 2.94 2.13 | 5.4 1.87 | 4.34 1.88 |
| 2000 | 3.35 2.09 | 5.49 1.84 | 5.02 2.09 |
| - | Gumbel noise | ||
|---|---|---|---|
| - | RankG | AbPNL | RESIT |
| 100 | 4.74 2.12 | 6.33 2.26 | 3.95 2.09 |
| 500 | 3.68 1.97 | 5.41 1.83 | 3.95 1.75 |
| 1000 | 3.4 1.98 | 5.39 1.92 | 3.78 1.92 |
| 1500 | 3.33 2.05 | 5.96 2.12 | 4.66 1.99 |
| 2000 | 3.47 2.61 | 5.26 1.7 | 5.07 2.41 |
| - | Logistic noise | ||
|---|---|---|---|
| - | RankG | AbPNL | RESIT |
| 100 | 4.56 2.1 | 6.45 2.08 | 4.01 1.96 |
| 500 | 3.65 2.13 | 5.52 1.49 | 3.86 1.85 |
| 1000 | 3.88 2.42 | 5.35 1.96 | 4.18 1.88 |
| 1500 | 2.93 1.98 | 5.37 1.8 | 4.42 2.01 |
| 2000 | 3.39 1.97 | 5.1 1.64 | 4.96 2.17 |
| - | Gaussian noise | ||
|---|---|---|---|
| - | RankG | AbPNL | RESIT |
| 100 | 2.44 1.69 | 6.71 2.47 | 4.15 1.75 |
| 500 | 1.11 1.46 | 5.06 1.75 | 4.4 1.84 |
| 1000 | 0.28 0.75 | 5.08 1.87 | 4.47 1.88 |
| 1500 | 0.19 0.6 | 5.18 2 | 4.35 1.79 |
| 2000 | 0.19 0.69 | 5.26 1.55 | 4.69 2 |
| - | Gumbel noise | ||
|---|---|---|---|
| - | RankG | AbPNL | RESIT |
| 100 | 3.13 2.02 | 6.11 2.22 | 4.17 1.76 |
| 500 | 0.88 1.4 | 4.32 1.65 | 4.41 1.74 |
| 1000 | 0.38 0.93 | 4.7 1.87 | 4.22 1.97 |
| 1500 | 0.37 0.97 | 4.88 1.78 | 4.88 1.87 |
| 2000 | 0.14 0.55 | 4.81 1.94 | 5.02 2.06 |
| - | Logistic noise | ||
|---|---|---|---|
| - | RankG | AbPNL | RESIT |
| 100 | 2.65 2 | 6.83 2.2 | 4.03 1.54 |
| 500 | 0.68 1.09 | 4.87 1.82 | 4.12 1.77 |
| 1000 | 0.45 1.01 | 5.08 1.79 | 4.27 1.8 |
| 1500 | 0.34 0.98 | 4.89 1.87 | 4.55 1.76 |
| 2000 | 0.2 0.68 | 4.43 1.69 | 4.56 1.81 |
| - | Gaussian noise | |||
|---|---|---|---|---|
| - | RankS | RankG | AbPNL | RESIT |
| 100 | 2.23 1.42 | 2.51 1.26 | 3.94 1.09 | 2.4 1.16 |
| 150 | 1.73 1.49 | 2.75 1.25 | 2.6 1.2 | 2.47 1.13 |
| 200 | 1.63 1.34 | 2.24 1.32 | 2.87 1.19 | 2.3 1.02 |
| 250 | 1.56 1.27 | 2.3 1.16 | 3.07 1.15 | 2.4 1.22 |
| 300 | 1.49 1.29 | 2.24 1.35 | 2.97 1.02 | 2.54 1.1 |
| - | Gumbel noise | |||
|---|---|---|---|---|
| - | RankS | RankG | AbPNL | RESIT |
| 100 | 1.69 1.36 | 2.86 1.36 | 3.98 1.36 | 2.3 1.18 |
| 150 | 1.76 1.28 | 2.56 1.4 | 2.88 1.34 | 2.37 1.1 |
| 200 | 1.77 1.38 | 2.78 1.3 | 2.77 1.17 | 2.38 1.25 |
| 250 | 1.73 1.27 | 2.44 1.18 | 3.04 1.13 | 2.45 1.16 |
| 300 | 1.84 1.23 | 2.4 1.43 | 3.11 1.12 | 2.35 1.1 |
| - | Logistic noise | |||
|---|---|---|---|---|
| - | RankS | RankG | AbPNL | RESIT |
| 100 | 2.02 1.32 | 2.61 1.35 | 3.89 1.11 | 2.31 1.1 |
| 150 | 1.8 1.26 | 2.58 1.39 | 2.84 1.26 | 2.7 1.23 |
| 200 | 1.63 1.28 | 2.5 1.4 | 3.17 1.25 | 2.55 1.25 |
| 250 | 1.45 1.23 | 2.21 1.3 | 3.32 1.23 | 2.3 1.25 |
| 300 | 1.61 1.32 | 2.3 1.23 | 3 1.04 | 2.4 1.17 |
| - | Gaussian noise | |||
|---|---|---|---|---|
| - | RankS | RankG | AbPNL | RESIT |
| 100 | 1.72 1.45 | 1.21 1.23 | 3.7 1.28 | 2.72 1.2 |
| 150 | 1.54 1.36 | 1.11 1.48 | 2.3 1.41 | 2.69 1.2 |
| 200 | 1.37 1.37 | 0.71 1.09 | 2.52 1.29 | 2.72 1.05 |
| 250 | 1.37 1.28 | 0.77 1.14 | 2.55 1.07 | 2.78 1.22 |
| 300 | 1.21 1.27 | 0.48 0.89 | 3.1 1.28 | 2.57 1.17 |
| - | Gumbel noise | |||
|---|---|---|---|---|
| - | RankS | RankG | AbPNL | RESIT |
| 100 | 1.73 1.43 | 1.44 1.26 | 3.47 1.13 | 2.86 1.26 |
| 150 | 1.63 1.37 | 1.22 1.33 | 2.04 1.19 | 2.64 1.28 |
| 200 | 1.41 1.31 | 0.81 1.02 | 2.5 1.08 | 2.7 1.32 |
| 250 | 1.48 1.26 | 0.62 1.08 | 2.38 1.25 | 2.58 1.24 |
| 300 | 1.72 1.42 | 0.69 1.05 | 2.44 1.19 | 2.76 1.12 |
| - | Logistic noise | |||
|---|---|---|---|---|
| - | RankS | RankG | AbPNL | RESIT |
| 100 | 1.78 1.4 | 1.05 1.3 | 3.82 1.15 | 2.5 1.24 |
| 150 | 1.44 1.34 | 0.74 1.02 | 2.86 1.25 | 2.56 1.16 |
| 200 | 1.55 1.27 | 0.57 1.02 | 3.18 1.24 | 2.37 1.17 |
| 250 | 1.3 1.14 | 0.36 0.72 | 3.14 1.25 | 2.28 1.09 |
| 300 | 1.38 1.43 | 0.33 0.75 | 3.23 1.15 | 2.51 1.11 |
| - | Gaussian noise | |||
|---|---|---|---|---|
| - | RankS | RankG | AbPNL | RESIT |
| 100 | 1.24 0.92 | 3.33 1.41 | 3.7 1.32 | 2.75 1.31 |
| 150 | 1.42 1.07 | 2.96 1.5 | 3.1 1.49 | 2.56 1.21 |
| 200 | 1.37 1.1 | 3.26 1.54 | 3.37 1.34 | 2.42 1.32 |
| 250 | 1.78 1.18 | 2.8 1.55 | 3.35 1.23 | 2.33 1.12 |
| 300 | 1.63 1 | 3.32 1.59 | 3.52 1.23 | 2.64 1.31 |
| - | Gumbel noise | |||
|---|---|---|---|---|
| - | RankS | RankG | AbPNL | RESIT |
| 100 | 1.11 0.92 | 3 1.37 | 3.33 1.36 | 2.59 1.23 |
| 150 | 1.51 1.01 | 3.14 1.37 | 2.7 1.27 | 2.78 1.2 |
| 200 | 1.56 0.96 | 3.29 1.41 | 3.22 1.22 | 2.81 1.26 |
| 250 | 1.63 0.93 | 3.52 1.41 | 3.34 1.08 | 2.94 1.29 |
| 300 | 1.86 0.98 | 3.41 1.44 | 3.61 1.15 | 2.73 1.25 |
| - | Logistic noise | |||
|---|---|---|---|---|
| - | RankS | RankG | AbPNL | RESIT |
| 100 | 2.02 1.32 | 2.61 1.35 | 3.52 1.24 | 2.31 1.1 |
| 150 | 1.8 1.26 | 2.58 1.39 | 3.23 1.18 | 2.7 1.23 |
| 200 | 1.63 1.28 | 2.5 1.4 | 3.47 1.09 | 2.55 1.25 |
| 250 | 1.45 1.23 | 2.21 1.3 | 3.35 1.07 | 2.3 1.25 |
| 300 | 1.61 1.32 | 2.3 1.23 | 3.45 1.41 | 2.4 1.17 |
| - | Gaussian noise | |||
|---|---|---|---|---|
| - | RankS | RankG | AbPNL | RESIT |
| 100 | 2.01 1.38 | 2.92 1.66 | 3.64 1.28 | 2.63 1.17 |
| 150 | 2.34 1.29 | 2.75 1.72 | 2.64 1.43 | 2.8 1.29 |
| 200 | 2.44 1.42 | 2.7 1.62 | 3.19 1.35 | 2.81 1.32 |
| 250 | 2.53 1.49 | 2.42 1.65 | 3.57 1.27 | 2.63 1.24 |
| 300 | 2.48 1.57 | 2.79 1.8 | 3.7 1.2 | 2.58 1.34 |
| - | Gumbel noise | |||
|---|---|---|---|---|
| - | RankS | RankG | AbPNL | RESIT |
| 100 | 1.85 1.17 | 2.99 1.49 | 3.55 1.13 | 2.83 1.23 |
| 150 | 2.15 1.21 | 2.68 1.56 | 2.54 1.43 | 2.84 1.39 |
| 200 | 2.43 1.37 | 2.69 1.59 | 2.71 1.27 | 2.87 1.3 |
| 250 | 2.51 1.33 | 2.9 1.61 | 3.2 1.26 | 3.05 1.45 |
| 300 | 2.32 1.35 | 3.03 1.38 | 3.1 1.18 | 2.94 1.32 |
| - | Logistic noise | |||
|---|---|---|---|---|
| - | RankS | RankG | AbPNL | RESIT |
| 100 | 2.12 1.17 | 2.97 1.45 | 3.49 1.36 | 2.61 1.38 |
| 150 | 2.29 1.39 | 2.93 1.42 | 3.09 1.39 | 2.63 1.19 |
| 200 | 2.4 1.36 | 2.91 1.54 | 3.5 1.17 | 2.45 1.2 |
| 250 | 2.34 1.34 | 2.81 1.5 | 3.34 1.23 | 2.53 1.15 |
| 300 | 2.48 1.44 | 2.76 1.72 | 3.29 1.19 | 2.59 1.24 |
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
Taxonomy
TopicsBayesian Modeling and Causal Inference
MethodsTest
Rank-Based Causal Discovery for Post-Nonlinear Models
Grigor Keropyan
David Strieder
Mathias Drton
Technical University of Munich
Technical University of Munich
Munich Center for Machine Learning
Technical University of Munich
Munich Center for Machine Learning
Abstract
Learning causal relationships from empirical observations is a central task in scientific research. A common method is to employ structural causal models that postulate noisy functional relations among a set of interacting variables. To ensure unique identifiability of causal directions, researchers consider restricted subclasses of structural causal models. Post-nonlinear (PNL) causal models constitute one of the most flexible options for such restricted subclasses, containing in particular the popular additive noise models as a further subclass. However, learning PNL models is not well studied beyond the bivariate case. The existing methods learn non-linear functional relations by minimizing residual dependencies and subsequently test independence from residuals to determine causal orientations. However, these methods can be prone to overfitting and, thus, difficult to tune appropriately in practice. As an alternative, we propose a new approach for PNL causal discovery that uses rank-based methods to estimate the functional parameters. This new approach exploits natural invariances of PNL models and disentangles the estimation of the non-linear functions from the independence tests used to find causal orientations. We prove consistency of our method and validate our results in numerical experiments.
1 INTRODUCTION
Discovering the causal structure of complex systems is an important question in various disciplines such as biology, economics, clinical medicine, or neuroscience (Opgen-Rhein and Strimmer,, 2007; Glymour et al.,, 2019; Moneta et al.,, 2013). The gold standard approach to exploration of causal relations is to perform controlled experiments in which researchers externally intervene in the system and observe the resulting changes to variables of interest. However, in many applications controlled experiments are not feasible due to high cost or for ethical reasons. In such cases, causal discovery based on only observational data can be a useful tool (Spirtes and Zhang,, 2016).
A common tool for modeling causal relations are Structural Equation Models (SEMs). In their general form, SEMs postulate noisy functional relationships between a set of interacting variables. In the fully general setting, causal discovery methods such as constraint-based and score-based methods can identify the underlying causal structure only up to Markov equivalence classes (Spirtes et al.,, 2000). Thus, the literature has also considered many restricted subclasses that enable unique identification (Drton et al.,, 2011; Hoyer et al.,, 2008; Peters et al.,, 2014; Zhang and Hyvärinen,, 2009). In this realm, post-nonlinear (PNL) causal models constitute one of the most general approaches. They are identifiable from the joint distribution under mild assumptions (Zhang and Hyvärinen,, 2009; Peters et al.,, 2014) and yet offer a rather flexible framework for modeling complex non-linear causal systems.
Existing methods for bivariate PNL causal discovery are based on estimating the functional relations by minimizing independence criteria (HSIC, mutual information, etc.) between the noise and potential parents in a first step, and performing independence tests to determine the causal structure in a second step (Zhang and Hyvärinen,, 2009; Uemura and Shimizu,, 2020). However, minimizing dependence to subsequently test for independence leads to potential overfitting and thus limits the PNL approach. Another approach considered by Tu et al., (2022) employs Optimal Transport theory for bivariate post-nonlinear and additive noise causal discovery, but it is not evident how to generalize their method to multivariate models. To our knowledge the only work that deals with multivariate PNL models is Uemura et al., (2022), where the authors generalize the bivariate method from Uemura and Shimizu, (2020) based on minimizing dependence and subsequently testing for independence.
In this article we present a new method for multivariate causal discovery in PNL models that disentangles the two tasks of learning the functional relations and learning the causal structure. Our method continues to learn the latter with the help of independence tests, but it employs rank-based methods to learn the functional relations and, in this way, avoids overfitting issues.
The remainder of the paper is organized as follows. In Section 2 we introduce the PNL rank regression methods that we use to learn the functional relations. We study the special case of linearity in the inner function and show consistency of our proposed rank-based functional parameter estimates. This special case includes general nonlinear functions using basis expansions. In Section 3 we discuss the causal order learning routine using our proposed rank-based estimates in a recursive process that finds sink nodes by independence testing. Furthermore, we show consistency of the causal order estimation for PNL models and present the results of a simulation study in Section 4, where we compare our method to existing causal learning methods. Section 5 concludes the paper.
2 PNL RANK REGRESSION
In this section we introduce the rank-based estimators that we use in the first step of our proposed causal learning algorithm. The goal is to employ these rank-based estimators to infer the functional relations among the variables and to obtain estimates of the stochastic noise terms in the model. In the second step, we use the estimated noise terms to test for independence.
Suppose we observe a sample of independent copies of a random vector . We assume the data generating process follows a PNL model, that is, the response variable is given by
[TABLE]
where is a continuous random vector and the stochastic error term has mean zero with unknown continuous distribution, independent of . Furthermore, we assume that the function is continuous and strictly increasing (thus, invertible) whereas may be an arbitrary function.
Under similar assumptions, Zhang and Hyvärinen, (2009) suggested to estimate the noise by representing and with Multi-layer Perceptrons (MLPs) and minimizing mutual information with via gradient-based methods. However, the main drawback of their methodology is that this model can fit perfectly to any data by learning constant functions and . The estimated noise will be constant and thus always independent from .
To overcome this problem Uemura and Shimizu, (2020) implemented an additional auto-encoding structure in the minimization problem that enforces invertibility of the function . While this circumvents the problem of constant estimation of the function , there are further challenges that arise from minimizing dependence and subsequent testing for independence. Indeed, the complexity of the function class assumed for needs to be balanced very carefully with the available sample size. Otherwise, can be fitted perfectly such that , in which case the functional estimates cancel and the estimated noise is always independent of . Uemura and Shimizu, (2020) used a fixed architecture for the function classes of and . As a result, especially for small sample sizes (compared to the complexity of the function class of ), their method is prone to overfitting and canceling the effect of the function . Such overfitting may then entail erroneous results in independence tests for causal structure learning.
To avoid the noted overfitting issues, we propose the following two-stage method to learn the functional relations. In the first stage, we leverage rank statistics to separately estimate the function , without any appeal to measures of dependence between the noise and the predictor . The strictly increasing function preserves the ranks of and thus, using rank-based methods, we can avoid estimating at this stage. This circumvents the problem of matching . In the second step, we estimate the functional relation at all observed data points to obtain the required estimates of the noise.
In order to simplify the concept and a theoretical analysis of our proposed method, we assume in the following linearity of the function , i.e. for . This can also be seen as a first order Taylor approximation of an arbitrary functional relation.
Remark 2.1**.**
Our framework and the idea of disentangling learning and testing by employing rank-based objective functions can be easily extended to the nonlinear case. By employing basis expansions, MLPs or any parametric function class in combination with the proposed rank-based scores to learn the functional relations one can trace the steps of the presented linear case. For instance, consider the basis functions , where sufficiently slowly, similar to Bühlmann et al., (2014). Then we can represent the (nonlinear) function by , where for all , and employ our proposed framework. The simulations in Section 4 include an example.
We start by studying the special case of model (1) under the assumption of Gaussian noise and derive a computationally efficient algorithm for estimating the functional parameters. Further, in Subsection 2.2 we consider the general case without restricting the noise distribution. The main idea of our approach is to leverage rank likelihoods, however, using the full marginal rank likelihood is not computationally tractable. A common approach to circumvent calculating the full marginal rank likelihood is to employ approximate Monte Carlo methods, e.g., considered by Doksum, (1987). However, we observed that this approach does not work well in practice for values of larger than one. Thus, in our proposed framework we employ pairwise rank likelihoods to approximate the full marginal rank likelihood (in the Gaussian case) or the rank correlation function (in the general noise case).
2.1 Gaussian Case
We assume that the data generating process follows the model
[TABLE]
with some unknown . Furthermore, we assume that the noise is standard normal distributed and propose the following computationally fast algorithm to estimate the functional relations. The idea of this method is based on Yu et al., (2021).
We exploit the fact that is a strictly increasing function and therefore preserves the ranks of . The normality assumption yields and we obtain
[TABLE]
where is the cumulative distribution function of the standard normal distribution. The normalized log pairwise rank likelihood function is then given by
[TABLE]
We estimate by maximizing , that is,
[TABLE]
This defines a concave optimization problem, which leads to a computationally fast estimation routine for without precise knowledge or estimation of the function .
Proposition 2.1**.**
The log pairwise rank likelihood function defined in (2.1) is concave. Moreover, if we assume that , then is strictly concave.
The proof can be found in Appendix B.
Furthermore, the proposed estimator is consistent.
Theorem 2.1**.**
As (and in particular ), it holds that
[TABLE]
For a proof we refer the reader to Appendix C.
In order to obtain an estimate of the noise, we estimate the transformation function in a second step. We employ the following computationally fast and consistent estimation routine proposed by Cuzick, (1988). In his proposal, Cuzick, (1988) considers non-random covariates, however, the results are applicable to our setup conditional on the observed data The method exploits the normality assumption as well as knowledge of the ranks of and thus of .
Let be the adjusted empirical distribution function of , that is
[TABLE]
Remark 2.2**.**
Note that we only require the ranks of to obtain the estimate . Since is a strictly increasing function, the ranks of are given by the ranks of .
We denote the cumulative distribution function of a randomly chosen by , that is
[TABLE]
Then an estimator for the functional relation at the sample points is given by
[TABLE]
By extending this estimator to a step function on , and under some additional assumptions, Cuzick, (1988) show that this estimator converges to almost surely at all continuity points of the function .
Remark 2.3**.**
In our setup, we assume is continuous. Furthermore, the additional assumptions that ensure consistency in the setting from Cuzick, (1988) are mainly smoothness and moment conditions on the distributions of and . In the considered Gaussian setting most of them are already satisfied. For a detailed list of the assumptions, we refer the reader to Appendix A.
2.2 General Case
The problem of estimating the parameter without additional assumptions on the distribution of the noise in model (2) is extensively studied in the literature, see i.e. Doksum, (1987); Han, (1987); Sherman, (1993); Abrevaya, 1999a ; Abrevaya, 1999b ; Abrevaya, (2003); Cavanagh and Sherman, (1998); Zhang, (2013). Without any restriction on expectation or variance of the noise the function is not unique, since it can be replaced by location or scale transformations. Thus, to ensure unique identification, we assume that there exists a known such that and we scale the last element of to , that is, .
To simplify the optimization, we employ a method introduced by Lin and Peng, (2013) that utilizes the rank-based objective function
[TABLE]
Remark 2.4**.**
Lin and Peng, (2013)** used the assumption to ensure unique identification, which is equivalent to our assumption up to rescaling.
To obtain a sparse solution the authors focused on a penalized version of . However, as we are only interested in estimating the residuals and do not necessarily need sparsity, we adapted their analysis for the following simplified non-penalized estimator , where
[TABLE]
by setting their penalty term to zero.
Under some additional assumptions that mainly ensure smoothness of the distributions of and , Lin and Peng, (2013) prove the existence of a local maximizer of with
[TABLE]
and, thus, defines a consistent estimator for . This estimator uses only the rank information without requiring concrete knowledge of . A detailed list of the additional assumptions that ensure consistency can be found in Appendix A.
In a second step we estimate the function in order to subsequently obtain an estimate of the noise. We used the method introduced in Chen, (2002) based on the rank correlation. To simplify the complex optimization of discrete objective functions, we employ a smoothed version introduced by Zhang, (2013).
The smoothed rank correlation objective function is defined by
[TABLE]
where and . Then we define an estimator of the function at via
[TABLE]
where is an appropriate compact set.
In Theorem 4.1, Zhang, (2013) establish consistency of the proposed estimator for under a few assumptions, that include -consistency of the involved estimator and strict monotonicity of the function , as well as some additional regularity assumptions. A detailed list of the additional assumptions can be found in Appendix A.
Without restricting the optimization space by the problem (6) is ill-posed in the sense that for and for . To circumvent the issue of choosing a proper compact set , we added an regularization term and optimized over , that is,
[TABLE]
In the experiments we used the regularization parameter , which turned out to be small enough to not affect the estimated values significantly and at the same time bounded the objective function for the observed extremes.
3 LEARNING PNL MODELS
By combining the previously introduced rank-based estimators of the functional relations in PNL models we obtain estimates of the stochastic error terms. Using these rank-based estimated error terms, we propose a routine to learn the underlying causal structure by recursively identifying sink nodes via independence testing. Further, we show that our proposed routine consistently recovers a valid causal ordering under identifiability assumptions.
Suppose we observe data in form of independent copies from a random vector . We assume that follows a PNL causal model, that is, the data generating process is defined by the structural equations
[TABLE]
where , called the parents of , are a subset of . The causal perspective stems from viewing those equations as making assignments. Each variable on the left-hand side is assigned the value specified on the right-hand side, given by the value of its parents and a stochastic error term. The causal structure inherent in such structural equations is naturally represented by a directed graph , where edges indicate which other variables each variable causally depends upon. As in related work, we assume the corresponding directed graph to be acyclic (DAG). The noise variables are assumed to be mutually independent and \varepsilon^{(k)}\mathchoice{\mathrel{\hbox to0.0pt{\displaystyle\perp\hss}\mkern 3.0mu{\displaystyle\perp}}}{\mathrel{\hbox to0.0pt{\textstyle\perp\hss}\mkern 3.0mu{\textstyle\perp}}}{\mathrel{\hbox to0.0pt{\scriptstyle\perp\hss}\mkern 3.0mu{\scriptstyle\perp}}}{\mathrel{\hbox to0.0pt{\scriptscriptstyle\perp\hss}\mkern 3.0mu{\scriptscriptstyle\perp}}}X^{(\textbf{PA}_{k})} for each . The main ansatz for inferring the causal structure is to leverage the independence structure of the stochastic noise and a correctly specified parent set, that is, is independent of all that precede in at least one true causal ordering of the underlying graph.
We focus on inferring the causal ordering to reduce the computational burden, however, the framework can easily be adapted to infer the specific causal graph structure by pruning redundant edges. The causal ordering of a graph is given by a permutation of , such that, if there exists a directed edge from node to node in the graph then . We emphasize that the causal ordering for a given graph is not necessarily unique but each causal ordering corresponds to a unique, fully connected DAG , where has a directed edge from node to node if and only if . Thus, similar to Bühlmann et al., (2014), we can define the set of true causal orderings for any DAG as the set of all causal orderings that correspond to fully connected DAGs which contain as a sub-graph, that is
[TABLE]
Remark 3.1**.**
In general contains more than one element and all elements correspond to valid causal orderings of the DAG (e.g. in the extreme case of an empty graph, all permutations are true causal orderings).
In order to apply the previously introduced rank-based regression methods, we assume that all functions in the data generating PNL causal model are continuous and strictly increasing, and thus, we can define their inverse via . Further, we assume that all are linear and the distribution of every stochastic error is assumed to be continuous. We emphasize again, our method is applicable to nonlinear functional relations by means of basis expansions.
Put together, each structural equation, i.e. each cause-effect relation, corresponds to a PNL regression model (1) as introduced in the previous section. That is, the data generating process follows
[TABLE]
where the non-descendants are given by all nodes that precede node in at least one true causal ordering and is independent of . Note that the entries in which do not correspond to parents of node are simply zero. To leverage the independence \varepsilon^{(k)}\mathchoice{\mathrel{\hbox to0.0pt{\displaystyle\perp\hss}\mkern 3.0mu{\displaystyle\perp}}}{\mathrel{\hbox to0.0pt{\textstyle\perp\hss}\mkern 3.0mu{\textstyle\perp}}}{\mathrel{\hbox to0.0pt{\scriptstyle\perp\hss}\mkern 3.0mu{\scriptstyle\perp}}}{\mathrel{\hbox to0.0pt{\scriptscriptstyle\perp\hss}\mkern 3.0mu{\scriptscriptstyle\perp}}}X^{(\textbf{ND}_{k})}, we define
[TABLE]
For every sink node in the graph, we have , and, thus, for every sink node the noise is independent of . Moreover, if node is not a sink node in the graph, then the noise is not independent of , since contains at least one child of . Thus, we can recursively identify a sink node using the HSIC (Gretton et al.,, 2005) measure of independence between the estimated noise and the remaining nodes .
We propose the following routine to learn one of the valid causal orderings of the graph. First, we utilize the rank-based estimators and , introduced in the previous section, and estimate the noise via
[TABLE]
We repeat this noise estimation for all remaining nodes and subsequently calculate the HSIC test statistic between the estimated noises and the observed values of the remaining nodes , that is
[TABLE]
We determine the node which leads to the minimal test statistic as a sink node, that is, our proposed sink node estimator is defined by
[TABLE]
In the next step we remove from the set and repeat the sink node identification procedure to estimate . Thus, recursively we obtain an estimate for the causal ordering
[TABLE]
In the following Theorem we prove consistency of our proposed estimation routine for the causal ordering. It is clear that if at any step our method fails to correctly identify a remaining sink node, then it fails to estimate a valid causal ordering. Thus, we must require sink node identifiability from the joint distribution in order to ensure consistency of the estimated causal order. The following assumption (A) formalizes this intuition.
(A) For each and that contains at least one child of as well as for all strictly increasing, continuous functions and for all , there exists a constant , such that
[TABLE]
where
[TABLE]
Theorem 3.1**.**
Under assumption (A) and consistency of the employed estimators and we have
[TABLE]
The proof can be found in Appendix D.
Remark 3.2**.**
Location and scale transformations of the noise variables can be matched by transformations in the functions , however, these transformations do not change the dependence structure and thus without loss of generality, we can assume that the location and scale assumptions in Section 2 are satisfied.
4 SIMULATIONS
In this section we present the results of a simulation study with the aim to compare the performance of our algorithm to existing causal learning methods on various simulated data sets and validate the consistency results experimentally. If necessary in the specific application, our method can be easily extended to infer the full causal graph structure, however, this vastly increases the computation time, similar to the related existing methods. Since the main differences of the algorithms already come into play during the causal order estimation procedure, we compared the performance on this task alone. Our experiments were designed as follows. We randomly sampled a causal structure with ( and ) nodes from Erdős–Rényi directed acyclic graphs with edge probability . Thus, the expected total number of edges in the graph is . We generated data according to the corresponding post-nonlinear model using the following structural relations
[TABLE]
for with different noise distributions (standard Normal , standard Gumbel or standard Logistic ). Further, and are sampled from a uniform distribution with either a low range from to , representing a weak signal setting (low signal-to-noise ratio (SNR)), or a higher range from to , representing a strong signal setting (high SNR).
Remark 4.1**.**
We highlight that the inner function is quadratic in the parents of . In the experiments, we used polynomial basis expansions of order two to linearly model the inner function by specifying not only parents but also their squares.
Using this process, we generated independent data sets and estimated the causal ordering. We compared our results with the classical RESIT method for additive noise models (Peters et al.,, 2014) and the AbPNL method for post-nonlinear models (Uemura et al.,, 2022) restricted to the respective causal order estimation parts. The causal ordering of a given DAG is not necessarily unique, thus, as a measure of performance for an estimated causal ordering we report the number of directed edges in the true graph with , that is
[TABLE]
This measure equals zero when is a valid causal ordering and achieves its maximum, the number of edges in , when is a reversed causal ordering.
Figure 1 shows the performance of the Gaussian method introduced in Subsection 2.1, named RankG, compared to RESIT and AbPNL on 4- and 7-dimensional causal graphs in settings with Gaussian, Gumbel (standard extreme value distribution) and Logistic noise. Dashed lines indicate the weak signal setting and solid lines depict the strong signal setting. We plot the mean of our performance measure, the number of wrongly oriented edges in the fully connected DAG corresponding to the estimated causal ordering, over all 100 data sets against the sample size.
Our proposed RankG method outperforms the competition in almost all considered settings, especially in the low SNR setting. We emphasize that it might seem counterintuitive that the RankG method performs better in a low SNR setting than in a high SNR setting, however, the noise drives the identification in the rank-based learning procedure in PNL models. Thus, higher noise in comparison to the signal strengths induces more changes in the ranks that propagate through the graph, and thus, better performance of the rank-based estimation methods.
Furthermore, in the low SNR setting our computational results support the theoretical consistency results and our method seems to recover a valid causal ordering even in moderate sample sizes.
The results in Figure 1 display that even under noise misspecification, that is, in the Gumbel and Logistic noise cases, the RankG method performs best. This might indicate some robustness of our proposed method for causal order estimation, even though we do not recover the true noise under misspecification (see Figure 6).
We conducted similar experiments to compare the performance of our proposed general method introduced in Subsection 2.2, named RankS, however with lower sample sizes for computational reasons. Figure 2 shows the results of the different competitors RankS, RankG, RESIT and AbPNL for 4-dimensional graphs with Gaussian, Gumbel and Logistic noise. Our proposed method RankS performs best in all considered sample sizes and all noise cases, except for the weak signal settings where RankG performs better. This might stem from the fact that the pairwise rank likelihood used in RankG better approximates the marginal rank likelihood.
In additional experiments, we investigated the behavior of the introduced methods for more complex functional relations , namely a polynomial of degree 4. Similar to the experiments before, we sampled data from Erdős–Rényi DAGs, however, with the structural relations
[TABLE]
for with different noise distributions and and sampled with low or high SNR. Figure 3 shows the resulting mean performance measures for the different methods RankS, RankG, RESIT and AbPNL over 100 data sets. As expected, the task becomes more challenging by increasing the complexity of the functional parameters, since it is difficult to estimate the functional relations in the first place. However, even in the considered low sample sizes, our proposed RankS method seems to detect some causal structure and outperform the competition.
Further, we analysed the behaviour of the used functional estimators introduced in Section 2 by performing the following experiments. We generated 100 independent data sets of sample size 500 according to model (2) with a 3-dimensional predictor , standard Gaussian or Gumbel noise, cubic function and linear parameters . Then we estimated the functional parameter and the function with the introduced rank-based methods.
Figure 4 shows the estimation of the function for one representative result across the 100 replications. The red lines indicate the true value of the function , while the black dots indicate the pointwise estimates. We notice that the estimation of the function with the rank-based method that relies on the normality assumption (used in RankG) fails to correctly estimate the functional relation at extremes under Gumbel noise. This is due to the misspecified tail probability structure. In contrast, the general estimation method employed in RankS is not influenced by the specific underlying noise distribution. However, in the Gaussian noise case we notice small estimation bias, which can be regulated with the hyperparameter in the estimation procedure. Further, from the results across all 100 data sets, we noticed that the variance of the functional estimate across the data sets is higher using the general estimation methods in RankS.
Figure 5 shows the estimation of the first two entries of . Recall that in the general RankS method we fixed the last entry in our estimation of . The box plots show the estimated values of across the 100 data sets and red dots indicate the true values. We see again that RankG estimates the parameter with a bias in the misspecified Gumbel noise setting, similar to the estimation of the function . However, in the Gaussian setting the RankG method estimation of has a lower variance than RankS and in all other cases the median estimate corresponds to the true value.
Figure 6 shows the estimated noise by combining both estimation results. We plot the estimated noise against the true values for one representative data set (similar to Figure 4). Red lines correspond to a perfect estimation. The RankG method inherits the behaviour from both estimation parts and fails to correctly estimate the extreme noise cases. However, it outperforms the RankS method in the Gaussian setting.
5 CONCLUSION
We proposed a new routine for causal discovery in multivariate post-nonlinear structural equation models. Our method disentangles the two tasks of estimating the functional relations and learning the causal structure by employing rank-based methods for the first task. Thus, our proposed routine is less susceptible to overfitting issues exhibited by the existing methods that rely on minimizing dependence and subsequently testing for independence.
We introduced PNL rank regression methods to learn the functional relations in PNL models and subsequently estimate the residuals. As a special case, we first considered Gaussian noise and used pairwise rank likelihoods in a computationally fast algorithm, whereas, for the general noise case, we employed a smoothed version of rank correlations to obtain estimates of the functional relations. While our presentation focused on linearity in the inner functional relation to simplify the theoretical analysis, the framework includes nonlinear relations by means of basis expansions. Employing the introduced estimators of the functional relations, we proposed a causal learning routine to recursively identify sink nodes based on independence tests with the estimated residuals. Further, we prove consistent causal order recovery of our proposed routine under identifiability assumptions and consistency of the employed functional estimators.
We validated our theoretical findings in a simulation study that showed that our proposed routine outperforms the competition and is able to recover a valid causal ordering even in moderate sample sizes.
Acknowledgements
This project has received funding from the European Research Council (ERC) under the European Union’s Horizon 2020 research and innovation programme (grant agreement No. 83818), the German Federal Ministry of Education and Research, and the Bavarian State Ministry for Science and the Arts. The authors of this work take full responsibility for its content.
Appendix A ADDITIONAL ASSUMPTIONS
In this section, we list all additional assumptions that are mentioned in the main paper in detail.
Cuzick, (1988) employs the following 4 (additional) assumptions to ensure the consistency of defined in (5). Thus, the following assumptions together with the modelling assumptions and assumption (A) in the main paper ensure the consistency of the proposed RankG method.
- AG1
Let . Assume has distribution , then is uniformly integrable for some which is specified in the next assumption (at least ). 2. AG2
For defined in (4), there exists finite such that
[TABLE]
for all and , where is a neighborhood of and . 3. AG3
The function is continuous as . 4. AG4
The following inequality holds
[TABLE]
The main assumptions above essentially correspond to moment conditions on the distribution of .
Following Lin and Peng, (2013) (AS1-AS4) and Zhang, (2013) (AS5-AS9), we list the additional assumptions that are required to ensure consistency of the estimators in Section 2.2. Thus, in combination with the modelling assumptions and assumption (A) in the main paper, the RankS method is consistent.
- AS1
Let be the density function of and the distribution function of . Define the functions and . Then is nonsingular. 2. AS2
The density is positive with a continuous second derivative on its corresponding compact support. 3. AS3
The function has a continuous second derivative on its corresponding support. 4. AS4
The random variable is bounded with a compact support. 5. AS5
The true parameter is an interior point of a compact subset . 6. AS6
The support of is not contained in a linear subspace of . Moreover, conditional on the first components of , the last component of has a density function with respect to the Lebesgue measure. 7. AS7
Define
[TABLE]
and let
[TABLE]
There exists a neighborhood of such that for each pair of in the support of the following hold
- •
The second derivatives of with respect to exist in .
- •
There exists an integrable function such that for all in
[TABLE]
- •
.
- •
.
- •
The matrix is strictly negative definite. 8. AS8
There exists and in the support of such that is contained in a compact interval. 9. AS9
For , and we define
[TABLE]
Further, we define
[TABLE]
then
[TABLE]
is negative for each and uniformly bounded away from 0.
Zhang, (2013) show uniform consistency on the interval defined in the assumptions AS8-AS9.
Appendix B PROOF OF PROPOSITION 2.1
To prove Proposition 2.1 we employ the following two Lemmas.
Lemma B.1**.**
For all we have
[TABLE]
where and denote the CDF and PDF of the standard normal distribution.
Proof.
With , we show that . Substituting the derivative of we have
[TABLE]
Since for all it remains to show that . For this is clear. Thus we consider the case . We have for the derivative of
[TABLE]
Therefore, is a strictly decreasing function. The limit of for is given by
[TABLE]
and the claim follows. ∎
Lemma B.2**.**
The function
[TABLE]
is concave, where is the CDF of the standard normal distribution and is a nonzero constant. Moreover, for a vector and Hessian matrix if and only if .
Proof.
The function is twice differentiable, and thus for the first part of the Lemma it is enough to show that the Hessian of is negative semi-definite. The gradient of is given by
[TABLE]
where is the PDF of the standard normal distribution. Thus, the Hessian is
[TABLE]
So, for any we have
[TABLE]
where the last step follows from Lemma B.1 and the fact that .
Moreover, if and only if , which completes the proof. ∎
Employing the Lemmas we can prove Proposition 2.1.
Proof.
From (2.1) we have
[TABLE]
Thus, with Lemma B.2 we know that is a sum of concave functions. Since sums preserve the concavity is concave.
We show strict concavity by contradiction and thus assume that is not strictly concave. This implies that there exists a vector such that for the Hessian matrix of . The Hessian operator is linear, thus, is a sum of Hessians, that is
[TABLE]
Lemma B.2 gives that if and only if . Since for each , either one of or is 1, implies that for all and . Therefore, for a constant for all . Let be the sample vector of the th component of and define the matrix . Then does not have full column rank, i.e. taking implies .
However, if we take any arbitrary square sub-matrix in and compute the determinant, we obtain a non-zero polynomial of some . The Lemma in (Okamoto,, 1973) states that such a polynomial is zero only on the Lebesgue measure zero. Therefore, the rank of is , which contradicts the equality and, thus, completes the proof. ∎
Appendix C PROOF OF THEOREM 2.1
To keep the formulas readable, we define . This gives
[TABLE]
For the proof, we use the Taylor expansion of around and use properties of the gradient of at , which are established in the following Lemmas.
Lemma C.1**.**
The gradient converges to zero almost surely, that is
[TABLE]
Proof.
The gradient
[TABLE]
is a U-statistic with kernel
[TABLE]
Note that is symmetric as there are no ties in ’s ( has a continuous distribution).
We use the generalization of the classical Strong Law of Large Numbers (SLLN) for U-statistics (e.g. Theorem A in (Serfling,, 1980) Section 5.4). The expectation of the kernel is
[TABLE]
where the first equalities follow from the linearity of the expectation and the tower rule of the expectation, i.e. for any function . The third equality is a classical porperty of the indicator function. The fourth equality uses the monotonicity of the function . The fifth step is just a cancellation of the equal members in the fractions. Finally, the sixth equality follows from the fact that for the probability density function of the standard normal distribution.
We show that the absolute value of the kernel has a finite expectation, since
[TABLE]
where we used the triangle inequality for the absolute value in the first step and the other steps are similar to the calculation for the expectation of the kernel . The last quantity is finite since the probability density function of the standard normal distribution is bounded.
Thus, the SLLN yields
[TABLE]
which completes the proof of the Lemma. ∎
Lemma C.2**.**
Let . For all , such that , and we have
[TABLE]
Proof.
The taylor expansion of around gives
[TABLE]
where , i.e., is in the closed ball around with radius . Clearly, is continuous with respect to . Moreover, from Proposition 2.1 we know that the maximum eigenvalue of is negative. Thus, there exist a maximum eigenvalue of within the closed ball with center and radius .
Therefore, we have
[TABLE]
Using the previous Lemma C.1 the remaining term asymptotically vanishes in probability and thus the claim follows. ∎
In the following we prove Theorem 2.1., i.e. .
Proof.
From Lemma C.2 we know that for any fixed , there exists a maximum of within the closed ball around with radius with probability tends to 1. However, maximizes , and thus
[TABLE]
which completes the proof. ∎
Appendix D PROOF OF THEOREM 3.1
To prove the Theorem we employ the following Lemma.
Lemma D.1**.**
Under the assumption () we have
- •
If node is not a sink node, then
[TABLE]
- •
If node is a sink node, then
[TABLE]
Proof.
First, assume that is not a sink node. Denoting , assumption () gives that . On the other hand, Theorem 3 in Gretton et al., (2005) gives that for all , with probability at least we have
[TABLE]
where and are constants. Thus,
[TABLE]
Second, assume that is a sink node. Then is an i.i.d. sample from the PNL regression model (1). Using the (point-wise) consistency of the estimators, we have
[TABLE]
Thus, we obtain
[TABLE]
From the definition of and the decomposition of the HSIC, we have
[TABLE]
where
[TABLE]
and .
Similar to the proof of Theorem 2 in Teran Hidalgo et al., (2018) we will show that . From Lemma 1 in Gretton et al., (2005) we have
[TABLE]
where and .
We show that . We have
[TABLE]
where we used in the second equality. For , the Markov inequality gives
[TABLE]
Here, we used that and are bounded by 1, thus, their variances are bounded. The number of terms in the quadruple sum that have exactly three different indices are of order , but the denominator is of order , which leads to the second term. The last comes from the fact that the number of terms in the quadruple sum that have exactly four different indices are of order .
Using (7) and the continuous mapping theorem we obtain
[TABLE]
and since is bounded, it is uniformly integrable and we obtain
[TABLE]
In a similar way, we obtain
[TABLE]
Using the fact that is independent from and employing the two equalities above , we have
[TABLE]
Thus, all terms in (9) are . Moreover, from the above arguments we have and as is bounded and the denominator is of order . So, in (8) all the terms are , which gives
[TABLE]
In the same fashion, one can show that and , which put together proves that . Moreover, , since is a sink node and thus the noise is independent from the remaining nodes . Thus,
[TABLE]
which completes the proof of the Lemma. ∎
Now we can prove Theorem 3.1.
For any set we denote the sub-graph of over the nodes as . Then
[TABLE]
The proof of Theorem 3.1 is completed if we show that
[TABLE]
since this implies
[TABLE]
Using assumption () and the recursive construction of it is enough to prove (10) for , that is
[TABLE]
By Lemma D.1 we know that goes to zero in probability for sink nodes and is a least for other nodes. Thus,
[TABLE]
This completes the proof of the Theorem.
Appendix E NUMERICAL RESULTS
In this Section we provide the concrete simulation results in tabular form and additionally provide the empirical standard deviations if the results.
Tables 1, 2, 3 and 4, 5, 6 show the results of RankG, AbPNL and RESIT methods for 4-dimensional graphs with strong and weak signal settings, for Gaussian, Gumbel and Logistic noises, respectively. Tables 7, 8, 9 and 10, 11, 12 show the results for 7-dimensional graphs. In both cases the sample sizes are 100, 500, 1000, 1500, and 2000, respectively.
Moreover, Tables 13, 14, 15 and 16, 17, 18 show the results of RankS, RankG, AbPNL and RESIT methods for 4-dimensional graphs with strong and weak signal settings, respectively. Tables 19, 20, 21 and 22, 23, 24 show the results of RankS, RankG, AbPNL and RESIT methods for 4-dimensional graphs where the function is polynomial of degree 4 with strong and weak signal settings, respectively. In both cases the sample sizes are 100, 150, 200, 250, and 300.
The reference list from the paper itself. Each links out to its DOI / PubMed record.
- 1(1) Abrevaya, J. (1999 a). Computation of the maximum rank correlation estimator. Economics Letters , 62(3):279–285.
- 2(2) Abrevaya, J. (1999 b). Leapfrog estimation of a fixed-effects model with unknown transformation of the dependent variable. Journal of Econometrics , 93(2):203–228.
- 3Abrevaya, (2003) Abrevaya, J. (2003). Pairwise-difference rank estimation of the transformation model. Journal of Business & Economic Statistics , 21(3):437–447.
- 4Bühlmann et al., (2014) Bühlmann, P., Peters, J., and Ernest, J. (2014). CAM: Causal additive models, high-dimensional order search and penalized regression. The Annals of Statistics , 42(6):2526 – 2556.
- 5Cavanagh and Sherman, (1998) Cavanagh, C. L. and Sherman, R. P. (1998). Rank estimators for monotonic index models. Journal of Econometrics , 84:351–381.
- 6Chen, (2002) Chen, S. (2002). Rank estimation of transformation models. Econometrica , 70(4):1683–1697.
- 7Cuzick, (1988) Cuzick, J. (1988). Rank regression. The Annals of Statistics , 16(4):1369–1389.
- 8Doksum, (1987) Doksum, K. A. (1987). An Extension of Partial Likelihood Methods for Proportional Hazard Models to General Transformation Models. The Annals of Statistics , 15(1):325 – 345.
