Semiparametric Outcome Regression-Based Estimator of Mann-Whitney-type Causal Effect

Safiya S. Sani; Bryan S. Blette; Chun Li; Abubakar Yahaya; Hussaini G. Dikko; Abubakar Usman; Usman J. Wudil; Faisal Dankishiya; Nafi’u Hussaini; C. William Wester; Muktar H. Aliyu; Bryan E. Shepherd

PMC · DOI:10.21203/rs.3.rs-8340013/v1·February 2, 2026

Open Access

TL;DR

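This paper introduces a semiparametric estimator of Mann-Whitney-type causal effects for observational studies, based on the cumulative probability model. In simulations it showed reduced variability and improved accuracy relative to outcome regression with a mis-specified parametric transformation.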

Contribution

The novel contribution is a semiparametric estimator for Mann-Whitney-type causal effects using cumulative probability models.

Findings

01. The CPM estimator shows reduced variability and improved predictive accuracy in simulations.

02. The method is applied to assess the causal effect of HIV status on albuminuria levels in a Nigerian cohort.

03. Robust semiparametric methods are valuable for causal inference beyond average treatment effects.

Abstract

We introduce a novel semiparametric estimator for Mann-Whitney-type causal effects based on the cumulative probability model (CPM). CPMs are rank-based, invariant to monotone transformations of the outcome, and offer flexible outcome regression under confounding. We formalize the estimation under causal consistency, no interference, ignorability, and positivity, and develop accompanying inference procedures. Through simulations with varying sample sizes and effect magnitudes, the CPM estimator shows reduced variability and improved predictive accuracy relative to mis-specified parametric transformations. We demonstrate its applicability in a large cohort of People with HIV (PWH) in Northern Nigeria by assessing the causal effect of HIV status on albuminuria levels. Overall, our results highlight the value of robust semiparametric methods for causal inference in observational settings…

Keywords

Albuminuria; Causal Effect Estimation; Cumulative Probability Model; Estimated Glomerular Filtration Rate; Mann-Whitney-Type Parameter; Semiparametric Models; Urine Albumin to Creatinine Ratio

Taxonomy

Topics: Advanced Causal Inference Techniques · Statistical Methods and Bayesian Inference · Genetic Associations and Epidemiology

Full text

1 Introduction

The Neyman-Rubin Causal Model (Rubin, 1974; Imbens and Rubin, 2015), also known as the potential outcomes framework, is a fundamental approach to define, identify, and estimate causal effects. For each subject $i$ in a sample with a binary treatment, there are two potential outcomes, $Y_i(1)$ being the outcome if the subject is treated and $Y_i(0)$ the outcome if the subject is not treated. These are called potential outcomes because they represent what could potentially happen under each treatment condition. When treatment is applied to a subject, we observe $Y_i(1)$, and $Y_i(0)$ becomes the counterfactual outcome, representing what would have happened if the subject had not been treated. Conversely, if the subject is not treated, we observe $Y_i(0)$ and $Y_i(1)$ becomes counterfactual. The individual causal effect (ICE) can be defined as $Y_i(1) - Y_i(0)$, but each subject can only belong to one group at a time. Thus, identification of ICEs is not possible without very strong assumptions since we can never observe both $Y_i(1)$ and $Y_i(0)$ simultaneously.

Since ICEs cannot be directly observed, researchers often focus on estimating population-level causal effects. These are calculated by summarizing the individual-level causal effects across the entire population or a subset of the population. Some examples from Grilli and Rampichini (2010) include the Average Causal Effect (ACE), the expected difference between the potential outcomes under treatment and under no treatment in the population, given by $E[Y(1) - Y(0)]$, and the difference between the medians of the potential outcomes under the two treatment conditions, i.e., $\mathrm{med}[Y(1)] - \mathrm{med}[Y(0)]$.

These estimands are termed “causal” because they were derived from comparisons between potential outcomes corresponding to different causal scenarios (i.e., outcomes under treatment versus no treatment). Such estimands enable quantifying the average or typical effect of a treatment within a population, even though individual causal effects cannot be directly observed. This framework underpins a variety of methods in causal inference, including those applied in both randomized experiments and observational studies.

While causal inference methods traditionally focus on estimating ACEs, there is growing interest in exploring alternative effect measures. The Mann-Whitney U test, also known as the Wilcoxon rank sum test, is a nonparametric statistical test that compares two independent samples. It is applicable to any outcome variable with a natural ordering, and is particularly useful for outcomes that are skewed or ordinal. Beyond significance testing, the Mann-Whitney U test yields an interpretable parameter, the probabilistic index, which is the probability that a randomly selected subject in one group has a higher outcome value than a randomly selected counterpart in the other group. A probabilistic index greater than 0.5 suggests that treated units tend to have higher outcomes than controls. A value close to 0.5 indicates little evidence of a difference between the groups (i.e., similar outcomes between treated and control units). A value well below 0.5 indicates that controls tend to have higher outcomes.
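The probabilistic index described above can be computed directly from two samples by comparing every treated outcome with every control outcome. The following is a minimal sketch, not the authors' code: the function name and the example data are hypothetical, and counting ties as one half is a common convention assumed here.

```python
from itertools import product


def probabilistic_index(treated, control):
    """Estimate P(treated > control) + 0.5 * P(treated == control).

    Counting a tie as one half is an assumed convention for tied data.
    """
    if not treated or not control:
        raise ValueError("both groups must be non-empty")
    score = 0.0
    for t, c in product(treated, control):
        if t > c:
            score += 1.0
        elif t == c:
            score += 0.5
    return score / (len(treated) * len(control))


# Hypothetical outcomes for illustration only.
treated = [3.1, 4.7, 5.0, 8.2]
control = [2.0, 3.1, 4.0]
pi_hat = probabilistic_index(treated, control)
print(pi_hat)
```

Because only the ordering of pairs matters, applying the same increasing transformation to both samples leaves the estimate unchanged.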

The Mann-Whitney U test does not directly test medians but compares entire distributions and is valid even with tied data, contrary to common misconceptions (Divine et al., 2018). Thas et al. (2012) introduced a semiparametric model of the probabilistic index, the probabilistic index model (PIM), which models the probability that the outcome $Y$ given one set of covariates is less than an independent outcome $Y^*$ given another set of covariates. Vermeulen et al. (2015) discussed enhancing the power of the Mann-Whitney U test in randomized experiments, particularly for skewed outcomes or small sample sizes, and proposed a method to adjust for auxiliary baseline covariate information, which is often collected but not used in the traditional Mann-Whitney U test. While these advances promoted the use of auxiliary covariates, they did not formally describe a causal inference framework that could be used in observational study settings.

In contrast, Fay et al. (2018) and Zhang et al. (2019) proposed a Mann-Whitney-type causal effect estimand denoted by $\theta$, and measured as $P\{Y_i(1) > Y_j(0)\}$, with subscripts $i$ and $j$ representing two independent subjects. Thus, $\theta$ captures the tendency of a randomly selected unit to have a higher potential outcome under treatment than a different randomly selected unit’s potential outcome under control. Zhang et al. (2019) proposed several methods for estimating the Mann-Whitney-type causal effect, one of which is an estimator based on an outcome regression (OR) model, which models the conditional distribution of the outcome (perhaps after a Box-Cox transformation) given treatment and confounders. The OR estimator of the Mann-Whitney-type causal effect proposed by Zhang et al. (2019) requires correct model specification for unbiased estimation and was presented using parametric models. For example, with skewed data, their approach requires choosing an appropriate transformation such that the transformed response data are normally distributed and then fitting a normal linear model to the transformed data. However, selecting a transformation and then fitting a parametric model is contrary to the spirit of the Mann-Whitney test, which is invariant to any monotonic transformation of the data and requires no distributional assumptions.

In this paper we explore the use of a semiparametric model that is invariant to outcome transformation, the cumulative probability model (CPM), for obtaining an OR estimator of the Mann-Whitney-type causal parameter, providing a robust and flexible approach to causal effect estimation. CPMs are often fit to ordinal outcomes (e.g., the proportional odds model is a CPM), but Liu et al. (2017) demonstrated that CPMs could also be used to analyze response variables that are continuous, or mixtures of ordinal and continuous (e.g., data with detection limits). Fitting a CPM to continuous data is equivalent to fitting a semiparametric linear transformation model, where data are assumed to follow a linear model after some unspecified monotonic transformation that is nonparametrically estimated. Liu et al. (2017) highlighted the robustness and flexibility of CPMs, which motivates our use of CPMs in causal effect estimation. Li et al. (2023) established asymptotic properties of CPMs for continuous outcomes. Building on these results, we develop an estimator of the Mann-Whitney-type causal effect that uses CPM-based regression models.

The remainder of the manuscript is organized as follows. Section 2 describes the CPM and the proposed estimator of the Mann-Whitney-type causal effect for observational studies. In Section 3, we present simulation experiments to evaluate the performance of this estimator and to compare it with other estimators. Section 4 presents an application of the method to an observational study, estimating the causal effect of HIV on kidney function among adult People Living with HIV (PWH) seen at the Aminu Kano Teaching Hospital (AKTH) in Northern Nigeria. The paper concludes with Section 5, which provides additional discussion of the CPM estimator of the Mann-Whitney-type effect measure.

2 Methodology

2.1 Mann-Whitney-type Causal Effect Estimand

We examine a scenario with two groups: the exposed, denoted as A = 1, and the unexposed, denoted as A = 0, with an orderable (i.e., continuous, ordinal, or a mixture of the two) outcome represented by Y. We define $Y_i(1)$ as the potential outcome for individual $i$ if assigned to the treatment group (A = 1) and $Y_i(0)$ as the potential outcome for individual $i$ if assigned to the control group (A = 0). Additionally, let X be a vector of covariates that are unaffected by treatment. Our focus is on estimating a population-level Mann-Whitney-type causal effect.

Let $F_1$ and $F_0$ be the marginal distributions of $Y(1)$ and $Y(0)$, respectively. The Mann-Whitney-type effect estimand is represented in terms of the marginal distributions of potential outcomes as:

$$\theta = \iint h(u, v)\, dF_1(u)\, dF_0(v), \qquad (1)$$

where

$$h(u, v) = I(u > v) + \tfrac{1}{2}\, I(u = v), \qquad (2)$$

and $I(\cdot)$ is an indicator function. For a continuous outcome, $h(u, v) = I(u > v)$ and the corresponding $\theta$, namely, $\theta = P\{Y_i(1) > Y_j(0)\}$, captures the tendency of a randomly selected unit, subject $i$, to have a higher potential outcome under treatment than the potential outcome under control of an independent randomly selected unit, subject $j$. Note that $\theta$ depends only on marginal distributions of the potential outcomes because subscripts $i$ and $j$ represent two independent subjects.
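As a worked illustration that is not taken from the paper, suppose the two potential outcomes are normally distributed. Writing $\theta$ for the Mann-Whitney-type estimand, it then has a closed form; the means, variances, and numerical values below are assumptions chosen only for the example.

```latex
% Assumption: Y_i(1) ~ N(\mu_1, \sigma_1^2) and Y_j(0) ~ N(\mu_0, \sigma_0^2),
% with i and j independent subjects, so the difference is also normal.
\begin{align*}
\theta &= P\{Y_i(1) > Y_j(0)\} = P\{Y_i(1) - Y_j(0) > 0\} \\
       &= \Phi\!\left(\frac{\mu_1 - \mu_0}{\sqrt{\sigma_1^2 + \sigma_0^2}}\right), \\
\mu_1 - \mu_0 = 0.5,\ \sigma_1 = \sigma_0 = 1:\quad
\theta &= \Phi\!\left(0.5/\sqrt{2}\right) \approx \Phi(0.354) \approx 0.638 .
\end{align*}
```

Applying the same increasing transformation (for example, the exponential) to both potential outcomes leaves the event $\{Y_i(1) > Y_j(0)\}$ unchanged, so the value of $\theta$ is the same on the transformed scale.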

Since $\theta$ is defined using the marginal distributions of the potential outcomes, to identify this estimand, we first identify $F_1$ and $F_0$ with observable data. To do this, we make the following four common causal assumptions:

No interference: The treatment received by one individual does not affect the outcomes of another individual.

Consistency: The potential outcome for an individual under a specific treatment condition matches the observed outcome when that individual receives that treatment:

$$Y = A\,Y(1) + (1 - A)\,Y(0)$$

Ignorability: Treatment assignment, A, is independent of the potential outcomes of Y conditional on the covariates vector X (Rosenbaum & Rubin, 1983). This is an assumption of no unmeasured confounding. This can be expressed as:

$$\{Y(1), Y(0)\} \perp A \mid X$$

Positivity: Given $X = x$, there is a non-zero probability of being assigned the treatment or control:

$$0 < P(A = 1 \mid X = x) < 1$$

Under these assumptions, the marginal distribution of $Y(a)$, $a = 0$ or 1, is

$$\begin{aligned} F_a(y) &= P\{Y(a) \le y\} \\ &= E_X\left[P\{Y(a) \le y \mid X\}\right] \\ &= E_X\left[P\{Y(a) \le y \mid A = a, X\}\right] \\ &= E_X\left[P(Y \le y \mid A = a, X)\right]. \end{aligned} \qquad (3)$$

The second equality is from the law of total probability (conditioning on $X$), the third line is from Ignorability, and the fourth is from Consistency. Thus, the marginal distribution of $Y(a)$ is now written in terms of observables (not counterfactual variables), so it is identified and can be estimated.
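The identification argument above amounts to standardization: estimate the outcome distribution within levels of the covariates for treatment arm a, then average over the covariate distribution of the whole sample. The sketch below assumes a single discrete covariate so that no regression model is needed; the function name and the toy data are hypothetical and are not from the paper.

```python
from collections import Counter


def standardized_cdf(data, a, y):
    """Estimate P(Y(a) <= y) by averaging stratum-specific CDFs over X.

    data: list of (x, treatment, outcome) tuples with a discrete covariate x.
    """
    n = len(data)
    x_counts = Counter(x for x, _, _ in data)
    total = 0.0
    for x, count in x_counts.items():
        stratum = [yy for xx, aa, yy in data if xx == x and aa == a]
        if not stratum:
            # Empirical positivity failure: arm a never appears in this stratum.
            raise ValueError(f"no subjects with A={a} in stratum X={x}")
        cond_cdf = sum(yy <= y for yy in stratum) / len(stratum)
        total += cond_cdf * count / n  # weight by the empirical P(X = x)
    return total


# Hypothetical (x, a, y) records for illustration only.
data = [
    (0, 1, 5.0), (0, 1, 7.0), (0, 0, 4.0), (0, 0, 6.0),
    (1, 1, 9.0), (1, 0, 8.0), (1, 0, 10.0), (1, 0, 12.0),
]
f1_at_7 = standardized_cdf(data, a=1, y=7.0)
f0_at_8 = standardized_cdf(data, a=0, y=8.0)
print(f1_at_7, f0_at_8)
```

With continuous or multi-dimensional covariates the stratum-specific CDF is no longer directly estimable, which is where a regression model for the conditional distribution comes in.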

The distribution of $X$ can be estimated without modeling assumptions using the empirical cumulative distribution function of $X$. However, with continuous or multi-dimensional $X$, modeling assumptions will typically be needed to estimate $P(Y \le y \mid A = a, X)$. Ideally, to maintain the spirit of the rank-based nature of $\theta$, the model for $P(Y \le y \mid A = a, X)$ will also be based on ranks. In the next section, we describe such a model.

2.2 Semiparametric Cumulative Probability Model

We assume that the outcome Y follows a semiparametric linear transformation model, that is, $Y = H(\beta^{T} X + \gamma A + \varepsilon)$. Here $H(\cdot)$ is a transformation function that is unknown but assumed to be monotonic increasing, $\beta$ is a vector of coefficients, $\gamma$ is the coefficient associated with the treatment variable, and $\varepsilon$ is a random variable that is assumed to follow a cumulative distribution function $F_\varepsilon$ and is independent of $X$ and $A$. For ease of presentation, we have chosen to write this model with simple linear relationships, but we should note that the model could easily be made more flexible by including, for example, pre-specified functions of $X$ (e.g., polynomials, splines, products) and interactions between functions of $X$ and $A$. It follows that $H^{-1}(Y) = \beta^{T} X + \gamma A + \varepsilon$, where $H^{-1}(Y)$ is a transformation of $Y$ such that the transformed value is linearly related to $X$ and $A$. We specify the cumulative probability distribution of $Y$ conditional on $X$ and A as:

$$P(Y \le y \mid X, A) = P\{\varepsilon \le H^{-1}(y) - \beta^{T} X - \gamma A\} = F_\varepsilon\{H^{-1}(y) - \beta^{T} X - \gamma A\}. \qquad (4)$$

By letting $\alpha(y) = H^{-1}(y)$ and $G = F_\varepsilon^{-1}$, we can write the linear transformation model as a CPM:

$$G\{P(Y \le y \mid X, A)\} = \alpha(y) - \beta^{T} X - \gamma A, \qquad (5)$$

where $\alpha(y)$ is sometimes referred to as an intercept function and $G$ is a link function. We fit our CPM in a semiparametric manner that uses a step function to estimate $\alpha(y)$. Without loss of generality, for $J$ ordered unique observed values of $Y$ denoted by $y_1 < y_2 < \cdots < y_J$, we estimate $J - 1$ associated intercepts, $\alpha_1 < \alpha_2 < \cdots < \alpha_{J-1}$, where $\alpha_j = \alpha(y_j)$. We express the CPM as:

$$G\{P(Y \le y_j \mid X, A)\} = \alpha_j - \beta^{T} X - \gamma A, \quad j = 1, \ldots, J - 1. \qquad (6)$$

The nonparametric likelihood function of the CPM for independent and identically distributed realizations of $[eqn]$ is given by:

[eqn]

In this equation, $[eqn]$ is an auxiliary parameter with the constraint $[eqn]$ . The likelihood function reaches its maximum when $[eqn]$ and $[eqn]$ allowing us to simplify it to:

[eqn]

This formulation is analogous to the multinomial likelihood for cumulative link models when outcomes are considered as ordered categorical. Maximizing the function results in the nonparametric maximum likelihood estimators (NPMLEs) for $\alpha_1, \ldots, \alpha_{J-1}$, $\beta$, and $\gamma$. The NPMLE of the transformation function $\alpha(\cdot)$ forms an increasing step function with a step corresponding to each of the $J - 1$ intervals between adjacent outcome values from $y_1$ to $y_J$. The model applies to every distinct outcome value, with each value associated with a corresponding $\alpha_j$. Because the transformation is nonparametrically estimated, the estimated parameters $\hat\beta$ and $\hat\gamma$ are invariant to any monotonic transformation of $Y$. Thus, the CPM is rank-based. With a single binary predictor $A$, the score test for the corresponding $\gamma$ in the CPM is nearly identical to the Mann-Whitney/Wilcoxon rank sum test (McCullagh, 1980; Tian et al., 2024). With a bounded range of $Y$, the maximum likelihood estimators of $\alpha$, $\beta$, and $\gamma$ are consistent and asymptotically normally distributed (Li et al., 2023). Estimation can be done using the `orm` function in the `rms` package (Harrell, 2025) in R software (R Core Team, 2025).
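To make the step-function form of the model concrete, the sketch below evaluates the conditional cumulative probability under a probit-link CPM for given parameter values. It does not fit the model (the paper uses `orm` in R for that). The names `cpm_cdf`, `alphas`, `beta`, and `gamma`, the sign convention (link of the cumulative probability equals intercept minus linear predictor), and all numerical values are assumptions for illustration.

```python
import math
from bisect import bisect_right
from statistics import NormalDist


def cpm_cdf(y, x, a, y_values, alphas, beta, gamma):
    """P(Y <= y | X = x, A = a) for a probit-link CPM with step intercepts.

    y_values: sorted distinct outcome values y_1 < ... < y_J
    alphas:   increasing intercepts alpha_1 < ... < alpha_{J-1}
    """
    j = bisect_right(y_values, y)  # number of distinct values <= y
    if j == 0:
        return 0.0  # below the smallest observed value
    if j == len(y_values):
        return 1.0  # at or above the largest observed value
    linear = sum(b * xi for b, xi in zip(beta, x)) + gamma * a
    return NormalDist().cdf(alphas[j - 1] - linear)


# Hypothetical parameter values for illustration only.
y_values = [1.0, 2.0, 4.0, 8.0]
alphas = [-1.0, 0.0, 1.0]
beta, gamma = [0.5], 0.3

p_control = cpm_cdf(2.0, [0.0], 0, y_values, alphas, beta, gamma)
p_treated = cpm_cdf(2.0, [0.0], 1, y_values, alphas, beta, gamma)

# The same intercepts attached to log-transformed outcomes give the same
# probability, because only the ordering of the outcome values is used.
log_values = [math.log(v) for v in y_values]
p_log = cpm_cdf(math.log(2.0), [0.0], 0, log_values, alphas, beta, gamma)
print(p_control, p_treated, p_log)
```

With a positive treatment coefficient the cumulative probability at any fixed outcome value is lower under treatment, meaning outcomes are shifted upward.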

2.3 Estimating Mann-Whitney-type Causal Effects Using CPMs

Estimating the distributions of potential outcomes, $F_a(y)$, follows the maximum likelihood estimation of the model parameters. We do that by replacing the parameters in the right-hand side of (6) by their respective estimators. Then $F_a(y)$ can be estimated using a plug-in estimator for (3) as:

$$\hat F_a(y) = \frac{1}{n} \sum_{i=1}^{n} G^{-1}\{\hat\alpha(y) - \hat\beta^{T} X_i - \hat\gamma a\}, \qquad (9)$$

for $a = 0$ or 1, where $n$ is the total sample size, and $\hat\alpha(y)$ is a step function with jumps at the observed $y_j$, specifically $\hat\alpha(y) = \hat\alpha_j$, where $j$ is such that $y_j \le y < y_{j+1}$.

To estimate our Mann-Whitney causal effect parameter, $[eqn]$ , we use plug-in estimators for (1) based on the estimated marginal CDFs of the potential outcomes. Specifically,

[eqn]

where $[eqn]$ and $[eqn]$ . An alternative formulation would exclude $[eqn]$ from the double summation to avoid ties; with large n, this alternative is equivalent to equation (10), which we use throughout this manuscript. Equation (10) leverages the flexibility and robustness of CPMs to estimate the Mann-Whitney-type causal effect. The CPM-based regression estimator is particularly advantageous because it does not require the specification of a parametric model for the outcome distribution, making it invariant to outcome transformation and more robust to model misspecification. Under the same assumptions as Li et al. (2023), our CPM-based outcome regression estimator of $[eqn]$ is consistent and asymptotically normal. Confidence intervals can be constructed using the bootstrap.
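Once the two marginal distribution functions have been estimated on a common set of outcome values, the plug-in step reduces to a weighted sum over pairs of support points. The sketch below is one way to carry out that sum, assuming ties between support points are counted as one half; it is not necessarily the exact form of equation (10), and the function name and input values are hypothetical.

```python
def mann_whitney_from_cdfs(cdf1, cdf0):
    """Plug-in Mann-Whitney-type effect from two discrete CDFs.

    cdf1, cdf0: cumulative probabilities of Y(1) and Y(0) evaluated at the
    same sorted support points y_1 < ... < y_J (last entry equal to 1).
    Returns P(Y(1) > Y(0)) + 0.5 * P(Y(1) = Y(0)) for independent draws.
    """
    if len(cdf1) != len(cdf0) or not cdf1:
        raise ValueError("CDFs must be non-empty and share the same support")
    # Convert cumulative probabilities into point masses.
    mass1 = [cdf1[0]] + [cdf1[k] - cdf1[k - 1] for k in range(1, len(cdf1))]
    mass0 = [cdf0[0]] + [cdf0[k] - cdf0[k - 1] for k in range(1, len(cdf0))]
    theta = 0.0
    below0 = 0.0  # mass of Y(0) strictly below the current support point
    for m1, m0 in zip(mass1, mass0):
        theta += m1 * (below0 + 0.5 * m0)
        below0 += m0
    return theta


# Hypothetical estimated CDFs on four shared support points.
cdf_treated = [0.1, 0.3, 0.6, 1.0]
cdf_control = [0.4, 0.7, 0.9, 1.0]
theta_hat = mann_whitney_from_cdfs(cdf_treated, cdf_control)
print(theta_hat)
```

The running total `below0` avoids an explicit double loop, so the cost is linear in the number of support points.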

3 Simulations

3.1 Simulation Procedure

We performed a simulation study to evaluate the performance of the proposed method. We estimated performance metrics based on 1000 simulated datasets for each of a variety of scenarios. Each dataset had a sample size of n = 50, 200 or 500. A single covariate, $[eqn]$ , normally distributed and having zero mean and unit variance was generated. A binary treatment and a skewed continuous outcome were then generated. The binary treatment variable, $[eqn]$ , was generated according to a logistic model of the form:

[eqn]

where $[eqn]$ . The outcome, $[eqn]$ , was generated from a normal linear model exponentially transformed as:

[eqn]

where $[eqn]$ and $[eqn]$ ; varying $[eqn]$ and $[eqn]$ controls the amount of confounding and the strength of the treatment effect. Note that this generates data following the linear transformation model outlined in Section 2.2, with $[eqn]$ . Different values of $[eqn]$ and $[eqn]$ lead to different values for the causal Mann-Whitney parameter, $[eqn]$ . When $[eqn]$ , indicating that the treatment has no causal effect. For other values of $[eqn]$ , true $[eqn]$ values were determined empirically as outlined in the Web Appendix.
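The data-generating process described above can be sketched as follows. The specific values are assumptions, since they are not recoverable from the text: a treatment model with intercept 0 and slope 1, a standard normal error, and a confounder coefficient `beta` and treatment coefficient `gamma` chosen for illustration. The Monte Carlo function shows one way a true value of the Mann-Whitney-type effect could be determined empirically, and the closed form holds only under these assumed settings.

```python
import math
import random
from statistics import NormalDist


def simulate_dataset(n, beta, gamma, rng, a0=0.0, a1=1.0):
    """Generate (x, a, y) rows: X ~ N(0,1), logistic treatment, log-normal-type Y."""
    rows = []
    for _ in range(n):
        x = rng.gauss(0.0, 1.0)
        p_treat = 1.0 / (1.0 + math.exp(-(a0 + a1 * x)))
        a = 1 if rng.random() < p_treat else 0
        y = math.exp(beta * x + gamma * a + rng.gauss(0.0, 1.0))
        rows.append((x, a, y))
    return rows


def monte_carlo_theta(beta, gamma, rng, draws=100_000):
    """Approximate P(Y_i(1) > Y_j(0)) for two independent subjects."""
    wins = 0
    for _ in range(draws):
        # Compare on the log scale; exp() is increasing so the event is the same.
        log_y1 = beta * rng.gauss(0.0, 1.0) + gamma + rng.gauss(0.0, 1.0)
        log_y0 = beta * rng.gauss(0.0, 1.0) + rng.gauss(0.0, 1.0)
        wins += log_y1 > log_y0
    return wins / draws


def closed_form_theta(beta, gamma):
    """Exact value under the assumed standard normal X and error."""
    return NormalDist().cdf(gamma / math.sqrt(2.0 * beta ** 2 + 2.0))


rng = random.Random(2024)
rows = simulate_dataset(200, beta=1.0, gamma=0.5, rng=rng)
theta_mc = monte_carlo_theta(1.0, 0.5, rng)
theta_cf = closed_form_theta(1.0, 0.5)
print(theta_mc, theta_cf)
```

Under these assumptions a zero treatment coefficient gives a value of exactly 0.5, matching the no-effect case described in the text.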

The causal Mann-Whitney parameter was estimated using the statistical methods described in Section 2. CPMs were fit using a properly specified probit link function (inverse of the standard Gaussian distribution function, $\Phi^{-1}$) without specifying the transformation $H$ (i.e., it was empirically estimated) using the `orm` function in the `rms` package (Harrell, 2025) in R software (R Core Team, 2025).

Ninety-five percent confidence intervals for our estimates of $[eqn]$ were computed as the estimate ±1.96 times the standard deviation of 200 replicate bootstrap estimates. We assessed the robustness of our approach by fitting CPMs that incorrectly specified the link function (using a logit link instead of the probit link) or that failed to include the confounder variable, X (i.e., $[eqn]$ incorrectly constrained to zero). We investigated the efficiency of our approach by comparing it to a properly specified parametric model (e.g., OR estimator of Zhang et al. (2019) after properly log-transforming the data and fitting a normal linear model). Because the proper transformation is unknown in practice, we also fit an incorrectly specified parametric model that applied the OR estimator of Zhang et al. (2019) by fitting a normal linear model after incorrectly square-root-transforming the outcome. For the specifications of these comparison models, see Web Appendix II.
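The interval construction described above (estimate ± 1.96 times the standard deviation of bootstrap replicates) can be sketched generically. To keep the example self-contained, the estimator plugged in here is an unadjusted probabilistic index, which is a stand-in and not the CPM-based estimator; the function names and the toy data are hypothetical.

```python
import random
import statistics


def unadjusted_index(rows):
    """Stand-in estimator: unadjusted probabilistic index from (a, y) rows."""
    treated = [y for a, y in rows if a == 1]
    control = [y for a, y in rows if a == 0]
    if not treated or not control:
        raise ValueError("resample lost a treatment group")
    score = sum(
        1.0 if t > c else 0.5 if t == c else 0.0
        for t in treated
        for c in control
    )
    return score / (len(treated) * len(control))


def bootstrap_wald_ci(rows, estimator, n_boot=200, z=1.96, seed=0):
    """Return (estimate, bootstrap SE, (lower, upper)) using a normal approximation."""
    rng = random.Random(seed)
    estimate = estimator(rows)
    n = len(rows)
    replicates = []
    for _ in range(n_boot):
        resample = [rows[rng.randrange(n)] for _ in range(n)]  # sample rows with replacement
        replicates.append(estimator(resample))
    se = statistics.stdev(replicates)
    return estimate, se, (estimate - z * se, estimate + z * se)


# Hypothetical (treatment, outcome) rows for illustration only.
rows = [(1, float(i % 7 + 2)) for i in range(20)] + [(0, float(i % 5)) for i in range(20)]
estimate, se, (lower, upper) = bootstrap_wald_ci(rows, unadjusted_index)
print(estimate, se, lower, upper)
```

For the CPM-based estimator, each bootstrap replicate would refit the model on the resampled rows before recomputing the effect.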

For each simulation scenario, the standard deviation (SD), root mean squared error (RMSE), and bias of the estimates were computed. Coverage probabilities (CP) were also computed. The relative efficiency of the CPM estimator to both an estimator from a correctly and an incorrectly transformed parametric model was assessed using SD and RMSE ratios of the two estimators. All performance metrics were calculated with respect to the true value of $[eqn]$ . All analyses were conducted using R Software Version 4.4.1 (R Core Team, 2025) and interface RStudio Version 4.2.7 (RStudio Team, 2024) on a Windows 11 Pro platform (Microsoft Corporation, 2021).

3.2 Simulation Results

Simulation results are presented in the tables that follow. The $[eqn]$ column represents the confounder coefficient in the model, $[eqn]$ is the corresponding treatment coefficient, $[eqn]$ is the theoretical value (true parameter) of the Mann-Whitney causal effect being estimated. Bias is calculated as the difference between the average of our estimated values and $[eqn]$ . SD indicates the standard deviation of the estimates of the parameter across the 1000 simulated datasets. The RMSE assesses the average magnitude of the errors between the estimated values and the true values. Lower RMSE values suggest better model performance. The coverage probability (CP) indicates the proportion of times that the estimated 95% confidence intervals contained the true parameter value $[eqn]$ . A CP value close to 0.95 signifies good performance.
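The performance metrics defined above can be computed from the per-dataset estimates and confidence intervals as in the sketch below. The function name and the four example estimates are hypothetical, and the sample standard deviation (divisor n - 1) is an assumption, since the text does not state which divisor was used.

```python
import math
import statistics


def summarize(estimates, intervals, truth):
    """Bias, SD, RMSE, and coverage probability across simulated datasets."""
    bias = statistics.mean(estimates) - truth
    sd = statistics.stdev(estimates)  # sample SD, divisor n - 1 (assumed)
    rmse = math.sqrt(statistics.mean([(e - truth) ** 2 for e in estimates]))
    coverage = statistics.mean(
        [1.0 if lo <= truth <= hi else 0.0 for lo, hi in intervals]
    )
    return {"bias": bias, "sd": sd, "rmse": rmse, "cp": coverage}


# Hypothetical results from four simulated datasets, with a true value of 0.60.
estimates = [0.58, 0.62, 0.60, 0.64]
intervals = [(0.50, 0.66), (0.55, 0.69), (0.52, 0.68), (0.61, 0.70)]
metrics = summarize(estimates, intervals, truth=0.60)
print(metrics)
```

RMSE combines the two error sources, since its square is approximately the squared bias plus the variance of the estimates.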

Table 1 shows the performance of our method with properly specified CPMs across multiple sample sizes, levels of confounding, and effect sizes. In general, our procedure performed well across a variety of values of $[eqn]$ , even with sample sizes as small as 50. Bias was low at all sample sizes but tended to be lower at larger sample sizes. Coverage probability was close to the nominal 0.95 level, particularly at the larger sample sizes.

Table 2 presents the performance of our method using the improperly specified logit link function, rather than probit, for the cumulative probability models. Bias remained low and coverage probability was still close to the 0.95 mark, at all sample sizes and true values of $[eqn]$ . Overall, these results demonstrate the estimator’s robustness to misspecification of the link function.

Table 3 shows the performance of our method when employing improperly specified linear predictors, neglecting to include the confounder variable, for the cumulative probability models. As expected, bias was high and coverage low except in the scenario where the confounder effects were absent (i.e., $[eqn]$ ). These findings underscore the necessity of properly accounting for confounders to maintain estimator validity.

Table 4 compares the bias of estimators obtained from properly specified CPMs and those derived from correctly transformed parametric models, while also presenting their relative efficiencies. Bias was similarly low for the proposed estimator and that based on the correctly transformed outcome regression model. The SDs of the CPM-based estimates tended to be slightly larger than those based on the correctly transformed outcome regression models, but the SD ratios were less than 1 in a few scenarios. The same applies to the RMSE ratios. This suggests a minor loss in efficiency when employing the CPM approach compared to the correctly specified parametric model. However, the robustness of the CPM approach is noteworthy, as it does not require any data transformation.

Table 5 compares the biases of estimators obtained from properly specified CPMs and those derived from incorrectly transformed parametric models, while also presenting their relative efficiencies. In these simulations, the CPM estimator has substantially lower bias than those based on the incorrectly transformed parametric model; it also has RMSE ratios well below 1 in most settings.

4 Application

In this section, we present an application of the method to an observational study of kidney disease (Etiology of Persistent Microalbuminuria in Nigeria (P-MICRO) Study; DK127912), estimating the causal effect of HIV on urine albumin-to-creatinine ratio (uACR) and estimated glomerular filtration rate (eGFR) among study participants seen at AKTH in Northern Nigeria.

The P-MICRO study was initiated following an evaluation of the Renal Risk Reduction (R3) Trial (DK112271) findings in Nigeria. The R3 Trial found that persistent albuminuria was common among ART-treated PWH. Microalbuminuria is an independent risk factor for cardiovascular and kidney disease and a predictor of end-organ damage, both in the general population and among persons living with HIV (PWH) (Matsushita et al., 2010; Fox et al., 2012; Khosla, Sarafidis & Bakris, 2006; Mann, Yi & Gerstein, 2004). Defined as an albumin-to-creatinine ratio (uACR) of 30–300 mg/g, microalbuminuria can signify early glomerular damage or microvascular endothelial dysfunction and is used in the early detection of kidney disease (Baweja et al., 2011; Rodriguez et al., 1988). HIV is hypothesized to increase the risk of albuminuria, likely due to HIV-associated nephropathy and related comorbidities, as well as the use of antiretroviral therapy (ART), which—while universally prescribed to control the virus—may contribute to increased albuminuria as a side effect. Among PWH, albuminuria has been associated with increased systemic T-cell activation and more rapid progression to AIDS (Gerstein, Mann, Yi, et al., 2001; Choi et al., 2010; Scherzer et al.; Reins et al., 2014). Microalbuminuria is also an important risk factor for mortality in ART-treated PWH, likely as a marker of inflammation and endothelial activation (Keane & Eknoyan, 1999; Wyatt et al., 2010). A cohort of adult ART-treated PWH and a cohort of age- and sex-matched HIV-negative adults were recruited from AKTH (Wester et al., 2022). Our analysis uses cross-sectional (baseline) P-MICRO (DK127912) data from 2998 study participants, of whom 2,248 (75%) are PWH.

Our exposure variable is HIV status (negative/positive), and the outcome variables are uACR and eGFR, which are commonly used to detect albuminuria (uACR) and assess kidney function (eGFR). Typically, uACR values are right-skewed (Tsuchihashi et al., 2002; Kim et al., 2023), making a Mann-Whitney-type effect estimate an appropriate choice (see Figure 1 in Web Appendix III). In contrast, eGFR values typically exhibit a left-skewed (negatively skewed) distribution in the general population (Takahashi et al., 2016; Chu et al., 2021; see Figure 2 in Web Appendix III). In practice, uACR is often categorized as normoalbuminuria (<30 mg/g), microalbuminuria (30–300 mg/g), and macroalbuminuria (>300 mg/g). Although categorizing a continuous variable is generally discouraged, we demonstrate the flexibility of our method by also applying it to this ordered categorical variable.

We consider the following baseline covariates assumed to be sufficient to satisfy the conditional exchangeability assumption: age (in years), body mass index (BMI), ethnicity (Fulani, Hausa, Igbo, Yoruba, Other), sex (male/female), alcohol use (yes/no), smoking status (yes/no), hepatitis B infection (yes/no), Apolipoprotein-1 (APOL1) genetic risk allele/variant status (with these variants being found on chromosome 22) (categorized as high risk [HR] genotype for 2 copies of APOL1 risk variants versus low-risk [LR] genotype for 0 or 1 copy of APOL1 risk variants), hypertension (categorized according to the eighth edition of the Joint National Committee guidelines into four groups: normal, prehypertension, stage 1 hypertension, and stage 2 hypertension), diabetes mellitus (yes/no), concomitant medication use (other than HIV antiretroviral therapy) (yes/no), and other comorbid medical conditions (yes/no).

Tables 6a and 6b compare Mann-Whitney-type effect estimates from different modelling approaches (CPMs with various link functions and parametric models with different transformations) for the effect of HIV on uACR and eGFR, respectively, adjusted for potential confounders. Ninety-five percent confidence intervals were computed using 50 bootstrap replications.

With uACR as a continuous outcome variable, CPM-based estimators provided very similar effect estimates across different link functions, with logit (CPM 1) and probit (CPM 2) models yielding nearly identical results. The log-log link function resulted in the best model fit (e.g., highest likelihood; see Supplementary Material). In contrast, parametric models showed variability, highlighting their sensitivity to transformation choice. Among them, NLM 2 produced estimates closest to CPMs, suggesting a log transformation may have been most appropriate. Overall, CPMs offer a robust and stable analysis approach, while parametric models require careful transformation selection to ensure reliable interpretations.

When uACR was categorized, the Mann-Whitney-type effect estimates remained very similar across link functions, closely resembling their continuous counterparts. This stability validates the reliability of CPM-based estimators in estimating causal effects, regardless of whether the outcome variable is treated as continuous or categorical. This finding supports the idea that CPMs provide a flexible framework for evaluating the effect of HIV on kidney function markers, without requiring parametric assumptions.

Our Mann-Whitney causal effect estimators based on the CPM suggest that HIV lowers eGFR (lower eGFR implies reduced kidney function). This conclusion held regardless of the link function used for the CPM. The estimated Mann-Whitney causal effect was slightly higher using the log-log link function. It should be noted that the logit link function resulted in the best model fit (e.g., highest likelihood; see Supplementary Material). Interestingly, parametric models produced similar Mann-Whitney causal effect estimates to CPM-based estimators in this case. This suggests that the relationship between HIV and eGFR may be less dependent on transformation choices compared to uACR. The agreement between approaches reinforces the validity of the observed negative effect of HIV on kidney function.

5 Discussion

In this study, we proposed a semiparametric outcome regression-based estimator for the Mann-Whitney-type causal effect based on the CPM. We evaluated our estimator’s performance through simulations and with an application to an observational study of kidney disease. Our results demonstrate the robustness and reliability of the CPM-based estimator in different scenarios, highlighting its utility for observational studies.

Zhang et al. (2019) made a valuable contribution by proposing Mann-Whitney treatment effect estimators, providing researchers with a structured approach for estimating these causal effects. Zhang et al. (2019)’s OR estimator, which requires specifying a transformation and then assuming normality, highlights a critical consideration in causal inference: the trade-off between modeling assumptions and robustness. Parametric models offer simplicity and interpretability but can be restrictive and prone to model misspecification, especially with skewed data. The necessity to transform data to meet parametric assumptions can be cumbersome and may not always lead to accurate causal estimates. In contrast, semiparametric approaches such as our OR estimator based on the CPM are consistent with the spirit of the Mann-Whitney test, which does not require specific distributional forms or transformations, thereby providing robustness. This robustness is desirable for practical applications where the true distribution of the data is generally unknown and often difficult to model accurately, particularly when dealing with skewed data.

The simulation results showed that the CPM estimator performed effectively across various sample sizes and combinations of confounder strength and treatment effects, exhibiting decreased bias and standard deviation with larger sample sizes, which led to smaller root mean square error values. Coverage probabilities were consistently close to the nominal 0.95 level, reinforcing the reliability of pairing the estimator with the bootstrap for inference. Notably, the CPM estimator maintained low bias and high accuracy, even with strong confounders and treatment effects, making it valuable in studies with significant effects. However, occasional convergence failures were observed in some simulations when either the confounder or the treatment effects were large and the link function was misspecified. Specifically, failures occurred in 0.5% of simulation replications when $[eqn]$ and $[eqn]$; 2.4% when $[eqn]$ and $[eqn]$; 1.6% when $[eqn]$ and $[eqn]$; and 3% when $[eqn]$ and $[eqn]$. Lack of convergence could serve as an indicator of link-function misspecification.
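The performance measures discussed above can be computed from simulation output as in the following minimal sketch. It assumes, hypothetically, that each replication stores a point estimate and bootstrap confidence limits, and that non-converged replications are coded as NaN and excluded from the summaries; the paper does not state how such replications were handled. The function name `summarize_replications` and the toy numbers are illustrative only.

```python
import numpy as np


def summarize_replications(estimates, ci_lower, ci_upper, truth):
    """Summarize simulation replications against a known true effect."""
    est = np.asarray(estimates, dtype=float)
    lower = np.asarray(ci_lower, dtype=float)
    upper = np.asarray(ci_upper, dtype=float)
    # Assumption: a replication that failed to converge is recorded as NaN.
    converged = ~np.isnan(est)
    est, lower, upper = est[converged], lower[converged], upper[converged]
    return {
        "bias": float(est.mean() - truth),
        "sd": float(est.std(ddof=1)),
        "rmse": float(np.sqrt(np.mean((est - truth) ** 2))),
        # Share of intervals that contain the true effect.
        "coverage": float(np.mean((lower <= truth) & (truth <= upper))),
        "failure_rate": float(1.0 - converged.mean()),
    }


# Toy input: four replications, one of which did not converge.
summary = summarize_replications(
    estimates=[0.60, 0.64, np.nan, 0.62],
    ci_lower=[0.55, 0.63, np.nan, 0.58],
    ci_upper=[0.65, 0.69, np.nan, 0.66],
    truth=0.62,
)
print(summary)
```

Reporting the failure rate alongside bias, standard deviation, root mean square error, and coverage keeps the convergence issue visible rather than hidden by the exclusion.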

The CPM estimator was robust to misspecification of the link function; however, an improperly specified linear predictor resulted in poor performance. As with all causal models, care must be taken in selecting covariates and in deciding how to include them in regression models. CPMs offer a degree of flexibility over normal linear models because they do not require specifying a transformation of the outcome. However, as with normal linear models, how the covariates are included is important; CPMs offer the same options (e.g., splines, interactions, tree partitions) as normal linear models. We did not consider doubly robust estimators in this manuscript; in short, our outcome regression estimators based on CPMs could be made doubly robust following procedures presented by Zhang et al. (2019).

The application of the CPM estimator in the P-MICRO (DK127912) study offered valuable insights into the causal relationship between HIV status and albuminuria among ART-treated PWH in Northern Nigeria. The findings underscore the robustness of CPMs in estimating the effect of HIV on markers of kidney function. For uACR, CPMs produced stable and consistent results across modeling choices, avoiding the sensitivity to transformation assumptions observed with parametric models. The alignment of CPM-derived estimates in both continuous and categorized uACR outcomes further supports their reliability.

We found that uACRs were higher in ART-treated PWH compared to their HIV-negative counterparts. Relatedly, ART-treated PWH demonstrated lower eGFR, suggesting worse kidney function compared to HIV-negative individuals. The consistent association between HIV and increased uACR, and between HIV and reduced eGFR reinforces the importance of early renal function surveillance in HIV-positive populations. These findings provide evidence supporting the hypothesis that HIV increases the risk of albuminuria in PWH (National Kidney Foundation, 2025; Charles et al., 2018).

In conclusion, the semiparametric outcome regression-based estimator of the Mann-Whitney-type causal effect offers a robust and reliable framework for causal effect estimation in observational studies. Its performance across various scenarios and its successful application to the P-MICRO study underscore its potential for estimating causal effects in complex settings. This study contributes to the growing body of literature on causal inference methods by introducing a practical tool for researchers investigating causal relationships in observational data. Unlike parametric approaches, this tool does not require correct specification of the typically unknown outcome transformation.

However, the proposed CPM-based Mann-Whitney-type estimator rests on standard causal identification assumptions: causal consistency, no interference, positivity, and ignorability. Although the method demonstrated robustness to certain misspecifications in simulations, convergence failures occurred in a minority of replications under strong confounding and link-function misspecification (e.g., up to 3% in some scenarios). Performance may be affected by unmeasured confounding, misspecification of the functional form of the covariates, and limited sample sizes. Future research should address these issues and explore extensions to broader data-generating processes and outcomes. Additionally, future work could focus on improving computational efficiency and expanding applicability to other health outcomes and populations.

Supplementary Files

SupplementaryMaterialsSafiya.docx

Bibliography

1. Baweja S., Kent A., Masterson R., Roberts S., & McMahon L. P. (2011). Prediction of preeclampsia in early pregnancy by estimating the spot urinary albumin:creatinine ratio using high-performance liquid chromatography. BJOG: An International Journal of Obstetrics & Gynaecology, 118(9), 1126–1132. doi:10.1111/j.1471-0528.2011.02961.x. PMID: 21481153.
2. Swanepoel Charles R., Atta Mohamed G., D’Agati Vivette D., Estrella Michelle M., Fogo Agnes B., Naicker Saraladevi, Post Frank A., Wearne Nicola, Winkler Cheryl A., Cheung Michael, Wheeler David C., Winkelmayer Wolfgang C., & Wyatt Christina M.; other Conference Participants. (2017). KDIGO kidney disease in HIV conference report. Retrieved from https://kdigo.org/wp-content/uploads/2017/02/KDIGO-Kidney-disease-in-HIV-conf-report-FINAL.pdf?form=MG0AV3&form=MG0AV3
3. Choi A. I., Scherzer R., Bacchetti P., Tien P. C., Saag M. S., Gibert C. L., Szczech L. A., Grunfeld C., & Shlipak M. G. (2010). Cystatin C, albuminuria, and 5-year all-cause mortality in HIV-infected persons. American Journal of Kidney Diseases, 56(5), 872–882. doi:10.1053/j.ajkd.2010.06.027. PMID: 20709438. PMCID: PMC3164880.
4. Chu C. D., Powe N. R., McCulloch C. E., Crews D. C., Han Y., Bragg-Gresham J. L., Saran R., Koyama A., Burrows N. R., & Tuot D. S. (2021). Trends in Chronic Kidney Disease Care in the US by Race and Ethnicity, 2012–2019. JAMA Netw Open, 4(9), e2127014. doi:10.1001/jamanetworkopen.2021.27014. PMID: 34570204. PMCID: PMC8477264.
5. Divine G. W., Norton H. J., Barón A. E., & Juarez-Colunga E. (2018). The Wilcoxon–Mann–Whitney Procedure Fails as a Test of Medians. The American Statistician, 72(3), 278–286. doi:10.1080/00031305.2017.1305291.
6. Fay M. P., Brittain E. H., Shih J. H., Follmann D. A., & Gabriel E. E. (2018). Causal estimands and confidence intervals associated with Wilcoxon-Mann-Whitney tests in randomized experiments. Stat Med, 37(20), 2923–2937. doi:10.1002/sim.7799. PMID: 29774591. PMCID: PMC6373726.
7. Fox C. S., Matsushita K., Woodward M., Bilo H. J., Chalmers J., Heerspink H. J., Lee B. J., Perkins R. M., Rossing P., Sairenchi T., Tonelli M., Vassalotti J. A., Yamagishi K., Coresh J., de Jong P. E., Wen C. P., Nelson R. G., & Chronic Kidney Disease Prognosis Consortium. (2012). Associations of kidney disease measures with mortality and end-stage renal disease in individuals with and without diabetes: A meta-analysis. The Lancet, 380(9854), 1662–1673. doi:10.1016/S0140-6736(12)61350-6.
8. Gerstein H. C., Mann J. F., Yi Q., Zinman, Dinneen S. F., Hoogwerf B., Hallé J. P., Young J., Rashkow A., Joyce C., Nawaz S., Yusuf S.; HOPE Study Investigators. (2001). Albuminuria and risk of cardiovascular events, death, and heart failure in diabetic and nondiabetic individuals. JAMA: The Journal of the American Medical Association, 286(4), 421–426. doi:10.1001/jama.286.4.421. PMID: 11466120.