Robust Low-Complexity WMMSE Precoding Under Imperfect CSI with Per-Antenna Power Constraints

Zijiao Guo; Vaskar Sen; Honggui Deng

PMC · DOI:10.3390/s26010159 · December 25, 2025

Open Access

TL;DR

This paper introduces a new low-complexity WMMSE precoding method for massive MU-MIMO systems that handles imperfect CSI and per-antenna power constraints efficiently.

Contribution

The novel RLC-WMMSE framework reduces computational complexity while maintaining performance under imperfect CSI and PAPCs.

Findings

01

The RLC-WMMSE algorithm achieves WSR performance close to standard WMMSE-PAPCs designs.

02

The proposed method significantly reduces runtime and strictly satisfies per-antenna power constraints.

03

Simulations show effectiveness over i.i.d. and spatially correlated channels.

Abstract

Weighted sum-rate (WSR) maximization in downlink massive multi-user multiple-input multiple-output (MU-MIMO) systems with per-antenna power constraints (PAPCs) and imperfect channel state information (CSI) is computationally challenging. Classical weighted minimum mean-square error (WMMSE) algorithms, in particular, have per-iteration costs that scale cubically with the number of base-station antennas. This article proposes a robust low-complexity WMMSE-based precoding framework (RLC-WMMSE) tailored for massive MU-MIMO downlink under PAPCs and stochastic CSI mismatch. The algorithm retains the standard WMMSE structure but incorporates three key enhancements: a diagonal dual-regularization scheme that enforces PAPCs via a lightweight projected dual ascent with row-wise safety projection; a Woodbury-based transmit update that replaces the dominant M×M inversion with an (NK)×(NK) symmetric positive-definite…

Keywords

massive MU-MIMO; weighted MMSE; per-antenna power constraints; robust precoding; imperfect CSI

Taxonomy

Topics: Advanced MIMO Systems Optimization · Advanced Wireless Communication Techniques · Sparse and Compressive Sensing Techniques

Full text

1. Introduction

Massive MU-MIMO systems are widely recognized as a fundamental technology for fifth-generation (5G) and subsequent wireless networks, owing to their ability to significantly improve spectral efficiency, reliability, and overall network capacity [1,2,3,4,5]. To fully realize these benefits in the downlink, effective precoding is essential to manage multi-user interference. Among linear precoding approaches, the weighted minimum mean-square error (WMMSE) method has gained prominence for achieving near-optimal weighted sum-rate performance in real-world implementations [6,7,8]. Nevertheless, the conventional WMMSE algorithm relies on repeated high-dimensional matrix inversions in each iteration, resulting in computational complexity that scales cubically with the number of base station antennas [9,10]. This substantial computational burden hinders its practical deployment in large-scale massive MU-MIMO systems.

In practice, limited pilot resources and severe channel aging in massive MU-MIMO systems, particularly in high-mobility scenarios, make it impractical for the transmitter to obtain accurate channel state information (CSI) for precoding [11], thereby motivating research on robust precoding under imperfect CSI. Imperfect CSI is commonly modeled using a Gaussian-distributed posterior channel model [11,12,13]. Based on this model, robust precoding design can be formulated as an ergodic weighted sum-rate (EWSR) maximization problem subject to per-antenna power constraints (PAPCs), and several iterative robust precoding algorithms have been proposed to address this problem. Furthermore, in many practical BS implementations, each antenna is driven by a separate power amplifier with its own maximum rating [14], so PAPCs are often more realistic than a single sum-power constraint (SPC). A simple engineering workaround is to design the precoder under an SPC and then heuristically rescale the columns or rows to satisfy the PAPCs [15], which may cause noticeable WSR loss. To address this, several works have studied WSR maximization directly under PAPCs. For example, zero-forcing (ZF) precoders with per-antenna constraints were analyzed in [16,17,18], but such schemes do not fully exploit the WSR objective because they do not jointly optimize receive filters and user weights. Moreover, many existing approaches rely on fixed approximations or simplified update rules, which can induce non-negligible rate loss and poorer convergence behavior in large-scale regimes [19].

The weighted sum-rate (WSR) maximization problem for the downlink under a sum-power constraint (SPC) is nonconvex and NP-hard [20,21,22]. To address this challenge, the classical weighted minimum mean-square error (WMMSE) algorithm [23] provides an iterative solution by exploiting the fundamental relationship between the mean-square error (MSE) and the signal-to-interference-plus-noise ratio (SINR). This method reformulates WSR maximization as a weighted sum-MSE minimization problem and applies block coordinate descent (BCD) [24], resulting in three closed-form updates for the receive filters, weight matrices, and precoders that converge to a stationary point of the original WSR problem. To alleviate the computational burden, recent research has developed low-complexity variants that approximate WMMSE performance with reduced cost. For example, the rethinking WMMSE (R-WMMSE) algorithm [20] reduces complexity via randomized sketching (data dimensionality reduction). In contrast, our previously proposed LC-WMMSE scheme [25] leverages the structure of the WMMSE update through a Woodbury reformulation combined with a diagonal weight surrogate. While R-WMMSE introduces a probabilistic approximation error due to sketching, LC-WMMSE is deterministic and exhibits per-iteration complexity that scales primarily with the number of data streams rather than the number of BS antennas. Additionally, LC-WMMSE incorporates a hybrid switching mechanism that adaptively blends classical WMMSE updates with lightweight approximations, alongside an adaptive damping strategy that stabilizes the precoder trajectory and accelerates convergence.

Most existing robust and PAPC-aware designs still inherit the high-order matrix inversions of classical WMMSE or rely on generic convex solvers, which become prohibitive for massive MU-MIMO, especially under stochastic CSI mismatch. In contrast, this article proposes a robust low-complexity WMMSE-based framework (RLC-WMMSE) that simultaneously (i) handles imperfect CSI, (ii) enforces PAPCs, and (iii) maintains a low per-iteration complexity. The proposed RLC-WMMSE enforces PAPCs through a lightweight diagonal dual-regularization scheme while keeping the main transmit update in a $[eqn]$ symmetric positive-definite (SPD) form via a Woodbury identity, so that the dominant cost scales with $[eqn]$ rather than M. This makes the method particularly suitable for large-scale MU-MIMO with per-antenna power limits and imperfect CSI. In summary, the main contributions of this paper are as follows:

We extend the classical WMMSE framework to handle per-antenna power constraints (PAPCs) by introducing a diagonal dual regularization. A diminishing-step projected dual ascent is used to update the per-antenna Lagrange multipliers, and a final row-wise feasibility projection is applied to eliminate any residual PAPC violations. This design keeps all additional variables local to the BS and yields a PAPC-feasible precoder.
We develop a Woodbury-based reformulation of the transmit update that trades the classical $[eqn]$ inversion for an $[eqn]$ symmetric positive-definite (SPD) solve built from a block-diagonal surrogate weight. This reduces the dominant per-iteration complexity in the massive-MIMO regime $[eqn]$ , while retaining a numerically stable Cholesky/LDL implementation.
We propose a robust low-complexity WMMSE (RLC-WMMSE) update that blends the classical WMMSE precoder with its low-complexity surrogate through an adaptive mixing factor driven by the per-iteration MSE variation. Together with an Armijo-type adaptive damping rule, this hybrid scheme stabilizes the iterations when updates are computed on $[eqn]$ but performance is evaluated on the true channels $[eqn]$ , and enforces sufficient ascent of the monitored WSR in practice.
Simulation results indicate that the proposed RLC-WMMSE becomes cheaper per iteration than classical WMMSE once $[eqn]$ for typical system dimensions. In addition, for the PAPC dual loop, we establish a feasibility bound of the form $[eqn]$ , quantifying how the expected PAPC violation decays with the number of outer iterations T.
Through Monte Carlo simulations with both Kronecker-correlated channels and i.i.d. Rayleigh fading, and under various CSI mismatch levels, we show that the proposed RLC-WMMSE achieves near-identical WSR to the WMMSE-PAPCs benchmark while maintaining negligible PAPC violations. At the same time, it exhibits favorable runtime scaling with M and clear speedups over classical WMMSE in the large-array regime.

Table 1 summarizes the main design principles of the three WMMSE-based precoders discussed in this work. The R-WMMSE approach [20] reduces cost via randomized sketching, solving compressed normal equations. In contrast, the LC-WMMSE method [25] exploits problem structure using a Woodbury reformulation and a diagonal-weight surrogate in the transmit step. Our RLC-WMMSE extends LC-WMMSE by adding diagonal dual regularization to handle PAPCs under imperfect CSI, while maintaining an $[eqn]$ symmetric positive-definite (SPD) solve for the transmit update. The three methods thus differ in update dimension, dominant cost, constraint handling, and approximation source.

The remainder of this article is organized as follows. Section 2 introduces the system model and problem formulation. Section 3 presents the proposed robust low-complexity WMMSE (RLC-WMMSE) algorithm, including detailed derivations and a complexity analysis. Simulation results are provided and discussed in Section 4. Finally, Section 5 concludes the paper.

2. System Model and Problem Formulation

2.1. Downlink System Model

We consider a single-cell downlink MU-MIMO system in which a BS with M transmit antennas serves K users, each equipped with N receive antennas. The downlink channel to user k is denoted by $[eqn]$ , $[eqn]$ , and we model its entries as i.i.d. circularly symmetric complex Gaussian random variables (Rayleigh fading). The transmitted signal is

[eqn]

where $[eqn]$ is the linear precoder for user k and $[eqn]$ is the data symbol vector satisfying $[eqn]$ .

Under flat fading, the received signal at user k is

[eqn]

where $[eqn]$ represents additive white Gaussian noise (AWGN). The data vectors $[eqn]$ are mutually independent and also independent of the noise vectors $[eqn]$ .

Remark 1 (Scalability in Massive MIMO). Practical massive MU-MIMO deployments typically satisfy $[eqn]$ , meaning the base station is equipped with far more antennas than each user [26]. In such configurations, classical algorithms such as WMMSE require large matrix inversions whose computational cost grows rapidly with M. To alleviate this burden, our earlier LC-WMMSE scheme [25] integrates hybrid switching and adaptive damping mechanisms, significantly lowering the complexity. These enhancements enable the algorithm to scale efficiently with the BS antenna count, attaining approximately sub-cubic complexity in large-scale regimes.

2.2. Problem Formulation with Imperfect CSI and PAPCs

In downlink MU-MIMO systems, a core design goal is to determine the set of precoders $[eqn]$ that maximize the weighted sum-rate (WSR) under a transmit power constraint. Here, $[eqn]$ represents the weight assigned to user k. The WSR is defined as

[eqn]

where the achievable rate of user k is given by

[eqn]

where the interference-plus-noise covariance matrix for user k is expressed as

[eqn]
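As a concrete reading of the WSR, per-user rate, and interference-plus-noise covariance defined above, the following is a minimal NumPy sketch. It assumes each channel is an N×M matrix, each precoder is M×d, symbols have unit power, the noise is white with variance `sigma2`, and rates use a base-2 logarithm; the function name and argument layout are illustrative and do not come from the paper.

```python
import numpy as np

def weighted_sum_rate(H, P, alpha, sigma2):
    """Weighted sum-rate in bits/s/Hz (illustrative helper).

    H: list of K channels, each (N, M); P: list of K precoders, each (M, d);
    alpha: K nonnegative user weights; sigma2: noise variance.
    """
    K = len(H)
    total = 0.0
    for k in range(K):
        N = H[k].shape[0]
        # Interference-plus-noise covariance seen by user k.
        omega = sigma2 * np.eye(N, dtype=complex)
        for j in range(K):
            if j != k:
                HP = H[k] @ P[j]
                omega += HP @ HP.conj().T
        S = H[k] @ P[k]  # effective signal channel, shape (N, d)
        G = S.conj().T @ np.linalg.solve(omega, S)
        # log det(I + G) via slogdet for numerical stability.
        _, logdet = np.linalg.slogdet(np.eye(G.shape[0]) + G)
        total += alpha[k] * logdet / np.log(2.0)
    return total
```

In the robust setting described later, the same function would be called once with the channel estimates (to drive the updates) and once with the true channels (to report performance).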

We maximize the WSR over all feasible precoders under per-antenna power constraints (PAPCs), which introduce a different feasibility set and a distinct performance–complexity trade-off compared with the conventional sum-power constraint.

The WSR maximization problem under PAPCs can be expressed as

[eqn]

where $[eqn]$ denotes the m-th diagonal element of matrix $[eqn]$ . The constraints in Equation (6) ensure that the transmit power at the m-th antenna of the BS does not exceed $[eqn]$ . Solving the WSR maximization problem in Equation (6) is challenging because the objective function is highly nonlinear and nonconvex. Furthermore, as shown in [22], this problem is NP-hard, as summarized in the following proposition.

Proposition 1 (NP-hardness of WSR Maximization). The downlink weighted sum-rate maximization problem under a sum-power constraint is known to be NP-hard [21,22]. Since PAPCs are more restrictive than an SPC, the PAPC formulation (6) is also NP-hard.

We adopt the standard imperfect-CSI model $[eqn]$ , $[eqn]$ , $[eqn]$ where $[eqn]$ is the channel estimate and $[eqn]$ sets the estimation NMSE; we use $[eqn]$ . Given nonnegative user weights $[eqn]$ , the weighted sum-rate (WSR) for a channel realization is $[eqn]$ , with $[eqn]$ and $[eqn]$ defined in Equations (4) and (5).

Imperfect CSI model: The additive Gaussian CSIT error model above is widely used in robust precoding/beamforming because it is tractable and captures the aggregate impact of estimation noise, quantization, and residual uncertainty; see, e.g., [11,12,13].
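The additive error model can be simulated in a few lines. The sketch below assumes the estimate is formed as the true channel plus an independent error with i.i.d. CN(0, ε) entries and that the true channel has unit-variance Rayleigh entries, so that ε coincides with the NMSE; the direction of the model (estimate = truth + error) and all helper names are assumptions made for illustration.

```python
import numpy as np

def complex_gaussian(shape, variance, rng):
    """Draw i.i.d. circularly symmetric CN(0, variance) entries."""
    scale = np.sqrt(variance / 2.0)
    return scale * (rng.standard_normal(shape) + 1j * rng.standard_normal(shape))

def noisy_estimate(H, eps, rng):
    """Assumed model: H_hat = H + E, with E having CN(0, eps) entries."""
    return H + complex_gaussian(H.shape, eps, rng)

def nmse(H, H_hat):
    """Normalized mean-square error between a channel and its estimate."""
    return np.linalg.norm(H - H_hat) ** 2 / np.linalg.norm(H) ** 2

rng = np.random.default_rng(0)
H = complex_gaussian((4, 64), 1.0, rng)      # one user, N = 4, M = 64
H_hat = noisy_estimate(H, eps=0.1, rng=rng)  # moderate CSI mismatch
print(f"empirical NMSE: {nmse(H, H_hat):.3f}")
```

Sweeping `eps` in such a generator is one way to emulate the different CSI qualities considered in the simulations.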

Remark 2 (CSI outdatedness). In practice, CSIT can also be outdated due to feedback/processing delay, which is often modeled by a first-order Gauss–Markov evolution $[eqn]$ [2,27]. In this work, we focus on the additive error model for algorithm design and analysis, and in simulations, we sweep ϵ to represent different CSI qualities.

Under per-antenna power constraints (PAPCs), the feasible precoder set is

[eqn]

where $[eqn]$ collects the per-antenna budgets. For apples-to-apples comparisons with the SPC case, we set $[eqn]$ .

A practical “plug-in” (mismatch) design maximizes the WSR built on the channel estimates $[eqn]$ while enforcing PAPCs:

[eqn]

with $[eqn]$ the $[eqn]$ identity. Problem (8) is nonconvex and NP-hard. We solve it via alternating WMMSE updates on $[eqn]$ combined with a dual-based enforcement of Equation (7) and a Woodbury low-complexity step; the final performance is reported by evaluating $[eqn]$ on the true channel. All symbols used in this paper are summarized in Table 2.

3. Proposed Robust LC-WMMSE Algorithm

3.1. The Classical WMMSE Algorithm

The WMMSE framework is widely used for WSR optimization [20]. In this section, we briefly revisit the classical WMMSE approach [23,28] from an optimization perspective, where the MSE serves as an auxiliary variable rather than a physical metric. Since the WSR problem in (6) is nonconvex, we leverage the well-known rate–MSE equivalence to rewrite it as a weighted sum-MSE minimization, which admits an efficient BCD procedure [24]; the problem can be reformulated as

[eqn]

subject to the same transmit power constraint specified in Equation (9). Here, $[eqn]$ denotes the priority weight associated with user k, and the mean-square error (MSE) matrix $[eqn]$ for user k is defined as

[eqn]

where $[eqn]$ denotes the user-k mean-square error (MSE) matrix. Here, $[eqn]$ is the linear receive filter and $[eqn]$ is the MSE weight matrix. Both variables are updated jointly with the precoders $[eqn]$ . By expanding $[eqn]$ , we obtain

[eqn]

As shown in [23], the reformulated problem in Equation (9) is nonconvex in the joint variable set $[eqn]$ . Nonetheless, it is convex in each variable block with the others fixed, which enables an efficient alternating optimization scheme summarized below:

[eqn]

Keeping the remaining two variable blocks unchanged, the weight matrix $[eqn]$ admits the following closed-form update:

[eqn]
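The receive-filter and weight updates can be checked numerically. The sketch below assumes the standard MMSE receiver (inverse of the total received covariance applied to the effective channel) and the weight equal to the inverse MSE matrix; under these assumptions the log-determinant of the weight equals the user's rate in nats, which is the rate–MSE relationship the reformulation relies on. Function and variable names are illustrative.

```python
import numpy as np

def receive_and_weight(H_k, P, k, sigma2):
    """MMSE receive filter U_k, weight W_k = E_k^{-1}, and MSE matrix E_k.

    H_k: (N, M) channel of user k; P: list of (M, d) precoders for all users.
    """
    N = H_k.shape[0]
    # Total received covariance: signal + interference + noise.
    J = sigma2 * np.eye(N, dtype=complex)
    for P_j in P:
        HP = H_k @ P_j
        J += HP @ HP.conj().T
    S = H_k @ P[k]                  # effective channel of user k, (N, d)
    U = np.linalg.solve(J, S)       # MMSE receiver
    d = S.shape[1]
    E = np.eye(d) - U.conj().T @ S  # MSE matrix at the MMSE receiver
    W = np.linalg.inv(E)            # closed-form weight update
    return U, W, E
```

Because the weight is the inverse of a Hermitian positive-definite MSE matrix, it is itself positive-definite, which is what the later diagonal surrogate relies on.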

For given receive filters $[eqn]$ and weight matrices $[eqn]$ , the precoders $[eqn]$ are updated by the solution of a convex quadratic problem:

[eqn]

with the corresponding normal equation $[eqn]$ .

[eqn]

[eqn]

Here, $[eqn]$ , $[eqn]$ , $[eqn]$ , and $[eqn]$ . Using the definitions of $[eqn]$ and $[eqn]$ in (15) and (16), the optimality condition $[eqn]$ gives the unique solution $[eqn]$ , which can be written per user as

[eqn]

While $[eqn]$ is shared by all users, the blocks on the right-hand side, $[eqn]$ , depend on k; consequently, the precoder blocks $[eqn]$ are user-specific. Since $[eqn]$ and $[eqn]$ in (13), $[eqn]$ (Hermitian), and the associated system admits a unique solution. Now, we replace the SPC update in Equation (17) by a per-antenna constrained step; for a dual vector $[eqn]$ , define

[eqn]

and choose $[eqn]$ so that each antenna power satisfies

[eqn]

A projected dual update is

[eqn]

If a tiny residual remains, apply the row-wise safety projection

[eqn]

Remark 3. If $[eqn]$ and $[eqn]$ , Equation (18) reduces to the SPC form in Equation (17).
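The projected dual update and safety projection above can be sketched as follows. This is a minimal illustration under stated assumptions: the transmit normal matrix `A` and right-hand side `B` are given, the regularizer is `mu + lam` on the diagonal, the step size diminishes as `eta0 / sqrt(j + 1)`, and the dual variable increases on antennas whose power exceeds the budget. The step-size schedule, default values, and function name are hypothetical choices, not the paper's settings.

```python
import numpy as np

def papc_dual_ascent(A, B, p, mu=1e-3, eta0=0.5, iters=50):
    """Projected dual ascent on per-antenna multipliers (illustrative).

    A: (M, M) Hermitian PSD normal matrix; B: (M, S) right-hand side;
    p: (M,) per-antenna power budgets. Returns a PAPC-feasible precoder
    and the final nonnegative multipliers.
    """
    M = A.shape[0]
    lam = np.zeros(M)
    for j in range(iters):
        P = np.linalg.solve(A + np.diag(mu + lam), B)
        q = np.sum(np.abs(P) ** 2, axis=1)  # power on each antenna (row)
        step = eta0 / np.sqrt(j + 1)        # diminishing step size
        lam = np.maximum(0.0, lam + step * (q - p))  # project onto lam >= 0
    P = np.linalg.solve(A + np.diag(mu + lam), B)
    # Row-wise safety projection removes any residual violation.
    q = np.sum(np.abs(P) ** 2, axis=1)
    scale = np.minimum(1.0, np.sqrt(p / np.maximum(q, 1e-300)))
    return P * scale[:, None], lam
```

Each inner step here re-solves an M×M system for clarity; the low-complexity variant in the paper avoids exactly this cost by moving the solve to the smaller stream dimension.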

The classical WMMSE algorithm requires solving an $[eqn]$ linear system in every outer iteration, which entails cubic work $[eqn]$ and becomes the dominant bottleneck when M is large. With per-antenna constraints the transmit update must also be regularized by a diagonal dual term, but the cost is still governed by the $[eqn]$ solve. This motivates a low-complexity variant that avoids the big inversion. In the next subsection, we construct a Woodbury update that shifts the solve to an $[eqn]$ system and enforces PAPCs with a light projected dual loop, yielding a favorable crossover when $[eqn]$ while preserving the WSR performance under CSI mismatch.

3.2. Proposed RLC-WMMSE

In this subsection, we develop an efficient iterative solution to the PAPC-constrained WSR maximization problem in Equation (8) using the WMMSE framework. The proposed robust low-complexity WMMSE (RLC-WMMSE) alternates standard WMMSE updates (on $[eqn]$ ) with a light dual update to enforce PAPCs and a Woodbury-based transmit step; hybrid switching and adaptive damping are employed for stability under CSI mismatch.

The RLC-WMMSE Algorithm Problem Reformulation

In the proposed robust LC-WMMSE, the $[eqn]$ matrix inversion in (15)–(17) is avoided by applying the Woodbury identity [29], which reduces the update to solving an $[eqn]$ system. As a result, the dominant per-iteration complexity decreases from $[eqn]$ to $[eqn]$ in the massive-MIMO setting $[eqn]$ .
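To illustrate why the Woodbury reformulation preserves the solution while shrinking the linear system, the sketch below compares a direct M×M solve with an equivalent (NK)×(NK) SPD solve. It assumes the stacked channel `Hs` has shape (NK, M), the surrogate weight is a positive diagonal `b`, and the regularizer is a positive diagonal `reg`; the identity used is (D + HᴴBH)⁻¹Hᴴ = D⁻¹Hᴴ(B⁻¹ + HD⁻¹Hᴴ)⁻¹B⁻¹. The shapes, names, and the purely diagonal weight are illustrative assumptions.

```python
import numpy as np

def precoder_direct(Hs, b, C, reg):
    """Reference: solve the M x M system (diag(reg) + Hs^H B Hs) P = Hs^H C."""
    A = np.diag(reg) + Hs.conj().T @ (b[:, None] * Hs)
    return np.linalg.solve(A, Hs.conj().T @ C)

def precoder_woodbury(Hs, b, C, reg):
    """Same precoder via an (NK) x (NK) Hermitian positive-definite solve."""
    Hd = Hs / reg[None, :]                   # Hs D^{-1}, diagonal inverse is free
    S = np.diag(1.0 / b) + Hd @ Hs.conj().T  # B^{-1} + Hs D^{-1} Hs^H
    L = np.linalg.cholesky(S)                # S = L L^H
    y = np.linalg.solve(L, C / b[:, None])   # forward solve on B^{-1} C
    x = np.linalg.solve(L.conj().T, y)       # backward solve
    return Hd.conj().T @ x                   # D^{-1} Hs^H x

rng = np.random.default_rng(3)
n, M = 8, 32                                 # NK = 8 streams, M = 32 antennas
Hs = rng.standard_normal((n, M)) + 1j * rng.standard_normal((n, M))
b = rng.uniform(0.5, 2.0, n)
C = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
reg = rng.uniform(0.1, 1.0, M)
gap = np.linalg.norm(precoder_direct(Hs, b, C, reg) - precoder_woodbury(Hs, b, C, reg))
print(f"difference between the two solves: {gap:.2e}")
```

`np.linalg.solve` is used on the Cholesky factor for brevity; a production implementation would use dedicated triangular solves so that the SPD structure is fully exploited.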

Update the hybrid transmit precoder: At iteration t, we construct $[eqn]$ by convexly combining the classical WMMSE precoder $[eqn]$ and the LC precoder $[eqn]$ as follows:

[eqn]

where $[eqn]$ is an adaptive switch:

[eqn]

with a small smoothing constant $[eqn]$ (we use $[eqn]$ unless stated otherwise). When $[eqn]$ is large (far from a fixed point), $[eqn]$ favors $[eqn]$ ; near convergence, it gradually shifts to the LC update to save time.

Diagonal weight and Woodbury build. We approximate the full WMMSE weight by its diagonal,

[eqn]

which preserves positive-definiteness while removing inter-stream couplings. Using $[eqn]$ and $[eqn]$ , we form the block diagonal matrix

[eqn]

and stack the channels horizontally,

[eqn]

PAPC enforcement via diagonal dual regularization. Under per-antenna power constraints, we replace the global Frobenius normalization by a diagonal dual regularization. At iteration t, construct $[eqn]$ and $[eqn]$ from the current $[eqn]$ as in (27) and (28). Let

[eqn]

[eqn]

With these, the LC step inverts an $[eqn]$ SPD matrix (via Cholesky) instead of an $[eqn]$ system, yielding the usual Woodbury speedup when $[eqn]$ . For a dual vector $[eqn]$ , define

[eqn]

and choose $[eqn]$ so that each antenna satisfies the PAPC $[eqn]$ , $[eqn]$ . A lightweight projected dual ascent that works well in practice is

[eqn]

with $[eqn]$ denoting projection onto $[eqn]$ and $[eqn]$ . If a tiny residual remains after the inner loop, apply the row-wise safety projection

[eqn]

The row scaling in (31) is not an ad hoc post-processing step; it is the exact row-wise Euclidean projection onto the PAPC-feasible set $[eqn]$ , since the constraints decouple across rows. Concretely, for any $[eqn]$ , $[eqn]$ is obtained by $[eqn]$ . Thus, every iterate is feasible (up to numerical tolerance), and Armijo acceptance is evaluated on the projected (feasible) iterate.

Notes. (i) Equations (29)–(31) enforce PAPCs without any global Frobenius rescaling, so this section supersedes the SPC normalization previously used after Equation (28). (ii) If $[eqn]$ and $[eqn]$ , Equation (29) reduces to the SPC form (recovering the classical update).

Adaptive Damping with Robust PAPCs: To stabilize the outer iterations under CSI mismatch, we adopt the adaptive damping mechanism originally proposed for the LC-WMMSE algorithm in [25] and extend it to the robust PAPC setting. Let $[eqn]$ denote the estimated WSR evaluated on the imperfect CSI $[eqn]$ . At iteration t, we measure the change in the estimated WSR as

[eqn]

We then choose a mixing factor

[eqn]

where $[eqn]$ and typical values are $[eqn]$ , $[eqn]$ , and $[eqn]$ . Let $[eqn]$ denote the undamped precoder returned by the robust LC-WMMSE (RLC) update at iteration t. The damped update is

[eqn]

Optionally, we apply a short Armijo backtracking (up to a few trials) on $[eqn]$ to enforce monotone ascent of the estimated objective, i.e., $[eqn]$ . This schedule reduces the step size when the iterate is far from stationarity (large $[eqn]$ ) and allows larger steps near convergence. Final performance is always reported as $[eqn]$ on the true channel $[eqn]$ for fairness.
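A short backtracking loop of the kind described here might look as follows. The sketch assumes the damped update is a convex combination of the previous precoder and the new candidate, and that a trial is accepted as soon as the monitored objective does not decrease; the initial factor, shrink ratio, and trial cap are placeholder values, and the function name is hypothetical.

```python
import numpy as np

def damped_step(P_prev, P_cand, objective, beta0=1.0, shrink=0.5, max_trials=5):
    """Damped update with simple backtracking (illustrative).

    Tries P = (1 - beta) * P_prev + beta * P_cand with decreasing beta and
    returns the first trial whose objective is at least objective(P_prev).
    Falls back to P_prev (beta = 0) if every trial is rejected.
    """
    f_prev = objective(P_prev)
    beta = beta0
    for _ in range(max_trials):
        P_trial = (1.0 - beta) * P_prev + beta * P_cand
        if objective(P_trial) >= f_prev:
            return P_trial, beta
        beta *= shrink  # candidate overshoots: take a shorter step
    return P_prev, 0.0

# Toy usage: a concave objective whose maximizer is T; the candidate overshoots.
T = np.ones((2, 2))
objective = lambda P: -np.linalg.norm(P - T) ** 2
P_new, beta = damped_step(np.zeros((2, 2)), 3.0 * T, objective)
```

Since each per-antenna power constraint defines a convex set, a convex combination of two PAPC-feasible precoders remains feasible, so damping of this form does not by itself break feasibility.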

3.3. Proposed RLC-WMMSE Updates Precoder with PAPCs and Imperfect CSI

The proposed robust low-complexity WMMSE (RLC-WMMSE) precoding algorithm follows the standard three-block WMMSE structure, but computes all updates with the channel estimates $[eqn]$ and enforces per-antenna power constraints (PAPCs) via the diagonal dual map described in Section 3.2.

Receive Filter Update $[eqn]$ : At iteration t, the receive filter for user k is

[eqn]

where the BS transmit covariance from the previous iterate is $[eqn]$ $[eqn]$ . The matrix inside the inverse is Hermitian positive-definite, so Equation (35) is well posed.

Weight Matrix Update $[eqn]$ : With the MSE matrix $[eqn]$ (defined earlier), we set

[eqn]

which yields a positive-definite weight. In the LC branch, we use its diagonal surrogate as shown in Equations (24)–(28): we form $[eqn]$ and stack $[eqn]$ to obtain the Woodbury update.

Transmit Precoder Update $[eqn]$ : Using the block-diagonal matrix $[eqn]$ and the stacked channel $[eqn]$ (both defined earlier), the Woodbury step gives the LC precoder in closed form as

[eqn]

with $[eqn]$ and $[eqn]$ . This moves the inversion from size M to size $[eqn]$ , yielding per-iteration cost $[eqn]$ instead of $[eqn]$ when $[eqn]$ .

PAPC Enforcement (Replaces SPC Rescaling)

Rather than the global Frobenius normalization, we enforce PAPCs via the diagonal dual regularization described in Section 3.2: the classical and LC right-hand sides are evaluated through $[eqn]$ Equation (18), where $[eqn]$ is updated by the projected dual step in Equation (20); if a tiny residual remains, we apply the row-wise safety projection in Equation (21). This entirely replaces the SPC normalization.

3.4. Convergence and Feasibility Analysis

The classical WMMSE alternates minimization of a convex quadratic surrogate and achieves monotone ascent of the WSR [23]. In our robust low-complexity variant, at iteration t, we form the transmit quadratic $[eqn]$ from the pairs $[eqn]$ computed on the estimates $[eqn]$ , and obtain a candidate update $[eqn]$ ; the LC candidate uses the diagonal surrogate with $[eqn]$ . We then apply the damped step

[eqn]

where $[eqn]$ is chosen by Armijo backtracking on the true-channel objective $[eqn]$ evaluated at the feasible iterate $[eqn]$ .

Proposition 2 (Monotone WSR and limit-point stationarity). With Armijo acceptance evaluated on the feasible iterate $[eqn]$ , the objective sequence $[eqn]$ is nondecreasing and bounded above; hence, it converges. Moreover, the overall procedure is an inexact block-coordinate method: $[eqn]$ and $[eqn]$ are updated in closed form, while the transmit update may be inexact due to (i) the diagonal surrogate and (ii) the PAPC dual loop. If the transmit inexactness vanishes asymptotically (e.g., $[eqn]$ , or the hybrid rule selects $[eqn]$ increasingly often), then every accumulation point of $[eqn]$ is a stationary point of the classical WMMSE objective. Otherwise, accumulation points are stationary for the corresponding surrogate objective using $[eqn]$ . Sketch. Armijo yields sufficient ascent of $[eqn]$ for accepted feasible iterates, and boundedness follows from PAPCs. Standard inexact-BCD arguments imply convergence of the objective values and stationarity of limit points under vanishing inexactness.

Remark 4 (Practical convergence rate and hybrid impact). To quantify practical convergence speed and the impact of hybrid switching, we report (i) WSR versus outer iteration and (ii) an iterations-to-target metric (e.g., the number of outer iterations required to reach $[eqn]$ of the final WSR) under mild and strong spatial correlation. These results complement Proposition 2 by providing an empirical convergence characterization under CSI mismatch.

To quantify the impact of the proposed hybrid switching mechanism on convergence under PAPCs and CSI mismatch, we run the same robust algorithm in three modes: hybrid switching, LC-only by fixing $[eqn]$ , and WMMSE-only by fixing $[eqn]$ . Figure 1 reports the averaged WSR trajectories versus iteration at $[eqn]$ dB over spatially correlated channels ( $[eqn]$ ) with CSI error variance $[eqn]$ , averaged over 100 Monte Carlo trials. All three variants converge to essentially the same WSR plateau, indicating that hybrid switching does not compromise the achieved performance in the considered robust setting. The numerical summary is provided in Table 3. The final WSR values are $[eqn]$ , $[eqn]$ , and $[eqn]$ bps/Hz for hybrid, LC-only, and WMMSE-only, respectively. Moreover, the hybrid mode reaches $[eqn]$ of its final WSR within 12 iterations, whereas LC-only requires 15 iterations, demonstrating faster practical convergence (relative to the pure LC branch) while preserving stability under imperfect CSI and PAPCs.

Lemma 1 (Diagonal-surrogate perturbation bound). Let $[eqn]$ and $[eqn]$ . The diagonal surrogate perturbs the transmit normal matrix by $[eqn]$ , which satisfies $[eqn]$ . Hence, the diagonal approximation is accurate when the off-diagonal energy $[eqn]$ is small (diagonal dominance), and it can degrade in strongly coupled regimes (e.g., high SNR / strong correlation), motivating the hybrid safeguard. Implication: Since $[eqn]$ scales with $[eqn]$ , the diagonal surrogate may be less reliable under strong correlation/high SNR, and the hybrid switching mitigates this by favoring the classical update when needed.

Diagonal approximation error: To quantify the error introduced by replacing $[eqn]$ with $[eqn]$ , we use the relative off-diagonal energy

[eqn]

which measures the departure of $[eqn]$ from a diagonal structure. Empirically, $[eqn]$ increases under stronger spatial correlation, indicating a less accurate diagonal surrogate.
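A relative off-diagonal energy of this kind is cheap to monitor at run time. The sketch below assumes the metric is the Frobenius norm of the off-diagonal part divided by the Frobenius norm of the full weight matrix; the exact normalization used in the paper is not reproduced here, so this definition and the function name are assumptions.

```python
import numpy as np

def offdiag_ratio(W):
    """Assumed metric: ||W - diag(W)||_F / ||W||_F, a value in [0, 1)
    for any matrix with a nonzero diagonal."""
    D = np.diag(np.diag(W))
    return np.linalg.norm(W - D) / np.linalg.norm(W)

# A diagonal weight has zero off-diagonal energy; coupling raises the ratio.
W_diag = np.diag([1.0, 2.0, 3.0])
W_coupled = W_diag + 0.5 * (np.ones((3, 3)) - np.eye(3))
print(offdiag_ratio(W_diag), offdiag_ratio(W_coupled))
```

A value near zero suggests the diagonal surrogate discards little, whereas a larger value signals stronger inter-stream coupling, which is the regime where falling back to the classical update is safer.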

Moreover, $[eqn]$ admits an explicit perturbation bound. Let $[eqn]$ and $[eqn]$ . Then,

[eqn]

so smaller $[eqn]$ implies a smaller surrogate perturbation. The hybrid rule further improves robustness by favoring the classical update when the surrogate is less reliable.

PAPC feasibility (every outer iteration): Let $[eqn]$ and update $[eqn]$ by the projected dual step Equations (29) and (30) with the final row-wise safety projection Equation (31). Then, each iterate $[eqn]$ satisfies $[eqn]$ for all m (to numerical tolerance), so PAPCs hold throughout.
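The row-wise safety projection can be written in a few lines, and its two defining properties (feasibility and minimal distance) are easy to verify numerically. The sketch assumes that row m of the stacked precoder carries the signal of antenna m, so that its squared Euclidean norm is the antenna power; the function name is illustrative.

```python
import numpy as np

def project_rows(P, p):
    """Scale each row of P so that its squared norm is at most p[m].

    Rows already within budget are left untouched, which makes this the
    Euclidean projection onto the per-antenna power set.
    """
    norms = np.linalg.norm(P, axis=1)
    tiny = np.finfo(float).tiny  # guards against division by zero
    scale = np.minimum(1.0, np.sqrt(p) / np.maximum(norms, tiny))
    return P * scale[:, None]

rng = np.random.default_rng(4)
P = rng.standard_normal((6, 3)) + 1j * rng.standard_normal((6, 3))
p = np.full(6, 0.5)
Q = project_rows(P, p)
print("max per-antenna power after projection:", np.sum(np.abs(Q) ** 2, axis=1).max())
```

Because the constraints decouple across antennas, each row is handled independently and the cost is linear in the number of precoder entries.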

3.5. Computational Complexity Analysis

In massive MU-MIMO, the computational burden of iterative precoding is often the limiting factor. In particular, the classical WMMSE [23] is dominated per iteration by solving the $[eqn]$ precoder system in (27), i.e., factorizing $[eqn]$ , which costs $[eqn]$ . Receiver and weight updates each cost $[eqn]$ . Rethinking WMMSE (R-WMMSE) reduces the dominant cost via sketching and matrix-free linear solves and exhibits near-linear scaling in M empirically [20].

LC step (Woodbury): Using the stacked channel $[eqn]$ and $[eqn]$ with $[eqn]$ , the LC step replaces the $[eqn]$ inverse by an $[eqn]$ SPD solve via Woodbury:

[eqn]

The dominant per-iteration costs of Algorithm 1 are as follows: (i) Cholesky/solve of $[eqn]$ : $[eqn]$ ; (ii) Gram products with $[eqn]$ (e.g., $[eqn]$ , $[eqn]$ ): $[eqn]$ ; and (iii) per-user $[eqn]$ factorizations (for $[eqn]$ ): $[eqn]$ . Thus, as summarized in Table 4, the RLC-WMMSE algorithm has dominant cost $[eqn]$ , which is much smaller than the classical WMMSE algorithm complexity $[eqn]$ when $[eqn]$ .

Algorithm 1. Robust low-complexity WMMSE (RLC-WMMSE) precoding.

Require: Channel estimates $[eqn]$ , true channels $[eqn]$ (for evaluation), weights $[eqn]$ , noise $[eqn]$ , PAPC budgets $[eqn]$ , max iters T, tol $[eqn]$
1:Initialize stacked precoder $[eqn]$ (feasible or normalized)
2:for $[eqn]$ to T do
3: for $[eqn]$ to K do
4: Receive update: $[eqn]$ by (35),
5: MSE build: $[eqn]$ from $[eqn]$ and $[eqn]$
6: Weight update: $[eqn]$ by (36),
7: end for
8: Hybrid switch: compute $[eqn]$ by (23),
9: Classical candidate (with PAPCs): build $[eqn]$ by (27); compute $[eqn]$ via $[eqn]$ with $[eqn]$ updated by the projected dual step (29) and (30) and the safety projection (31),
10: Low-complexity candidate (Woodbury + PAPCs): form $[eqn]$ and stacked $[eqn]$ by (25) and (26), compute $[eqn]$ using the Woodbury closed form by (37) while enforcing PAPCs by (29)–(31) on the LC branch.
11: Hybrid precoder update by (22),
12: Adaptive damping: compute $[eqn]$ by (33), then set $[eqn]$
13: Armijo acceptance is evaluated on the PAPC-feasible trial iterate.
14: Evaluate $[eqn]$ on true channels $[eqn]$ (monotone track); if $[eqn]$ then break
15:end for
Ensure: Per-user blocks $[eqn]$ from the stacked $[eqn]$

3.5.1. PAPC (RLC-WMMSE) Effect

With PAPCs, the classical branch evaluates $[eqn]$ using a short projected dual loop in Equations (29)–(31). In the LC branch, we replace $[eqn]$ by the diagonal matrix $[eqn]$ and apply Woodbury with $[eqn]$ and $[eqn]$ :

[eqn]

The per-iteration asymptotic costs remain $[eqn]$ ; the dual loop adds J cheap inner steps (typically $[eqn]$ ), each requiring one $[eqn]$ triangular solve and power check, i.e., an extra $[eqn]$ that is lower-order when $[eqn]$ .

Takeaway. Because $[eqn]$ in massive MU-MIMO, LC/RLC replace the cubic $[eqn]$ term with operations that scale with $[eqn]$ . Empirically, we observe a crossover where LC/RLC become faster than classical WMMSE when $[eqn]$ (with $[eqn]$ in our setup), and the PAPC dual loop incurs only a small constant overhead. Imperfect CSI (updates on $[eqn]$ ) does not change orders of complexity.
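The scaling argument can be made tangible with a toy operation count. The sketch below assumes unit constants, a cubic term for the M×M factorization in the classical branch, and a cubic-in-NK factorization plus M·(NK)² Gram products in the LC branch. These counts are deliberately crude: they ignore lower-order terms, the dual loop, and hardware effects, so the crossover they suggest should not be read as the measured one.

```python
def flops_classical(M):
    """Toy count for the classical branch: one M x M factorization."""
    return M ** 3

def flops_lc(M, N, K):
    """Toy count for the LC branch: (NK)^3 factorization + M (NK)^2 products."""
    n = N * K
    return n ** 3 + M * n ** 2

def speedup(M, N, K):
    """Ratio of the two toy counts; values above 1 favor the LC branch."""
    return flops_classical(M) / flops_lc(M, N, K)

# Example sweep over the array size for N = 2 receive antennas and K = 8 users.
for M in (16, 32, 64, 128, 256):
    print(M, round(speedup(M, 2, 8), 2))
```

In this simplified model the advantage of the LC branch grows roughly quadratically in M once M is well above NK, which matches the qualitative trend described in the takeaway.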

3.5.2. Practical Implementation and Latency Considerations

We emphasize that RLC-WMMSE remains an iterative optimizer and thus is not primarily intended to replace one-shot closed-form precoders in extremely stringent URLLC pipelines. Rather, it targets regimes in which a small iteration budget is feasible within the channel coherence time and the dominant computations can be efficiently parallelized. In practice, the latency can be reduced by warm-starting from the previous slot/TTI solution, adopting early stopping with a fixed iteration cap or an iterations-to-target criterion (e.g., $[eqn]$ ), and implementing the dominant $[eqn]$ SPD solves using highly optimized batched Cholesky/LDL routines on GPU/FPGA/ASIC. From this perspective, the key advantage of the proposed reformulation is that it shifts the dominant per-iteration cost from an $[eqn]$ inversion to an $[eqn]$ SPD solve, which is more favorable when $[eqn]$ .

3.6. Implementation Considerations and Signaling Overhead Analysis

At iteration t, the BS needs downlink (DL) CSI and the current receive filters and weights for each user. In a TDD deployment, the BS estimates the downlink channels $[eqn]$ from uplink pilots by reciprocity and can update $[eqn]$ and $[eqn]$ locally. Hence, no per-iteration DL feedback is required beyond the usual pilot overhead, and the additional PAPC dual variables $[eqn]$ are kept entirely at the BS. In an FDD system, the user equipments (UEs) estimate the channels from DL pilots and must feed back effective CSI based on $[eqn]$ for the RLC-WMMSE updates. A full WMMSE mode would return the receive filters $[eqn]$ and the Hermitian weight matrices $[eqn]$ . Using $[eqn]$ bits per complex number and $[eqn]$ bits per real number, the per-user payload per iteration is on the order of

[eqn]

where $[eqn]$ real entries describe a $[eqn]$ Hermitian matrix. Our LC branch suggests a lighter diagonal feedback mode. Since the algorithm only uses the diagonal surrogate $[eqn]$ , the UE can quantize and feed back $[eqn]$ together with a compressed representation of $[eqn]$ (e.g., a codebook index or a few dominant singular vectors). The overhead reduces to roughly

[eqn]

which can be significantly smaller than $[eqn]$ when $[eqn]$ . In both modes, the additional PAPC dual variables are updated at the BS and incur no extra DL/UL signaling.
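
The two feedback modes can be compared with a small bit-counting sketch. It assumes that the receive filter is an N×d complex matrix, that a d×d Hermitian weight matrix is described by d² real numbers, and that the compressed filter representation in the diagonal mode costs a fixed number of bits. All function and parameter names, and the numbers in the usage lines, are hypothetical.

```python
def full_feedback_bits(n_rx, n_streams, bits_complex, bits_real):
    """Per-user, per-iteration payload of the full WMMSE feedback mode."""
    filter_bits = n_rx * n_streams * bits_complex   # N x d complex receive filter
    weight_bits = n_streams ** 2 * bits_real        # d^2 real entries of a Hermitian matrix
    return filter_bits + weight_bits

def diagonal_feedback_bits(n_streams, bits_real, filter_codebook_bits):
    """Payload of the lighter mode: d real diagonal weights plus a compressed filter."""
    return n_streams * bits_real + filter_codebook_bits

# Hypothetical numbers: N = d = 4, 16 bits per complex, 8 bits per real,
# and a 10-bit codebook index for the receive filter.
full = full_feedback_bits(4, 4, 16, 8)      # 4*4*16 + 16*8 = 384 bits
light = diagonal_feedback_bits(4, 8, 10)    # 4*8 + 10 = 42 bits
```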

Relation to imperfect CSI model: In our stochastic CSI model $[eqn]$ , the mismatch variance ε captures the combined effect of channel estimation noise and, in FDD, feedback quantization/compression and delay. Longer UL/DL pilot sequences or finer feedback quantization reduce ε, while aggressive compression in the LC feedback mode typically increases it. The proposed RLC-WMMSE algorithm is designed to be robust to such mismatches, as reflected in the simulations of Section 4.

4. Simulations and Results

4.1. Simulation Setup

We examine a single-cell massive MU-MIMO downlink where a BS with M transmit antennas serves K users. Each user is equipped with N receive antennas and is allocated $[eqn]$ data streams. Under the sum-power constraint (SPC), the BS transmit-power budget is set to $[eqn]$ . For per-antenna power constraints, the power limits are collected in $[eqn]$ with $[eqn]$ . The channel matrices $[eqn]$ are generated as circularly symmetric complex Gaussian fading with large-scale pathloss $[eqn]$ [30], where $[eqn]$ denotes the user–BS distance. The noise variance is identical for all users and is set to $[eqn]$ , so that $[eqn]$ (in dB) corresponds to the average received SNR in the absence of precoding.

Unless stated otherwise, results are reported under perfect CSI at the BS and are averaged over 100 independent channel realizations. For robustness evaluations, we explicitly model imperfect CSI via $[eqn]$ , where $[eqn]$ and $[eqn]$ controls the estimation NMSE. Throughout the simulations, we use $[eqn]$ , $[eqn]$ , $[eqn]$ , $[eqn]$ , tolerance $[eqn]$ , and a maximum of $[eqn]$ iterations. All experiments are implemented in MATLAB R2024b on a Windows 11 (64-bit) workstation with an Intel i7-12700H CPU at 3.20 GHz, 16 GB RAM, and RTX Graphics.
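
The imperfect-CSI model mentioned in this setup can be illustrated with a short NumPy sketch (not the authors' MATLAB code). It assumes an additive error with i.i.d. circularly symmetric complex Gaussian entries whose variance is scaled so that the empirical NMSE is close to a target value `eps`; the exact scaling used in the paper is not reproduced here, and the function names are hypothetical.

```python
import numpy as np

def add_csi_error(H_true, eps, rng):
    """Return H_hat = H_true + E with i.i.d. circularly symmetric Gaussian E.

    Assumption: the per-entry error variance is eps times the average
    per-entry channel power, so the empirical NMSE is close to eps.
    """
    power = np.mean(np.abs(H_true) ** 2)
    scale = np.sqrt(eps * power / 2.0)   # standard deviation per real/imaginary part
    E = scale * (rng.standard_normal(H_true.shape)
                 + 1j * rng.standard_normal(H_true.shape))
    return H_true + E

def empirical_nmse(H_true, H_hat):
    """Squared Frobenius norm of the error divided by that of the true channel."""
    return np.linalg.norm(H_hat - H_true) ** 2 / np.linalg.norm(H_true) ** 2
```

In a robustness experiment, the precoder would be designed from the output of `add_csi_error` while the achieved rate is computed on `H_true`.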

4.2. Robust Low-Complexity (RLC-WMMSE) Performance Under Correlated Channels

This subsection presents simulation results that evaluate the performance of the proposed RLC-WMMSE algorithm under per-antenna power constraints and correlated-channel conditions. The method is compared with several baseline schemes: the LC-WMMSE algorithm [25], the classical WMMSE algorithm [23], the R-WMMSE approach [20], and two non-iterative precoding techniques, zero-forcing (ZF) [31] and block diagonalization (BD) [32]. The closed-form ZF and BD methods exploit low-dimensional channel properties (e.g., BD uses null-space projection to suppress interference) and exhibit low computational complexity, specifically $[eqn]$ for ZF and $[eqn]$ for BD [31,32]. Although these non-iterative schemes are computationally efficient and practical for massive MU-MIMO, they generally yield suboptimal weighted sum-rate performance. To ensure a fair comparison, each trial initializes the precoder as $[eqn]$ , which is then scaled to satisfy the power constraints and kept identical for all methods. To investigate robustness in realistic propagation environments, we first consider spatially correlated channels at the BS. For user k, the true downlink channel is generated according to

[eqn]

where $[eqn]$ is the BS correlation matrix. We adopt an exponential model $[eqn]$ with $[eqn]$ and use $[eqn]$ in the correlated-channel experiments. The i.i.d. Rayleigh case is recovered by setting $[eqn]$ . Apart from the choice of $[eqn]$ , the simulation protocol (SNR grid, $[eqn]$ , initialization, tolerances, and power normalization) is identical for both correlated and i.i.d. channels.
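
A minimal sketch of this channel generation follows. It assumes a real correlation coefficient `rho` in [0, 1), entries of the correlation matrix equal to `rho` raised to the absolute index difference, and coloring of an i.i.d. Rayleigh matrix on the transmit side; the paper's exact parameterization may differ, and the function names are hypothetical.

```python
import numpy as np

def exponential_correlation(M, rho):
    """M x M matrix with entries rho ** |i - j| (identity when rho = 0)."""
    idx = np.arange(M)
    return rho ** np.abs(idx[:, None] - idx[None, :])

def correlated_channel(N, M, rho, rng):
    """N x M channel whose rows h satisfy E[h^H h] = R (transmit-side correlation)."""
    R = exponential_correlation(M, rho)
    L = np.linalg.cholesky(R)   # R = L L^T with real lower-triangular L
    Hw = (rng.standard_normal((N, M)) + 1j * rng.standard_normal((N, M))) / np.sqrt(2)
    return Hw @ L.T             # color the i.i.d. entries across BS antennas
```

The Cholesky factor is used here in place of the symmetric square root; both produce channels with the same second-order statistics.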

Figure 2 shows the WSR versus SNR for correlated MU-MIMO channels with perfect CSI. The LC-WMMSE algorithm closely matches the WSR of the classical WMMSE algorithm over the whole SNR range, while both clearly outperform the R-WMMSE and the closed-form ZF and BD precoders.

Figure 3 and Figure 4 illustrate the convergence behavior of the different PAPC schemes for two system dimensions, $[eqn]$ and $[eqn]$ , at $[eqn]$ . For the SPC case, WMMSE, LC-WMMSE, and R-WMMSE all exhibit fast, monotone convergence and attain almost identical final WSR, with R-WMMSE and LC-WMMSE reaching the steady state in noticeably fewer iterations than classical WMMSE. Under PAPCs, the proposed robust LC-WMMSE converges much faster than the WMMSE-PAPCs baseline while achieving nearly the same steady-state WSR, confirming that the low-complexity Woodbury/dual design preserves good convergence properties even in the presence of per-antenna constraints.

Now, we compare our proposed RLC-WMMSE algorithm with the LC-WMMSE algorithm, the classical WMMSE and R-WMMSE schemes in terms of average CPU time to convergence, and quantify the additional overhead introduced by the PAPC variants (robust LC-WMMSE and WMMSE-PAPCs). Figure 5 shows the runtime versus the number of users K for a system with $[eqn]$ BS antennas, $[eqn]$ receive antennas per user, and $[eqn]$ dB. As K increases, all algorithms exhibit higher computational cost, but with markedly different slopes. The proposed RLC-WMMSE algorithm is consistently faster than the classical WMMSE baseline, showing a much flatter growth in runtime as K grows, which is consistent with its reduced per-iteration complexity. For the largest tested value $[eqn]$ , LC-WMMSE is roughly $[eqn]$ faster than WMMSE in terms of CPU time, whereas R-WMMSE, thanks to its randomized/sketched updates, achieves runtimes that are almost an order of magnitude smaller than those of WMMSE. The robust LC-WMMSE under PAPCs remains significantly cheaper than the exact WMMSE-PAPCs solver, confirming the benefit of the proposed dual-based low-complexity design for PAPCs.

Figure 6 shows the runtime as a function of the number of BS antennas M for $[eqn]$ users, $[eqn]$ receive antennas per user, and $[eqn]$ . The computational cost of all methods increases with M, but the scaling behavior differs. The classical WMMSE and WMMSE-PAPCs curves grow rapidly, reflecting the $[eqn]$ matrix inversions in each iteration. In contrast, LC-WMMSE and robust LC-WMMSE grow much more slowly with M, and for $[eqn]$ , the LC-WMMSE achieves about a $[eqn]$ speedup over WMMSE while remaining substantially faster than WMMSE-PAPCs. R-WMMSE exhibits the lowest runtime across all antenna dimensions, in line with its near-linear complexity in M. Overall, these results confirm the complexity analysis: LC-WMMSE provides a substantial reduction in computational cost compared with the classical WMMSE algorithm, and the robust LC-WMMSE inherits this favorable scaling even in the presence of PAPCs.

Figure 7 compares the weighted sum-rate versus SNR for the considered precoders with correlated channels ( $[eqn]$ , $[eqn]$ , $[eqn]$ , and imperfect CSI). As expected, all schemes benefit from increasing SNR, but their relative gaps remain almost constant. The benchmark WMMSE-PAPCs and its normalized variant achieve the highest WSR, and the proposed RLC-WMMSE closely tracks both curves across the entire SNR range. At $[eqn]$ dB, for example, the WSR loss of RLC-WMMSE with respect to WMMSE-PAPCs is below $[eqn]$ , and below $[eqn]$ relative to the normalized WMMSE-PAPCs baseline. R-WMMSE delivers slightly lower WSR due to the sketching approximation, whereas LC-WMMSE (SPC), ZF, and BD exhibit larger gaps, highlighting the importance of both PAPC enforcement and robust design under CSI mismatch.

Figure 8 reports the corresponding average runtimes per channel realization versus SNR. The runtimes of all algorithms are essentially insensitive to SNR and are dominated by their per-iteration matrix operations. R-WMMSE is the fastest method, but this comes at the cost of a noticeable WSR loss. LC-WMMSE reduces the runtime by about 20– $[eqn]$ compared to classical WMMSE, confirming the effectiveness of the Woodbury reformulation. The proposed RLC-WMMSE incurs only a moderate overhead relative to LC-WMMSE to enforce PAPCs and handle imperfect CSI, yet it still runs about 10– $[eqn]$ faster than the full WMMSE-PAPCs solver. Overall, these results demonstrate that RLC-WMMSE attains WSR performance essentially identical to the best PAPC baselines while offering more favorable runtime scaling than classical WMMSE and WMMSE-PAPCs in the large-array regime.

4.3. Robustness to Channel Estimation Errors

In this subsection we investigate the robustness of the proposed robust LC-WMMSE precoder to imperfect CSI at the BS. For each user k, the estimated downlink channel is modeled as $[eqn]$ , where $[eqn]$ is the true channel matrix and $[eqn]$ collects the estimation errors whose entries are i.i.d. circularly symmetric complex Gaussian. The quality of the CSI is parameterized by the normalized mean-squared error (NMSE)

[eqn]

so that $[eqn]$ corresponds to perfect CSI and larger values of $[eqn]$ represent increasingly severe estimation errors. For each fixed $[eqn]$ , we generate $[eqn]$ according to the above model and compare the following transceiver designs:

WMMSE (oracle, $[eqn]$ ): The classical WMMSE algorithm with a sum-power constraint, which has perfect knowledge of $[eqn]$ and serves as an upper bound.
WMMSE (mismatch, $[eqn]$ ): The classical WMMSE algorithm that designs the precoder using the imperfect CSI $[eqn]$ , while the achieved weighted sum-rate is always evaluated on the true channels $[eqn]$ .
Robust LC-WMMSE (PAPCs): The proposed robust LC-WMMSE algorithm with per-antenna power constraints, which updates all variables using only $[eqn]$ and is also evaluated on the true channels $[eqn]$ .

Let $[eqn]$ , $[eqn]$ , and $[eqn]$ denote the average weighted sum-rate (WSR) of the oracle WMMSE, mismatched WMMSE, and robust LC-WMMSE, respectively, for a given value of $[eqn]$ . The relative WSR loss with respect to the oracle design is defined as

[eqn]
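
As a worked illustration of this metric, the sketch below assumes the usual definition of a relative loss, namely the oracle WSR minus the WSR of the scheme under test, divided by the oracle WSR. The function name and the numbers in the usage line are hypothetical and are not results from the paper.

```python
def relative_wsr_loss(wsr_oracle, wsr_scheme):
    """Fractional WSR loss of a scheme with respect to the oracle design."""
    if wsr_oracle <= 0:
        raise ValueError("oracle WSR must be positive")
    return (wsr_oracle - wsr_scheme) / wsr_oracle

# Hypothetical values: an oracle WSR of 40 bit/s/Hz and a scheme at 36 bit/s/Hz
# give a relative loss of 0.1, i.e., 10%.
example_loss = relative_wsr_loss(40.0, 36.0)
```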

Figure 9 shows the WSR versus $[eqn]$ at $[eqn]$ dB. As expected, the oracle WMMSE curve is almost insensitive to $[eqn]$ , since it always uses the true CSI. In contrast, the mismatched WMMSE suffers a significant performance degradation as $[eqn]$ increases. The proposed robust LC-WMMSE (PAPCs) consistently tracks the oracle curve more closely than the mismatched WMMSE, thereby mitigating the loss induced by imperfect CSI. The relative losses $[eqn]$ and $[eqn]$ are shown in Figure 10; these results confirm that the proposed design reduces the WSR loss compared to the mismatched WMMSE baseline over the entire range of channel-estimation errors considered.

To verify that robustness does not come at the expense of violating the per-antenna power constraints, we also monitor the average maximum PAPC violation

[eqn]

where $[eqn]$ denotes the mth row of the final precoder and $[eqn]$ . As illustrated in Figure 11, the proposed robust LC-WMMSE scheme satisfies the PAPCs up to numerical precision for all tested values of $[eqn]$ .
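
The feasibility check can be sketched as follows, assuming that the violation of one precoder is the largest positive excess of any row power over its budget and that the row-wise safety projection simply scales down rows that exceed their budget. Both function names are hypothetical, and the paper's metric may include a normalization that is not shown here.

```python
import numpy as np

def row_safety_projection(V, p):
    """Scale down any row of V whose power exceeds its per-antenna budget p[m]."""
    row_power = np.sum(np.abs(V) ** 2, axis=1)
    scale = np.ones_like(row_power)
    over = row_power > p
    scale[over] = np.sqrt(p[over] / row_power[over])
    return V * scale[:, None]

def max_papc_violation(V, p):
    """Largest positive excess of row power over budget (0.0 if feasible)."""
    row_power = np.sum(np.abs(V) ** 2, axis=1)
    return float(np.max(np.maximum(row_power - p, 0.0)))
```

Scaling a row onto its power budget is the Euclidean projection onto that row's constraint set, so rows that already satisfy their budget are left unchanged.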

4.4. Performance Under i.i.d. Rayleigh

To provide a baseline reference, we now consider an uncorrelated i.i.d. Rayleigh fading scenario with imperfect CSI at the BS. The true downlink channels are drawn as $[eqn]$ , while the BS only has access to the estimates $[eqn]$ , where $[eqn]$ models the estimation error. In this subsection, we fix the NMSE parameter to $[eqn]$ (approximately $[eqn]$ dB channel-estimation NMSE), and set the transmit-side correlation coefficient to $[eqn]$ . Unless otherwise stated, we use $[eqn]$ BS antennas and $[eqn]$ users, each with $[eqn]$ receive antennas (and streams), and a total BS sum power $[eqn]$ W under the SPC case. For PAPC-based schemes, the per-antenna budgets are chosen as $[eqn]$ so that $[eqn]$ . All results are averaged over a large number of independent Monte Carlo channel realizations.

Figure 12 shows the weighted sum-rate versus SNR for the i.i.d. Rayleigh case with imperfect CSI ( $[eqn]$ , $[eqn]$ ). The proposed LC-WMMSE and robust LC-WMMSE (PAPCs) closely track the classical WMMSE and WMMSE-PAPCs benchmarks, respectively, across the entire SNR range.

Figure 13 reports the corresponding average runtime per channel realization, highlighting the significant complexity reduction of LC-WMMSE and robust LC-WMMSE compared with their WMMSE and WMMSE-PAPCs counterparts.

5. Conclusions

Weighted sum-rate maximization in downlink massive MU-MIMO with per-antenna power constraints (PAPCs) and imperfect CSI is computationally demanding, particularly for classical WMMSE algorithms whose cost scales cubically with the number of base-station antennas. This paper proposed a robust low-complexity WMMSE (RLC-WMMSE) precoding framework that preserves the favorable WSR performance of classical WMMSE while explicitly enforcing PAPCs under stochastic CSI mismatch. The approach combines a Woodbury-based low-complexity transmit update that operates on an $[eqn]$ SPD system, a hybrid switching rule with adaptive damping that blends classical and low-complexity updates, and a lightweight diagonal dual regularization with row-wise safety projection to satisfy per-antenna power limits.

A qualitative complexity analysis shows that the proposed RLC-WMMSE scheme becomes more efficient than classical WMMSE once the number of BS antennas significantly exceeds the total number of data streams, which is typical in massive MU-MIMO deployments. Extensive simulations over i.i.d. and correlated channels, and for various CSI mismatch levels, demonstrate that RLC-WMMSE attains WSR performance very close to a WMMSE-PAPCs benchmark, while maintaining negligible average PAPC violations and offering clearly favorable runtime scaling as the array size grows. The implementation and signaling discussion further indicates that robustness and PAPC feasibility can be handled entirely at the BS, with only mild feedback requirements on the user side.

Overall, the proposed RLC-WMMSE method provides a practical, feasibility-aware, and computationally efficient precoding option for large-array base stations with per-antenna power budgets and imperfect CSI. Future work will consider extensions to multi-cell coordination, hybrid analog–digital architectures, and more refined stochastic error models that capture channel aging and hardware impairments.

Bibliography

1. Wang M., Gao F., Jin S., Lin H. An Overview of Enhanced Massive MIMO with Array Signal Processing Techniques. IEEE J. Sel. Top. Signal Process. 2019, 13, 886–901. doi:10.1109/JSTSP.2019.2934931
2. Marzetta T.L., Larsson E.G., Yang H., Ngo H.Q. Fundamentals of Massive MIMO. Cambridge University Press, Cambridge, UK, 2016.
3. Pereira de Figueiredo F.A. An Overview of Massive MIMO for 5G and 6G. IEEE Lat. Am. Trans. 2022, 20, 931–940. doi:10.1109/TLA.2022.9757375
4. Zhang J., Björnson E., Matthaiou M., Ng D.W.K., Yang H., Love D.J. Prospective Multiple Antenna Technologies for Beyond 5G. IEEE J. Sel. Areas Commun. 2020, 38, 1637–1660. doi:10.1109/JSAC.2020.3000826
5. Li Q.C., Niu H., Papathanassiou A.T., Wu G. 5G Network Capacity: Key Elements and Technologies. IEEE Veh. Technol. Mag. 2014, 9, 71–78. doi:10.1109/MVT.2013.2295070
6. Peng M., Sun Y., Li X., Mao Z., Wang C. Recent Advances in Cloud Radio Access Networks: System Architectures, Key Techniques, and Open Issues. IEEE Commun. Surv. Tutor. 2016, 18, 2282–2308. doi:10.1109/COMST.2016.2548658
7. Sohrabi F., Nuzman C., Du J., Yang H., Viswanathan H. Energy-Efficient Flat Precoding for MIMO Systems. IEEE Trans. Signal Process. 2025, 73, 795–810. doi:10.1109/TSP.2025.3537960
8. Choi H., Swindlehurst A.L., Choi J. WMMSE-Based Rate Maximization for RIS-Assisted MU-MIMO Systems. IEEE Trans. Commun. 2024, 72, 5194–5208. doi:10.1109/TCOMM.2024.3381707