The power of small initialization in noisy low-tubal-rank tensor recovery

Zhiyu Liu; Haobo Geng; Xudong Wang; Yandong Tang; Zhi Han; Yao Wang

arXiv:2603.02729 · cs.LG · March 4, 2026

Open Access · 3 Reviews

TL;DR

This paper demonstrates that small initialization in factorized gradient descent significantly improves low-tubal-rank tensor recovery from noisy measurements, achieving near-optimal error bounds regardless of how much the tubal-rank is overestimated.

Contribution

It introduces a four-stage analysis showing that small initialization enables a nearly minimax-optimal recovery error in noisy tensor recovery, independent of the degree of over-parameterization.

Findings

01. Small initialization improves recovery accuracy in noisy tensor problems.
02. Theoretical error bounds are independent of the overestimated tubal-rank $R$.
03. Early stopping can achieve optimal results in practice.
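To make the first and third findings concrete, here is a minimal NumPy sketch of FGD with a small random initialization on a synthetic noisy instance. It is not the authors' algorithm or experimental setup: the Gaussian measurement tensors, the loss $\frac{1}{2m}\sum_i (\langle \mathcal{A}_i, \mathcal{U} * \mathcal{U}^{\top} \rangle - y_i)^2$, the step size, the initialization scale, the problem sizes, and the stopping rule are all illustrative assumptions, and `fgd_small_init` is a hypothetical name. The stopping rule is an oracle (keep the iterate closest to the known ground truth), which is only possible in simulation.

```python
import numpy as np

def t_product(A, B):
    # t-product: slice-wise matrix products after a DFT along the third mode.
    Cf = np.einsum("ijt,jlt->ilt", np.fft.fft(A, axis=2), np.fft.fft(B, axis=2))
    return np.real(np.fft.ifft(Cf, axis=2))

def t_transpose(A):
    # Transpose each frontal slice, then reverse the order of slices 2..k.
    At = np.transpose(A, (1, 0, 2))
    return np.concatenate([At[:, :, :1], At[:, :, :0:-1]], axis=2)

def fgd_small_init(A_ops, y, X_star, R, alpha=1e-3, mu=0.1, iters=1200, seed=0):
    """Hypothetical FGD sketch for f(U) = ||A(U * U^T) - y||^2 / (2m).

    A_ops has shape (m, n, n, k) and A(X)_i = <A_i, X> (Euclidean inner product).
    X_star is used only to record errors and keep the best iterate, i.e. an
    oracle early-stopping rule (an assumption available only in simulation).
    """
    m, n, _, k = A_ops.shape
    rng = np.random.default_rng(seed)
    U = alpha * rng.standard_normal((n, R, k))  # small random initialization
    norm_star = np.linalg.norm(X_star)
    errors, best_err, best_U = [], np.inf, U.copy()
    for _ in range(iters):
        X = t_product(U, t_transpose(U))
        err = np.linalg.norm(X - X_star) / norm_star
        if err < best_err:
            best_err, best_U = err, U.copy()
        errors.append(err)
        residual = np.tensordot(A_ops, X, axes=3) - y  # A(X) - y, shape (m,)
        G = np.tensordot(residual, A_ops, axes=1) / m  # A^*(residual) / m, shape (n, n, k)
        grad = t_product(G + t_transpose(G), U)        # gradient of f with respect to U
        U = U - mu * grad
    return best_U, np.array(errors)

# Illustrative synthetic instance (all sizes and the noise level are assumptions).
rng = np.random.default_rng(1)
n, k, r, R, m, sigma = 10, 4, 2, 5, 2000, 0.01
U_star = rng.standard_normal((n, r, k))
# Rescale so the largest singular value over the Fourier slices of U_star is 1.
U_star /= max(np.linalg.norm(S, 2) for S in np.moveaxis(np.fft.fft(U_star, axis=2), 2, 0))
X_star = t_product(U_star, t_transpose(U_star))   # T-PSD ground truth with tubal rank r
A_ops = rng.standard_normal((m, n, n, k))          # Gaussian measurement tensors
y = np.tensordot(A_ops, X_star, axes=3) + sigma * rng.standard_normal(m)

best_U, errors = fgd_small_init(A_ops, y, X_star, R)  # over-parameterized: R > r
print(f"initial error {errors[0]:.3f}, best error {errors.min():.3f} at step {errors.argmin()}")
```

Since the ground truth is known in this synthetic setting, `errors` exposes the whole error trajectory, which is what makes the oracle stopping rule possible; in practice a stopping time has to be chosen without access to $\mathcal{X}_\star$.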

Abstract

We study the problem of recovering a low-tubal-rank tensor $\mathcal{X}_\star \in \mathbb{R}^{n \times n \times k}$ from noisy linear measurements under the t-product framework. A widely adopted strategy factorizes the optimization variable as $\mathcal{U} * \mathcal{U}^{\top}$, where $\mathcal{U} \in \mathbb{R}^{n \times R \times k}$, and then applies factorized gradient descent (FGD) to solve the resulting optimization problem. Since the tubal-rank $r$ of the underlying tensor $\mathcal{X}_\star$ is typically unknown, this method often assumes $r < R \leq n$, a regime known as over-parameterization. However, when the measurements are corrupted by dense noise (e.g., Gaussian noise), FGD with the commonly used spectral initialization yields a recovery error that grows linearly with the overestimated tubal-rank $R$. To address this issue, we show that using a small…
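The t-product machinery behind the $\mathcal{U} * \mathcal{U}^{\top}$ factorization can be illustrated with a short NumPy sketch. It assumes the standard definition of the t-product (circular convolution of frontal slices, computed through a DFT along the third mode) and uses the common Fourier-domain characterization of T-PSD tensors and of tubal rank; the helper names `t_product`, `t_transpose`, `tubal_rank`, and `is_t_psd` are hypothetical and not taken from the authors' code. The usage lines check that $\mathcal{U} * \mathcal{U}^{\top}$ is T-PSD with tubal rank at most $R$.

```python
import numpy as np

def t_product(A, B):
    """t-product of A (n1 x n2 x k) with B (n2 x n3 x k).

    The t-product circularly convolves frontal slices along the third mode,
    so a DFT along that mode turns it into k independent matrix products.
    """
    Cf = np.einsum("ijt,jlt->ilt", np.fft.fft(A, axis=2), np.fft.fft(B, axis=2))
    # For real inputs the imaginary part is only floating-point round-off.
    return np.real(np.fft.ifft(Cf, axis=2))

def t_transpose(A):
    """Transpose each frontal slice, then reverse the order of slices 2..k."""
    At = np.transpose(A, (1, 0, 2))
    return np.concatenate([At[:, :, :1], At[:, :, :0:-1]], axis=2)

def fourier_slices(X):
    """Frontal slices of X after a DFT along the third mode, shape (k, n1, n2)."""
    return np.moveaxis(np.fft.fft(X, axis=2), 2, 0)

def tubal_rank(X, tol=1e-8):
    """Tubal rank: the largest matrix rank among the Fourier-domain slices."""
    return max(np.linalg.matrix_rank(S, tol=tol) for S in fourier_slices(X))

def is_t_psd(X, tol=1e-10):
    """Fourier-domain T-PSD check: every slice must be Hermitian PSD."""
    for S in fourier_slices(X):
        if not np.allclose(S, S.conj().T, atol=tol):
            return False
        if np.linalg.eigvalsh(S).min() < -tol:
            return False
    return True

# Usage: X = U * U^T is T-PSD, and its tubal rank is at most R.
rng = np.random.default_rng(0)
U = rng.standard_normal((6, 3, 5))  # n = 6, R = 3, k = 5
X = t_product(U, t_transpose(U))
print(is_t_psd(X), tubal_rank(X))   # True 3
```

For $k = 1$ the t-product reduces to ordinary matrix multiplication and the t-transpose to the matrix transpose, so the same helpers cover the low-rank matrix case.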

Peer Reviews

Decision: ICLR 2026 Poster

Reviewer 01 · Rating 4 · Confidence 3

Strengths

The key findings of this paper, that the estimation error of FGD only depends on the true tubal-rank $r$, and that small initialization overcomes the dependency on the estimated rank $R$, are definitely interesting. The authors also compare their work with earlier results on the same topic to highlight their contributions, and include a proof sketch section to illustrate why FGD with small initialization works, which I really appreciate.

Weaknesses

(i) The entire analysis relies on the T-PSD assumption on $\mathcal{X}_\star$, i.e., $\mathcal{X}_\star$ has the decomposition $\mathcal{X}_\star = \mathcal{U}_\star * \mathcal{U}_\star^{\top}$. This is a significant simplification. The authors acknowledge this limitation in Remark 5 and briefly discuss extensions to the general asymmetric case in Appendix I.
(ii) Although the error bounds in Theorem 2 do not depend on $R$, the initialization scale $\alpha$ and the early stopping time $t$ depend on $R$. Further, in case 3 it seems to me that we can take $R$ to be so

Reviewer 02 · Rating 6 · Confidence 4

Strengths

- The optimization dynamics are interesting and not so intuitive. In particular, the authors find that small random initialization is better than smart spectral initialization for this problem. Further, the error of FGD with respect to the ground truth is non-monotone over iterations; therefore, early stopping is needed.
- The technical strength of the analysis in this paper seems impressive.
- The authors also confirm their results through numerical simulations. Random small initialization and early

Weaknesses

- It is unclear whether the factored form of the tensor used by the authors imposes symmetry and/or PSD constraints. The authors use $\mathcal{U} \ast \mathcal{U}^{\top}$ as their ansatz for the low-tubal-rank tensor.
- The relation to low-rank matrix recovery (i.e., with matrices and the matrix SVD rather than with tensors and the t-SVD) is not well specified as far as I can tell. The authors should comment on this.
- There are no real-data experiments. Real data would be nice, because there are

Reviewer 03 · Rating 2 · Confidence 4

Strengths

The main contribution of this paper lies in providing an error bound for low-tubal-rank tensor recovery when the tubal rank is overestimated in noisy settings, showing that FGD can still achieve reliable recovery.

Weaknesses

1. The contribution of this work is incremental. Factorized Gradient Descent (FGD) is not an original contribution of this work (see [1]). Similarly, the use of early stopping to avoid overfitting is a standard practice and cannot be regarded as an innovation here.
[1] Z. Liu, Z. Han, Y. Tang, X.-L. Zhao and Y. Wang, "Low-Tubal-Rank Tensor Recovery via Factorized Gradient Descent," IEEE Transactions on Signal Processing, vol. 72, pp. 5470-5483, 2024.
2. The relationship with [1] should be d

Taxonomy

Topics: Tensor decomposition and applications · Sparse and Compressive Sensing Techniques · Seismic Imaging and Inversion Techniques