MatRL: Provably Generalizable Iterative Algorithm Discovery via Monte-Carlo Tree Search

Sungyoon Kim; Rajat Vadiraj Dwaraknath; Longling Geng; Mert Pilanci

arXiv:2507.03833·cs.LG·July 17, 2025

Open Access · 3 Reviews

TL;DR

MatRL is a reinforcement learning framework that automatically discovers and plans iterative algorithms for matrix functions, ensuring provable generalization and improved performance over existing methods.

Contribution

The paper introduces MatRL, a novel RL-based approach that automates the design of matrix iteration algorithms with theoretical guarantees of generalization.

Findings

01

MatRL outperforms baseline algorithms in numerical experiments.

02

Learned algorithms generalize to larger matrices from the same distribution.

03

The approach effectively combines Monte-Carlo tree search with reinforcement learning.

Abstract

Iterative methods for computing matrix functions have been extensively studied, and their convergence speed can be significantly improved with the right tuning of parameters and by mixing different iteration types. Hand-tuning the design options for optimal performance can be cumbersome, especially in modern computing environments: numerous different classical iterations and their variants exist, each with non-trivial per-step cost and tuning parameters. To this end, we propose MatRL -- a reinforcement learning based framework that automatically discovers iterative algorithms for computing matrix functions. The key idea is to treat algorithm design as a sequential decision-making process. Monte-Carlo tree search is then used to plan a hybrid sequence of matrix iterations and step sizes, tailored to a specific input matrix distribution and computing environment. Moreover, we also show that…
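To make the "sequential decision-making" framing concrete, here is a minimal sketch, not the authors' implementation. It assumes the target is the matrix sign function of a symmetric matrix, uses a hypothetical action set (a Newton step and Newton-Schulz steps with two step sizes), and assigns made-up per-step costs. Plain exhaustive enumeration over short plans stands in for Monte-Carlo tree search, which would be needed once the horizon or action set grows.

```python
import itertools
import numpy as np

# Hypothetical action set with made-up relative per-step costs
# (an inverse is assumed to cost more than two matrix products).
COSTS = {"newton": 3.0, "ns_0.5": 2.0, "ns_1.0": 2.0}

def apply_step(X, action):
    """Apply one iteration toward sign(A) for a symmetric matrix."""
    if action == "newton":
        return 0.5 * (X + np.linalg.inv(X))
    # Newton-Schulz-type step with step size alpha; alpha = 0.5 is the classical choice.
    alpha = float(action.split("_")[1])
    I = np.eye(X.shape[0])
    return X + alpha * X @ (I - X @ X)

def sign_error(X):
    """sign(A) squares to the identity, so measure ||X^2 - I||_F."""
    return float(np.linalg.norm(X @ X - np.eye(X.shape[0])))

def run_plan(A, plan):
    """Run a fixed sequence of actions starting from X = A."""
    X = A.copy()
    for action in plan:
        X = apply_step(X, action)
    return X

def best_plan(A, horizon, budget):
    """Enumerate all plans of length `horizon` within the cost budget
    and return (error, plan) for the one with the smallest error."""
    best_err, best = np.inf, None
    for plan in itertools.product(COSTS, repeat=horizon):
        if sum(COSTS[a] for a in plan) > budget:
            continue
        try:
            err = sign_error(run_plan(A, plan))
        except np.linalg.LinAlgError:
            continue  # skip plans that hit an exactly singular iterate
        if err < best_err:  # NaN errors never pass this comparison
            best_err, best = err, plan
    return best_err, best

def make_matrix(n=8, seed=0):
    """Symmetric test matrix with eigenvalue magnitudes in [0.05, 1] and mixed signs."""
    rng = np.random.default_rng(seed)
    Q, _ = np.linalg.qr(rng.standard_normal((n, n)))
    signs = np.where(np.arange(n) % 2 == 0, 1.0, -1.0)
    eigs = np.linspace(0.05, 1.0, n) * signs
    return Q @ np.diag(eigs) @ Q.T
```

Calling `best_plan(make_matrix(), horizon=4, budget=9.0)` returns the lowest-error plan that fits the budget. Because the pure `ns_0.5` plan is among the candidates, the selected plan is never worse than that baseline on this matrix; whether a mixed plan actually wins depends on the spectrum and on the assumed costs.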

Peer Reviews

Decision·Submitted to ICLR 2026

Reviewer 01 · Rating 4 · Confidence 3

Strengths

The paper is interesting. I'm pretty borderline on it because I'm not super clear how much "new science" there is in it; though if the paper is considered to be sufficiently novel then I don't really have any other concerns about its publication. I'm a bit of an outsider to RL as a community, so it's hard for me to judge how much this paper is really cool RL research (as opposed to like good engineering on top of standard RL tools). The motivation is compelling. We want to compute spectral fun

Weaknesses

The paper is confusing in places. The experiments are still weaker than I'd want them to be, in terms of validating a proposed mathematical algorithm on a wide variety of matrices. I don't really understand all the twists and turns in the precise engineering of the titular MatRL algorithm [page 6]. The pseudocode is a bit too abstract, and the importance of all the elements feels a bit vague to me. I wouldn't be surprised if this feels vague to me because I'm not as familiar with the RL side of th

Reviewer 02 · Rating 6 · Confidence 3

Strengths

- The RL algorithm managed to identify algorithms with better performance on the examples presented in the paper.

Weaknesses

The training is applicable only when the state is reduced to the spectrum of the matrix and to functions within the Congruence Invariant Diagonal Preserving framework. This likely limits the general applicability of the resulting algorithms. The results presented in Section 4 on generalization appear rather weak, as they essentially concern only matrices with asymptotically similar spectra. The title therefore seems somewhat optimistic. A deeper investigation of the algorithms’ generalization p

Reviewer 03 · Rating 4 · Confidence 3

Strengths

* The paper considers the automated discovery of matrix algorithms, which has the potential to improve computational workloads by finding more efficient algorithms
* Numerical experiments suggest that computational improvements can be achieved compared to existing baselines / state-of-the-art algorithms

Weaknesses

* The paper is, overall, hard to follow. It is easy to get lost in the various notations, and some terminologies are used without a clear definition
* The proof of several key theoretical results is deferred to the appendix, which makes it hard to validate the findings
* The experiments appear to be executed on a single matrix (see questions below), and the paper does not appear to discuss performance variability
* Proposition 1 appears to be an existential result based on what seem to be relativel

Taxonomy

Topics: Fuzzy Logic and Control Systems · Neural Networks and Applications · Time Series Analysis and Forecasting