Discovering Hidden Algebraic Structures via Transformers with Rank-Aware Beam GRPO
Jaeha Lee, Gio Huh, Ning Su, Tony Yue YU

TL;DR
This paper studies whether transformers can discover hidden algebraic structure, focusing on multivariate polynomial decomposition. It introduces a synthetic data pipeline and a rank-aware reinforcement learning method (BGRPO), and reports improved accuracy and inference efficiency.
Contribution
It introduces BGRPO, a rank-aware reinforcement learning approach, and a synthetic data pipeline for training transformers on complex algebraic tasks, advancing symbolic reasoning capabilities.
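To make "rank-aware" concrete, here is a minimal sketch of how a group of beam-search candidates might be scored. The group-relative normalization (reward minus group mean, divided by group standard deviation) is the standard GRPO advantage. The rank-dependent reward `decay ** rank` is purely an assumption for illustration, since this page does not state BGRPO's actual reward. The names `rank_aware_rewards` and `group_relative_advantages` are hypothetical.

```python
import statistics


def rank_aware_rewards(is_correct, decay=0.5):
    """Hypothetical reward: a correct candidate at beam rank k earns decay**k.

    `is_correct[k]` says whether the k-th beam candidate (0 = top of the beam)
    is a valid decomposition. Incorrect candidates earn 0. This shaping is an
    assumption, not the paper's stated reward.
    """
    return [decay ** rank if ok else 0.0 for rank, ok in enumerate(is_correct)]


def group_relative_advantages(rewards, eps=1e-8):
    """Standard GRPO-style advantage: normalize rewards within the group."""
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)
    return [(r - mean) / (std + eps) for r in rewards]


# A beam of width 4 in which the candidates at ranks 1 and 3 are correct.
rewards = rank_aware_rewards([False, True, False, True])
advantages = group_relative_advantages(rewards)
print(rewards)
print(advantages)
```

Under this assumed shaping, a correct answer found higher in the beam receives a larger advantage than a correct answer found lower, so the policy would be pushed to rank correct outputs earlier.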
Findings
Finetuning with BGRPO improves accuracy and reduces inference compute by 75%.
Transformer models outperform Mathematica on polynomial simplification in some settings (2 of the 5 complexity configurations tried, according to one review).
The synthetic data pipeline enables controlled complexity for training and evaluation.
Abstract
Recent efforts have extended the capabilities of transformers in logical reasoning and symbolic computations. In this work, we investigate their capacity for non-linear latent pattern discovery in the context of functional decomposition, focusing on the challenging algebraic task of multivariate polynomial decomposition. This problem, with widespread applications in science and engineering, is proved to be NP-hard, and demands both precision and insight. Our contributions are threefold: First, we develop a synthetic data generation pipeline providing fine-grained control over problem complexity. Second, we train transformer models via supervised learning and evaluate them across four key dimensions involving scaling behavior and generalizability. Third, we propose Beam Grouped Relative Policy Optimization (BGRPO), a rank-aware reinforcement learning method suitable for hard algebraic…
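As a toy illustration of the decomposition task described in the abstract, the sketch below builds a composite polynomial f = g(h1, h2) from an outer polynomial g and inner polynomials h1, h2. The model's job would be the inverse direction: recover g and the inner polynomials from the expanded f. The dictionary representation and the helper names (`add`, `mul`, `power`, `compose`) are assumptions made for this example and are not the paper's data pipeline.

```python
from collections import defaultdict

# A polynomial is a dict mapping exponent tuples to integer coefficients,
# e.g. {(2, 0): 1, (1, 1): 3} means x^2 + 3*x*y in the variables (x, y).


def add(p, q):
    out = defaultdict(int)
    for poly in (p, q):
        for mono, coeff in poly.items():
            out[mono] += coeff
    return {m: c for m, c in out.items() if c != 0}


def mul(p, q):
    out = defaultdict(int)
    for m1, c1 in p.items():
        for m2, c2 in q.items():
            mono = tuple(a + b for a, b in zip(m1, m2))
            out[mono] += c1 * c2
    return {m: c for m, c in out.items() if c != 0}


def power(p, n, nvars):
    result = {(0,) * nvars: 1}  # the constant polynomial 1
    for _ in range(n):
        result = mul(result, p)
    return result


def compose(outer, inners, nvars):
    """Substitute inners[i] for the i-th variable of `outer` and expand."""
    total = {}
    for mono, coeff in outer.items():
        term = {(0,) * nvars: coeff}
        for inner, exponent in zip(inners, mono):
            term = mul(term, power(inner, exponent, nvars))
        total = add(total, term)
    return total


h1 = {(1, 0): 1, (0, 1): 1}   # h1(x, y) = x + y
h2 = {(1, 1): 1}              # h2(x, y) = x*y
g = {(2, 0): 1, (0, 1): 1}    # g(a, b) = a^2 + b

f = compose(g, [h1, h2], nvars=2)
print(f)  # expanded form of (x + y)^2 + x*y, i.e. x^2 + 3*x*y + y^2
```

Generating f from randomly sampled g and inner polynomials is easy, which is what makes synthetic training data cheap to produce. Going from the expanded f back to (g, h1, h2) is the hard direction.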
Peer Reviews
Decision · Submitted to ICLR 2026
1. The paper claims to be the first to systematically explore transformers’ ability to uncover hidden nonlinear algebraic structures. 2. The experiments span multiple dimensions—including problem complexity, model architecture, distribution adaptation, and search strategies—offering a comprehensive and systematic evaluation.
1. The work is limited in scope, as it focuses solely on polynomial decomposition rather than broader symbolic reasoning or algebraic tasks. 2. The paper does not provide a clear motivation for using a Transformer architecture. 3. The experimental evaluation of BGRPO is somewhat incomplete. The method is introduced as an improvement over GRPO and PPO, yet no quantitative baseline results are provided for these methods. This makes the advantage of BGRPO unclear. 4. Although the paper states that
1. The problem formulation is insightful. I have seen various avatars of polynomial handling, but this one also has practical implications. 2. Experiments are comprehensive, with in-depth analysis of both vanilla models and models improved with BGRPO. 3. Rank-aware BGRPO seems to be an innovative contribution (modulo the fact that improvements seem to decrease with dimension size). 4. The paper produces various useful insights.
1. The repercussions of using beam search instead of sampling from the distribution are not discussed. This may be why the effect decreases with more model capacity. 2. Some ablations across varying representations and the effect of numeracy are missing. 3. While the Lample-Charton era work has discussed the polynomial handling ability of vanilla transformers, it would have been good to discuss how pretrained language models (pre-LLM is fine as well) handle such tasks.
1. The paper tackles an interesting application of deep learning, where it appears (to my knowledge) to be the first work exploring the potential of transformers on functional decomposition. 2. The ablation studies are fairly extensive. 3. The method outperforms Mathematica on the task of simplification in 2 out of 5 attempted complexity configurations.
1. All evaluations were performed on synthetic data, lacking evaluations on real-world instances. 2. The lack of baseline evaluations makes it hard to contextualize the overall performance. While many existing algorithms tackle different constraints of the problem, the authors could still evaluate their method on these special cases. More importantly, Faugère & Perret (2009) [1] present a heuristic algorithm that handles the single-polynomial multi-multivariate decomposition case (when u=1) on
Taxonomy
Topics: Polynomial and algebraic computation · Topic Modeling · Model Reduction and Neural Networks
