Sequential Policy Gradient for Adaptive Hyperparameter Optimization

Zheng Li; Jerry Cheng; Huanying Helen Gu

arXiv:2506.15051·cs.LG·June 19, 2025

Open Access

TL;DR

The paper introduces Sequential Policy Gradient (SPG), a lightweight online hyperparameter optimization method inspired by the DeepSeek-V3 multi-token prediction architecture. SPG generates trajectories in a single forward pass and improves model performance across diverse datasets at low computational cost.

Contribution

SPG extends policy gradient methods with temporary modules for single-pass trajectory generation, enabling efficient hyperparameter optimization.

Findings

01. SPG improves model performance by 0.2% to 7% across datasets.

02. SPG reduces computational costs compared to traditional methods.

03. SPG is effective on vision, NLP, and audio tasks.

Abstract

Reinforcement learning is essential for neural architecture search and hyperparameter optimization, but the prohibitive time and computational costs of conventional approaches impede their widespread use. Inspired by the DeepSeek-V3 multi-token prediction architecture, we propose Sequential Policy Gradient modeling (SPG), a novel trajectory generation paradigm for lightweight online hyperparameter optimization. In contrast to conventional policy gradient methods, SPG extends the base model with temporary modules, enabling it to generate state-action (padded) trajectories in a single forward pass. Our experiments demonstrate that models gain performance when retrained with SPG on their original datasets and that they also outperform standard transfer fine-tuning. We evaluate on five datasets spanning computer vision (ImageNet, COCO), natural language processing (GLUE, SQuAD), and audio (SUPERB) to…
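For readers unfamiliar with the baseline the abstract contrasts against, the following is a minimal sketch of a conventional policy gradient (REINFORCE) loop that picks a hyperparameter from a fixed candidate list. It is not the authors' SPG method and does not model the temporary modules or single-pass trajectory generation. The function names, the candidate learning rates, and the synthetic `toy_reward` are all assumptions made for illustration; in a real setting each reward would require a training or evaluation run, which is the cost the abstract describes as prohibitive.

```python
import math
import random


def softmax(logits):
    """Convert logits to a probability distribution (numerically stable)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def reinforce_hyperparameter_search(candidates, reward_fn, steps=3000, lr=0.1, seed=0):
    """Conventional REINFORCE over a categorical policy (illustrative only).

    Each step samples one candidate, observes one reward, and applies the
    score-function gradient of a softmax policy to the logits.
    """
    rng = random.Random(seed)
    logits = [0.0] * len(candidates)
    baseline = 0.0  # running average of rewards, used to reduce variance
    for _ in range(steps):
        probs = softmax(logits)
        action = rng.choices(range(len(candidates)), weights=probs)[0]
        reward = reward_fn(candidates[action])
        advantage = reward - baseline
        baseline += 0.05 * (reward - baseline)
        # d/d logit_i of log pi(action) = 1[i == action] - probs[i]
        for i in range(len(logits)):
            indicator = 1.0 if i == action else 0.0
            logits[i] += lr * advantage * (indicator - probs[i])
    return softmax(logits)


def toy_reward(learning_rate):
    """Hypothetical stand-in for validation performance; best at 1e-3."""
    return -abs(math.log10(learning_rate) + 3.0)


candidates = [1e-1, 1e-2, 1e-3, 1e-4]  # assumed search space
probs = reinforce_hyperparameter_search(candidates, toy_reward)
best = candidates[probs.index(max(probs))]
print("policy:", [round(p, 3) for p in probs], "best:", best)
```

Each policy update here consumes one sampled action and one reward evaluation, so the number of evaluations grows with the number of updates. The abstract presents SPG as avoiding this by generating whole state-action trajectories in one forward pass.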

Taxonomy

Topics: Advanced Multi-Objective Optimization Algorithms · Machine Learning and Data Classification · Metaheuristic Optimization Algorithms Research