Sequential Policy Gradient for Adaptive Hyperparameter Optimization
Zheng Li, Jerry Cheng, Huanying Helen Gu

TL;DR
The paper introduces Sequential Policy Gradient (SPG), a lightweight online hyperparameter optimization method inspired by the multi-token prediction architecture of DeepSeek-V3. SPG generates trajectories efficiently and improves model performance across diverse datasets at low computational cost.
Contribution
SPG extends the base model with temporary modules so that a full state-action trajectory is generated in a single forward pass, which makes policy-gradient hyperparameter optimization efficient.
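A minimal sketch of what single-pass trajectory generation could look like. It rests on several assumptions that are not the paper's published design. The base model is assumed to yield one hidden state `h0`. Each hypothetical temporary module is a single tanh-activated linear map that receives the previous state plus an embedding of the previous action, loosely echoing how DeepSeek-V3 chains its multi-token prediction modules behind a shared output head. Actions index a small, hypothetical set of candidate learning rates. The names `single_pass_trajectory`, `modules`, `head`, and `action_emb` are illustrative only.

```python
import numpy as np


def softmax(logits):
    """Numerically stable softmax over a 1-D array."""
    z = np.exp(logits - logits.max())
    return z / z.sum()


def single_pass_trajectory(h0, modules, head, action_emb, max_len, rng):
    """Generate a padded state-action trajectory from one base-model state.

    h0         -- hidden state from the base model, shape (dim,)
    modules    -- hypothetical temporary modules, each a (dim, dim) matrix
    head       -- shared policy head mapping a state to action logits, (n_actions, dim)
    action_emb -- embedding of the previous action fed to the next module, (n_actions, dim)
    max_len    -- trajectory length after padding
    """
    dim = h0.shape[0]
    states = np.zeros((max_len, dim))
    actions = np.zeros(max_len, dtype=int)
    mask = np.zeros(max_len)  # 1 for real steps, 0 for padding

    steps = min(max_len, 1 + len(modules))
    s = h0  # step 0 uses the base model's own state
    for t in range(steps):
        if t > 0:
            # Temporary module t-1 maps (previous state, previous action) to the next state.
            s = np.tanh(modules[t - 1] @ s + action_emb[actions[t - 1]])
        a = rng.choice(head.shape[0], p=softmax(head @ s))  # sample from pi(a | s)
        states[t], actions[t], mask[t] = s, a, 1.0
    return states, actions, mask


# Usage with random, untrained weights (illustration only).
rng = np.random.default_rng(0)
dim, n_actions, n_modules, max_len = 8, 3, 3, 6
learning_rates = [1e-4, 3e-4, 1e-3]  # hypothetical discrete action space
h0 = rng.normal(size=dim)
modules = [rng.normal(scale=0.3, size=(dim, dim)) for _ in range(n_modules)]
head = rng.normal(size=(n_actions, dim))
action_emb = rng.normal(scale=0.1, size=(n_actions, dim))

states, actions, mask = single_pass_trajectory(h0, modules, head, action_emb, max_len, rng)
chosen = [learning_rates[a] for a, m in zip(actions, mask) if m]  # drop padded steps
```

In this sketch, every step after the first comes from a small module rather than from another pass through the full model. A trajectory therefore costs one base-model forward pass plus a few cheap matrix products. The zero entries in `mask` mark padded steps so they can be ignored during a policy-gradient update.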
Findings
SPG improves model performance by 0.2% to 7% across datasets.
SPG reduces computational costs compared with conventional reinforcement-learning approaches to hyperparameter optimization.
SPG is effective on vision, NLP, and audio tasks.
Abstract
Reinforcement learning is essential for neural architecture search and hyperparameter optimization, but conventional approaches impede widespread use due to prohibitive time and computational costs. Inspired by the DeepSeek-V3 multi-token prediction architecture, we propose Sequential Policy Gradient modeling (SPG), a novel trajectory generation paradigm for lightweight online hyperparameter optimization. In contrast to conventional policy gradient methods, SPG extends the base model with temporary modules, enabling it to generate state-action (padded) trajectories in a single forward pass. Our experiments demonstrate that models gain performance when retrained with SPG on their original datasets and also outperform standard transfer fine-tuning. We evaluate on five datasets spanning computer vision (ImageNet, COCO), natural language processing (GLUE, SQuAD), and audio (SUPERB) to…
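The policy-gradient step that padded trajectories feed into can be written as a standard REINFORCE estimator with a baseline. This is a sketch under assumptions, not the paper's stated objective. The mask term is our assumption about how padded steps are excluded. Here N is the number of sampled trajectories, T the padded length, m_t^{(i)} is 1 for real steps and 0 for padding, R^{(i)} is the trajectory's reward (for example, a validation metric after training with the sampled hyperparameters, which is a hypothetical choice), and b is a baseline that reduces variance without biasing the gradient.

```latex
\nabla_\theta J(\theta) \approx \frac{1}{N} \sum_{i=1}^{N} \left( R^{(i)} - b \right)
  \sum_{t=1}^{T} m_t^{(i)} \, \nabla_\theta \log \pi_\theta\!\left( a_t^{(i)} \mid s_t^{(i)} \right),
\qquad m_t^{(i)} \in \{0, 1\}
```

Setting m_t^{(i)} = 0 zeroes the contribution of padded steps, so trajectories of different true lengths can share one fixed-size batch.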
Taxonomy
Topics: Advanced Multi-Objective Optimization Algorithms · Machine Learning and Data Classification · Metaheuristic Optimization Algorithms Research
