GPU Memory Requirement Prediction for Deep Learning Task Based on Bidirectional Gated Recurrent Unit Optimization Transformer

Chao Wang; Zhizhao Wen; Ruoxin Zhang; Puyang Xu; Yifan Jiang

arXiv:2510.20985·cs.LG·October 27, 2025

TL;DR

This paper introduces a BiGRU-optimized Transformer model for predicting the GPU memory demand of deep learning tasks. It achieves lower prediction error (MSE and RMSE) than four machine learning baselines: decision tree, random forest, AdaBoost, and XGBoost.

Contribution

The study proposes a BiGRU-Transformer model tailored to GPU memory prediction and reports higher accuracy and stability than the machine learning baselines it is compared against.

Findings

01

The BiGRU Transformer model achieves the lowest MSE and RMSE among compared models.

02

The model's predictions closely match actual GPU memory usage, with minimal deviation.

03

Performance improvements support better resource scheduling in deep learning environments.

Abstract

To meet the growing need for accurate prediction of GPU memory requirements in deep learning tasks, this paper reviews the current state of research and proposes a deep learning model that uses bidirectional gated recurrent units (BiGRU) to optimize the Transformer architecture, with the aim of improving the accuracy of memory demand prediction. To verify the model's effectiveness, a comparative experiment was conducted against four representative machine learning baselines: decision tree, random forest, AdaBoost, and XGBoost. The experimental results show that the proposed BiGRU-Transformer model has clear advantages on the key evaluation metrics: in terms of mean squared error (MSE) and root mean squared error (RMSE), the model achieves the lowest value among…
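
For reference, the two error metrics named in the abstract can be computed as in the minimal sketch below. The function names `mse` and `rmse` and the sample memory figures are illustrative assumptions; they are not taken from the paper and do not reproduce its results.

```python
import math
from typing import Sequence


def mse(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Mean squared error: the average of the squared prediction errors."""
    if len(actual) != len(predicted) or len(actual) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)


def rmse(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Root mean squared error: the square root of the MSE,
    expressed in the same unit as the target (here, GiB)."""
    return math.sqrt(mse(actual, predicted))


# Made-up GPU memory figures in GiB, for illustration only (not from the paper).
actual_gib = [4.0, 8.0, 12.0, 16.0]
predicted_gib = [4.5, 7.5, 12.0, 17.0]

print(mse(actual_gib, predicted_gib))   # 0.375
print(rmse(actual_gib, predicted_gib))  # square root of 0.375
```

Because RMSE is in the same unit as the predicted quantity, it is often easier to read as a typical error in memory terms, while MSE penalizes large misses more heavily.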
