Research on Low-Latency Inference and Training Efficiency Optimization for Graph Neural Network and Large Language Model-Based Recommendation Systems

Yushang Zhao; Haotian Lyu; Yike Peng; Aijia Sun; Feng Jiang; Xinyue Han

arXiv:2507.01035 · cs.LG · July 3, 2025

TL;DR

This paper presents optimization strategies combining hardware and software techniques to significantly improve the inference speed and training efficiency of hybrid GNN-LLM recommendation systems, enabling real-time personalized recommendations.

Contribution

The paper introduces a comprehensive framework that pairs a hybrid GNN-LLM architecture with model-optimization strategies and hardware acceleration, and reports substantial accuracy and efficiency gains over traditional methods.
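Quantization is one of the model-optimization strategies the paper evaluates, but this summary does not say which scheme the authors used. The following is therefore only a minimal sketch, assuming symmetric per-tensor int8 quantization; `quantize_int8` and `dequantize_int8` are hypothetical helper names, not the paper's code.

```python
def quantize_int8(values):
    """Symmetric per-tensor int8 quantization (illustrative assumption).

    Each value x is mapped to q = round(x / s), with s = max|x| / 127,
    and clamped to the signed range [-127, 127].
    """
    max_abs = max(abs(v) for v in values)
    # Guard against an all-zero tensor, which would give a zero scale
    scale = max_abs / 127 if max_abs > 0 else 1.0
    q = [max(-127, min(127, round(v / scale))) for v in values]
    return q, scale


def dequantize_int8(q, scale):
    """Recover approximate float values from int8 codes and the shared scale."""
    return [qi * scale for qi in q]
```

Storing each weight as one int8 byte plus a single float scale per tensor cuts weight memory roughly fourfold relative to float32. Because the scale maps the largest magnitude to 127, the per-value rounding error is at most half the scale.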

Findings

01. The optimal Hybrid + FPGA + DeepSpeed configuration achieves 13.6% higher accuracy (NDCG@10: 0.75) at 40-60 ms latency.
02. LoRA reduces training time by 66% compared with the non-optimized baseline.
03. Hardware-software co-design outperforms standalone GNN or LLM implementations.
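The findings attribute the training-time saving to LoRA. As a minimal pure-Python sketch (not the authors' implementation), a LoRA-adapted linear layer keeps the pretrained weight W frozen and learns only a low-rank update scaled by alpha / r, so the output is W x + (alpha / r) · B (A x). The class name `LoRALinear` and the defaults `r=4`, `alpha=8.0` are illustrative assumptions.

```python
import random


def matvec(M, v):
    """Multiply a matrix (list of rows) by a vector (list of floats)."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]


class LoRALinear:
    """Hypothetical LoRA layer: y = W x + (alpha / r) * B (A x).

    W (d_out x d_in) stays frozen; only A (r x d_in) and B (d_out x r)
    would receive gradient updates during fine-tuning.
    """

    def __init__(self, W, r=4, alpha=8.0, seed=0):
        rng = random.Random(seed)
        self.W = W
        d_out, d_in = len(W), len(W[0])
        self.r = r
        self.scale = alpha / r
        # A starts with small random values; B starts at zero, so the
        # adapter contributes nothing until training updates it.
        self.A = [[rng.gauss(0.0, 0.01) for _ in range(d_in)] for _ in range(r)]
        self.B = [[0.0] * r for _ in range(d_out)]

    def forward(self, x):
        base = matvec(self.W, x)                   # frozen pretrained path
        delta = matvec(self.B, matvec(self.A, x))  # low-rank trainable path
        return [b + self.scale * d for b, d in zip(base, delta)]

    def trainable_params(self):
        # r * d_in entries in A plus d_out * r entries in B
        return self.r * (len(self.A[0]) + len(self.B))

    def full_params(self):
        return len(self.W) * len(self.W[0])
```

For a 768 × 768 layer with r = 4, only 4 × (768 + 768) = 6,144 parameters are trainable versus 589,824 in the full matrix (about 1%). Fewer trainable parameters mean smaller gradients and optimizer state, which is one reason LoRA fine-tuning is cheaper; how this maps onto the 66% reduction reported in the paper is not detailed here.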

Abstract

The continuing proliferation of online services demands fast, efficient recommender systems (ReS) that maintain real-time performance while processing highly complex user-item interactions. This study therefore examines the computational bottlenecks of hybrid Graph Neural Network (GNN) and Large Language Model (LLM)-based ReS, with the aim of optimizing their inference latency and training efficiency. The methodology combined a hybrid GNN-LLM architecture, model-optimization strategies (quantization, LoRA, distillation), and hardware acceleration (FPGA, DeepSpeed), all carried out under R 4.4.2. The experiments showed significant improvements: the optimal Hybrid + FPGA + DeepSpeed configuration achieved 13.6% higher accuracy (NDCG@10: 0.75) at 40-60 ms latency, while LoRA reduced training time by 66% (3.8 hours) compared with the non-optimized baseline. Irrespective…
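Accuracy here is reported as NDCG@10. A minimal sketch of the standard computation follows, assuming linear graded gains (some implementations use 2**rel - 1 instead) and an ideal ranking built from the same candidate list; the function names are illustrative, not taken from the paper.

```python
import math


def dcg_at_k(relevances, k):
    """Discounted cumulative gain of the top-k items, in ranked order.

    The item at 0-based position i is discounted by log2(i + 2).
    """
    return sum(rel / math.log2(i + 2) for i, rel in enumerate(relevances[:k]))


def ndcg_at_k(relevances, k=10):
    """NDCG@k: DCG of the given ranking divided by DCG of the ideal ranking.

    The ideal ranking is assumed to be the same relevance list sorted
    from most to least relevant; returns 0.0 when nothing is relevant.
    """
    ideal = dcg_at_k(sorted(relevances, reverse=True), k)
    return dcg_at_k(relevances, k) / ideal if ideal > 0 else 0.0
```

Under this definition, an NDCG@10 of 0.75, as reported for the best configuration, means the top-10 rankings attain on average 75% of the discounted gain of a perfect ordering.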

Taxonomy

Topics: Advanced Graph Neural Networks · Advanced Neural Network Applications · Big Data and Digital Economy