Dynamically meeting performance objectives for multiple services on a service mesh

Forough Shahab Samani; Rolf Stadler

arXiv:2210.04002·cs.LG·October 11, 2022

TL;DR

This paper introduces a reinforcement learning framework that dynamically manages multiple service objectives on a service mesh, using simulation-based policy learning to optimize request routing and blocking under varying loads.

Contribution

It presents a novel approach combining system modeling and simulation to efficiently learn near-optimal control policies for service management objectives.

Findings

01 RL-based control policies effectively meet delay and throughput goals.

02 Simulation accelerates policy learning by orders of magnitude.

03 Policies generalize well to unseen load patterns.

Abstract

We present a framework that lets a service provider achieve end-to-end management objectives under varying load. Dynamic control actions are performed by a reinforcement learning (RL) agent. Our work includes experimentation and evaluation on a laboratory testbed where we have implemented basic information services on a service mesh supported by the Istio and Kubernetes platforms. We investigate different management objectives that include end-to-end delay bounds on service requests, throughput objectives, and service differentiation. These objectives are mapped onto reward functions that an RL agent learns to optimize, by executing control actions, namely, request routing and request blocking. We compute the control policies not on the testbed, but in a simulator, which speeds up the learning process by orders of magnitude. In our approach, the system model is learned on the testbed;…
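
To make the idea of mapping a management objective onto a reward function concrete, the following is a minimal sketch and not the authors' formulation: the abstract does not give the reward definitions. It assumes a hypothetical reward that credits a service with its carried load (requests per second that are served rather than blocked) only while its end-to-end delay stays within a bound. The names `Observation`, `reward`, and `total_reward` are illustrative assumptions.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class Observation:
    """Hypothetical per-service measurement taken in one control interval."""
    delay_s: float       # measured end-to-end delay, in seconds
    carried_load: float  # requests per second actually served (not blocked)


def reward(obs: Observation, delay_bound_s: float) -> float:
    """Assumed reward: carried load counts only if the delay bound is met."""
    return obs.carried_load if obs.delay_s <= delay_bound_s else 0.0


def total_reward(observations: Sequence[Observation],
                 delay_bounds_s: Sequence[float]) -> float:
    """Sum the per-service rewards; an agent would try to maximize this."""
    return sum(reward(o, b) for o, b in zip(observations, delay_bounds_s))
```

Under this assumed reward, routing and blocking actions matter because they change both quantities: blocking lowers the carried load but can bring the delay back under its bound, and routing shifts load between processing nodes.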

Taxonomy

Topics: Peer-to-Peer Network Technologies · Network Traffic and Congestion Control
