TL;DR
This paper introduces Zwei, a self-play reinforcement learning algorithm that optimizes video transmission policies directly against the stated requirement rather than a weighted linear combination of metrics. By matching user demands more accurately, it outperforms existing methods.
Contribution
The paper presents Zwei, a novel self-play reinforcement learning approach that directly incorporates actual requirements into policy optimization for video transmission.
Findings
Zwei outperforms state-of-the-art methods across multiple scenarios.
It accurately aligns transmission policies with specified requirements.
The approach effectively adapts to different video transmission scenarios.
Abstract
Video transmission services adopt adaptive algorithms to meet users' demands. Existing techniques are often optimized and evaluated with a function that linearly combines several weighted metrics. However, we observe that such a function fails to describe the requirement accurately, so methods built on it may end up violating the original needs. To eliminate this concern, we propose Zwei, a self-play reinforcement learning algorithm for video transmission tasks. Zwei updates the policy by using the actual requirement directly. Technically, Zwei samples a number of trajectories from the same starting point and immediately estimates the win rate with respect to the competition outcome, where the competition outcome indicates which trajectory is closer to the assigned requirement. Zwei then optimizes the strategy by maximizing the win rate. To build…
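The win-rate step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the metric names (`rebuffer_s`, `bitrate_kbps`), the example requirement ("less rebuffering first, then higher bitrate"), the round-robin pairing, and the half-point treatment of ties are all assumptions made for illustration.

```python
from itertools import combinations
from typing import Callable, Dict, List

# Hypothetical per-trajectory summary; the field names are illustrative only.
Outcome = Dict[str, float]


def closer_to_requirement(a: Outcome, b: Outcome) -> int:
    """Return 1 if `a` wins, -1 if `b` wins, 0 for a tie.

    Assumed requirement (not from the paper): prefer less rebuffering,
    and break ties by higher average bitrate.
    """
    if a["rebuffer_s"] != b["rebuffer_s"]:
        return 1 if a["rebuffer_s"] < b["rebuffer_s"] else -1
    if a["bitrate_kbps"] != b["bitrate_kbps"]:
        return 1 if a["bitrate_kbps"] > b["bitrate_kbps"] else -1
    return 0


def estimate_win_rates(
    outcomes: List[Outcome],
    compare: Callable[[Outcome, Outcome], int] = closer_to_requirement,
) -> List[float]:
    """Estimate each trajectory's win rate from pairwise competitions.

    All trajectories are assumed to start from the same state. Every pair
    competes once; a tie gives half a win to each side (an assumption).
    """
    n = len(outcomes)
    if n < 2:
        raise ValueError("need at least two trajectories to compete")
    wins = [0.0] * n
    for i, j in combinations(range(n), 2):
        result = compare(outcomes[i], outcomes[j])
        if result > 0:
            wins[i] += 1.0
        elif result < 0:
            wins[j] += 1.0
        else:
            wins[i] += 0.5
            wins[j] += 0.5
    # Each trajectory plays n - 1 competitions.
    return [w / (n - 1) for w in wins]
```

The requirement enters only through the `compare` rule, so no weighted sum of metrics is needed. In a training loop, these win rates would stand in for the return signal when updating the policy; that update step is omitted here.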
