Comyco: Quality-Aware Adaptive Video Streaming via Imitation Learning

Tianchi Huang; Chao Zhou; Rui-Xiao Zhang; Chenglei Wu; Xin Yao; Lifeng Sun

arXiv:1908.02270·cs.MM·December 24, 2019


TL;DR

Comyco is a novel quality-aware adaptive video streaming method that leverages imitation learning to improve sample efficiency and video quality, outperforming existing approaches in both efficiency and quality metrics.

Contribution

It introduces a new imitation learning-based ABR approach that incorporates video quality awareness and expert trajectory imitation to enhance learning efficiency and streaming quality.

Findings

01. Reduces the number of training samples required by 1700x.

02. Achieves 16x faster training than prior methods.

03. Improves average QoE by up to 16.79% over existing approaches.

Abstract

Learning-based Adaptive Bit Rate (ABR) method, aiming to learn outstanding strategies without any presumptions, has become one of the research hotspots for adaptive streaming. However, it typically suffers from several issues, i.e., low sample efficiency and lack of awareness of the video quality information. In this paper, we propose Comyco, a video quality-aware ABR approach that enormously improves the learning-based methods by tackling the above issues. Comyco trains the policy via imitating expert trajectories given by the instant solver, which can not only avoid redundant exploration but also make better use of the collected samples. Meanwhile, Comyco attempts to pick the chunk with higher perceptual video qualities rather than video bitrates. To achieve this, we construct Comyco's neural network architecture, video datasets and QoE metrics with video quality features. Using…

Tables

Table 1. Performance comparison of QoE models on Waterloo Streaming SQoE-III (Duanmu et al., 2018)

QoE model	Type	VQA	SRCC
Pensieve’s (Mao et al., 2017)	linear	-	0.6256
MPC’s (Yin et al., 2015)	linear	-	0.7143
Bentaleb’s (Bentaleb et al., 2016)	linear	SSIMplus (Rehman et al., 2015)	0.6322
Duanmu’s (Duanmu et al., 2018)	linear	-	0.7743
Comyco’s	linear	VMAF (Rassool, 2017)	0.7870

Table 2. Comyco with different $N$ and replay strategies ($\alpha = 0.001$).

N	5	6	7	8	9
Replay Off	0.883	0.893	0.917	0.932	0.942
Replay On	0.911	0.921	0.937	0.946	0.960
TimeSpan(Opt. Off)(ms)	1.56	8.74	58.44	389.68	2604.46

Table 3. Comyco with different $\alpha$.

$α$	0.1	0.01	0.001	0.0001	0
k=4	0.883	0.895	0.904	0.881	0.867

Equations

$$\hat{\pi} = \arg\min_{\pi \in T} \mathbb{E}_{s \sim d_{\pi}} \left[ l_{t}(\pi_{t}, \pi_{t}^{*}) \right]$$

$$\max_{R_{1},\dots,R_{k},T_{s}} QoE^{N}, \quad \text{s.t.}$$

$$\left\{ \begin{aligned} t_{k+1} &= t_{k} + \frac{d_{k}(R_{k})}{C_{k}} + \delta t_{k}, \\ C_{k} &= \frac{1}{t_{k+1} - t_{k} - \delta t_{k}} \int_{t_{k}}^{t_{k+1} - \delta t_{k}} C_{t} \, dt, \\ B_{k+1} &= \left[ \left( B_{k} - \frac{d_{k}(R_{k})}{C_{k}} \right)_{+} + L - \delta t_{k} \right]_{+}, \\ B_{1} &= T_{s}, \quad B_{k} \in [0, B_{max}], \quad R_{k} \in R, \quad \forall k = 1, \dots, N. \end{aligned} \right.$$

$$\max_{\pi} V_{\pi}(s) = \max_{\pi} \max_{a} q_{\pi}(s, a) = \max_{a} q_{*}(s, a)$$

$$L_{comyco} = -\sum \hat{A} \log \pi(s, a; \theta) - \alpha H(\pi(s; \theta)).$$

$$QoE_{v} = \alpha \sum_{n=1}^{N} q(R_{n}) - \beta \sum_{n=1}^{N} T_{n} + \gamma \sum_{n=1}^{N-1} \left[ q(R_{n+1}) - q(R_{n}) \right]_{+} - \delta \sum_{n=1}^{N-1} \left[ q(R_{n+1}) - q(R_{n}) \right]_{-}$$


Code & Models

Repositories

thu-media/Comyco (TensorFlow, official)


Full text

Comyco: Quality-Aware Adaptive Video Streaming via Imitation Learning

Tianchi Huang1,3, Chao Zhou2∗, Rui-Xiao Zhang1,3, Chenglei Wu1,3, Xin Yao1,3, Lifeng Sun1,3∗

1Dept. of Computer Science and Technology, Tsinghua University

2Beijing Kuaishou Technology Co., Ltd., China

3BNRist, Dept. of Computer Science and Technology, Tsinghua University

{htc17@mails.,sunlf@}tsinghua.edu.cn, [email protected]

(2019)

Abstract.

Learning-based Adaptive Bit Rate (ABR) method, aiming to learn outstanding strategies without any presumptions, has become one of the research hotspots for adaptive streaming. However, it typically suffers from several issues, i.e., low sample efficiency and lack of awareness of the video quality information. In this paper, we propose Comyco, a video quality-aware ABR approach that enormously improves the learning-based methods by tackling the above issues. Comyco trains the policy via imitating expert trajectories given by the instant solver, which can not only avoid redundant exploration but also make better use of the collected samples. Meanwhile, Comyco attempts to pick the chunk with higher perceptual video qualities rather than video bitrates. To achieve this, we construct Comyco’s neural network architecture, video datasets and QoE metrics with video quality features. Using trace-driven and real world experiments, we demonstrate significant improvements of Comyco’s sample efficiency in comparison to prior work, with 1700x improvements in terms of the number of samples required and 16x improvements on training time required. Moreover, results illustrate that Comyco outperforms previously proposed methods, with the improvements on average QoE of 7.5% - 16.79%. Especially, Comyco also surpasses state-of-the-art approach Pensieve by 7.37% on average video quality under the same rebuffering time.

Imitation Learning, Quality-aware, Adaptive Video Streaming

Published in: Proceedings of the 27th ACM International Conference on Multimedia (MM ’19), October 21–25, 2019, Nice, France. Article 4. DOI: 10.1145/3343031.3351014. ISBN: 978-1-4503-6889-6/19/10. CCS concepts: Information systems, Multimedia streaming; Computing methodologies, Neural networks.

1. Introduction

Recent years have seen a tremendous increase in the demand for watching online videos (Cisco, 2017). Adaptive bitrate (ABR) streaming, which dynamically switches the bitrate of downloaded chunks to limit rebuffering events while obtaining higher video quality, has become the popular scheme for delivering videos with high quality of experience (QoE) to users (Bentaleb et al., 2018). Recent model-based ABR approaches (§7) pick the next chunk’s video bitrate based only on the current network status (Jiang et al., 2014), only on buffer occupancy (Spiteri et al., 2016), or on both factors jointly (Yin et al., 2015). However, such heuristic methods are usually built on presumptions and fail to work well under unexpected network conditions (Mao et al., 2017). Learning-based ABR methods therefore adopt reinforcement learning (RL) to learn strategies without any presumptions, and they outperform traditional model-based approaches.

Nevertheless, learning-based ABR methods suffer from two key issues. First, recent work (Mao et al., 2017; Gadaleta et al., 2017) often adopts RL methods to train the neural network, but such methods are inefficient in both collecting and exploiting expert samples, which leads to inefficient training (Mendonca et al., 2019). Second, the majority of existing ABR approaches (Yin et al., 2015; Mao et al., 2017; Akhtar and et al., 2018) neglect video quality information, even though perceptual video quality is a non-trivial feature for evaluating QoE (§5.1, (Huang et al., 2018b)). Thus, despite their ability to achieve higher QoE objectives, such schemes may generate a strategy that diverges from the actual demand (§2.2).

In this paper, we propose Comyco, a novel video quality-aware learning-based ABR system that aims to remarkably improve the overall performance of ABR algorithms by tackling the above challenges. Unlike previous RL-based schemes (Mao et al., 2017), Comyco leverages imitation learning (Osa et al., 2018) to train the neural network (NN). The reason is that, in the ABR scenario, the near-optimal policy can be precisely and instantly estimated from the current state, and the collected expert policies enable the NN to learn quickly. Following this thought (§3.1), the agent is allowed to explore the environment and learn the policy from the expert policies given by the solver (§4.5). Specifically, we propose an instant solver (§4.2) to estimate the expert action with a faithful virtual player (§6.1). Furthermore, we use an experience replay buffer (§4.4) to store expert policies and train the NN via the specific loss function $L_{comyco}$ (§4.3).

Besides, Comyco aims to select the bitrate with high perceptual video quality rather than high video bitrate. To achieve this, we first integrate information about video content, network status, and video playback state into Comyco’s NN for bitrate selection (§4.1). Next, we use VMAF (Rassool, 2017), an objective full-reference perceptual video quality metric, to measure the video quality. We also propose a linear, video quality-based QoE metric that achieves state-of-the-art performance on the Waterloo Streaming SQoE-III (Duanmu et al., 2018) dataset (§5.1). Finally, we collect a DASH-video dataset with various types of videos, including movies, sports, TV-shows, games, news, and music videos (MV) (§5.2).

Using trace-driven emulation (§6.1), we find that Comyco significantly accelerates the training process, with 1700x improvements in terms of number of samples required compared to recent work (§6.2). Comparing Comyco with existing schemes under various network conditions (§6.1) and videos (§5.2), we show that Comyco outperforms previously proposed methods, with the improvements on average QoE of 7.5% - 16.79%. In particular, Comyco performs better than state-of-the-art learning-based approach Pensieve, with the improvements on the average video quality of 7.37% under the same rebuffering time. Further, we present results which highlight Comyco’s performance with different hyperparameters and settings (§6.4). Finally, we validate Comyco in real world network scenarios (§6.5). Extensive results indicate the superiority of Comyco over existing state-of-the-art approaches.

In general, we summarize the contributions as follows:

$\triangleright$

We propose Comyco, a video quality-aware learning-based ABR system that significantly ameliorates the weaknesses of learning-based ABR schemes from two perspectives.

$\triangleright$

To the best of our knowledge, we are the first to leverage imitation learning to accelerate the training process for ABR tasks. Results indicate that utilizing imitation learning can not only achieve fast convergence rates but also improve performance.

$\triangleright$

Unlike prior work, Comyco picks the video chunk with high perceptual video quality instead of high video bitrate. Results also demonstrate the superiority of the proposed algorithm.

2. Background and Challenges

2.1. ABR Overview

Due to the rapid development of network services, watching video online has become a common trend. Today, the predominant form of video delivery is adaptive video streaming, such as HLS (HTTP Live Streaming) (HLS, 2019) and DASH (das, 2019), which dynamically selects video bitrates according to network conditions and the client’s buffer occupancy. A traditional video streaming framework consists of a video player client with a constrained buffer length and an HTTP server or Content Delivery Network (CDN). The video player client decodes and renders video frames from the playback buffer. Once the streaming service starts, the client fetches video chunks in order from the HTTP server or CDN, as directed by an ABR algorithm. The algorithm, deployed on the client side, determines the next chunk $N$ and its video quality $Q_{N}$ from the throughput estimate and the current buffer utilization. The goal of the ABR algorithm is to deliver video chunks with high quality while avoiding stalling or rebuffering (Bentaleb et al., 2018).

2.2. Challenges for learning-based ABRs

Most traditional ABR algorithms (Jiang et al., 2014; Yin et al., 2015; Spiteri et al., 2016) leverage time-series prediction or automatic control methods to make decisions for the next chunk. Nevertheless, such methods are built on prior assumptions, so it is hard for them to maintain their performance in all considered network scenarios (Mao et al., 2017). To this end, learning-based ABR algorithms (Mao et al., 2017; Gadaleta et al., 2017; Huang et al., 2018a) approach the problem from another perspective: they adopt deep reinforcement learning (DRL) to train a neural network (NN) from scratch towards a better QoE objective. Despite the outstanding results that recent work has obtained, learning-based ABR methods suffer from several key issues:

The weaknesses of RL-based ABR algorithms. Recent learning-based ABR schemes often adopt RL methods to maximize the average QoE objective. During training, the agent rolls out a trajectory and updates the NN with policy gradients. However, the effect of the calculated gradients depends heavily on the amount and quality of the collected experience. In most cases, the collected samples seldom represent the optimal policy for the corresponding states, so the agent takes a long time to converge, and only to a sub-optimal policy (Osa et al., 2018; Mao et al., 2019). Thus, we face the first challenge: Considering the characteristics of ABR tasks, can we precisely estimate the optimal direction of the gradients to guide the model towards better updates?

The unique video quality. Moreover, previous learning-based ABR schemes (Yin et al., 2015; Mao et al., 2017) are evaluated by typical QoE objectives that combine video bitrate, rebuffering time and video smoothness. However, such QoE metrics are inadequate because these parameters neglect the quality of the video presentation (Wang, 2017). Meanwhile, recent work (Qin et al., 2018; Duanmu et al., 2017) has found that perceptual video quality features play a vital part in evaluating the performance of VBR-encoded ABR streaming services. To illustrate this, we plot the trajectories generated by a quality-aware ABR algorithm and a bitrate-aware algorithm in Figure 1. As shown, the bitrate-aware algorithm selects the video chunk with higher bitrate but neglects the corresponding video quality, resulting in large fluctuations in perceptual video quality. Furthermore, the bitrate-aware algorithm often spends the buffer on achieving only a slight increase in video quality, which may cause unnecessary stalling events. In contrast, the quality-aware algorithm picks chunks with high and stable perceptual video quality and keeps the buffer occupancy within an allowable range. To this end, one of the better solutions is to add perceptual video quality, rather than video bitrate alone, as another metric in the ABR decision. We therefore encounter the second challenge of our work: How to construct a video quality-aware ABR system?

3. Methods

Motivated by the key challenges (§2.2), we propose Comyco, a video quality-aware learning-based ABR scheme. In this section, we introduce two main ideas of Comyco: training NN via imitation learning (§3.1) and a complete video quality-based ABR system (§3.2).

3.1. Training ABRs via Imitation Learning

Recall that the key principle of RL-based methods is to maximize the reward of each action taken by the agent in a given state at each step, since the agent does not actually know the optimal strategy (Sutton and Barto, 2018). However, recent work (Yin et al., 2015; Mao et al., 2017; Spiteri et al., 2018; Akhtar and et al., 2018; Pereira et al., 2018; Huang et al., 2018a) has demonstrated that the ABR process can be precisely emulated by an offline virtual player (§6.1) with complete future network information. Moreover, by looking several steps ahead, we can accurately estimate the near-optimal expert policy of any ABR state within an acceptable time (§4.2). The intuitive idea is therefore to leverage supervised learning to minimize the loss between the predicted policy and the expert policy. Nevertheless, this is impractical because the off-policy method (Sutton and Barto, 2018) suffers from compounding errors when the algorithm executes its own policy, which lead it to drift to new and unexpected states (Laskey et al., 2017). For example, as shown in Figure 2(a), the supervised learning-based ABR algorithm initially fetches bitrates that are consistent with the expert policy, but once it selects a bitrate with a minor error (after the black line), the state may transition to a situation not included in the dataset, so the algorithm selects another wrong bitrate. Such compounding errors eventually lead to continuous rebuffering events. As a result, supervised-learning methods cannot learn to recover from failures.

In this paper, we aim to leverage imitation learning, a method that is closely related to both RL and supervised learning, to learn the strategy from expert policy samples. Imitation learning reproduces desired behavior according to expert demonstrations (Osa et al., 2018). The key idea of imitation learning is to allow the NN to explore environments and collect samples (just like RL) and to learn the policy based on the expert policy (just like supervised learning). In detail, at step $t$ , the algorithm infers a policy $\pi_{t}$ at ABR state $S_{t}$ . It then computes a loss $l_{t}(\pi_{t},\pi^{*}_{t})$ w.r.t. the expert policy $\pi^{*}_{t}$ . After observing the next state $S_{t+1}$ , the algorithm provides a different policy $\pi_{t+1}$ for the next step ${t+1}$ , which incurs another loss $l_{t+1}(\pi_{t+1},\pi^{*}_{t+1})$ . Thus, for each $\pi_{t}$ in the class of policies $T\in\{\pi_{0},\dots,\pi_{t}\}$ , we can find the policy $\hat{\pi}$ through any supervised learning algorithm (Eq. 1).

$$\hat{\pi} = \arg\min_{\pi \in T} \mathbb{E}_{s \sim d_{\pi}} \left[ l_{t}(\pi_{t}, \pi_{t}^{*}) \right] \tag{1}$$

Figure 2(b) elaborates the principle of imitation learning-based ABR schemes: the algorithm attempts to explore the strategy in a range near the expert trajectory to avoid compounding errors.

3.2. Video Quality-aware ABR System Setup

Our next challenge is to set up a video quality-aware ABR system. The work is composed of three tasks: 1) We construct Comyco’s NN architecture by jointly considering several underlying metrics, i.e., past network features, video content features and video playback features (§4.1). 2) We propose a quality-based QoE metric (§5.1). 3) We collect a video quality DASH dataset which includes various types of videos (§5.2).

4. System Overview

In this section, we describe the proposed system in detail. Comyco’s basic system work-flow is illustrated in Figure 3. The system is mainly composed of a NN, an ABR virtual player, an instant solver, and an experience replay buffer. We start by introducing the Comyco’s modules. Then we explain the basic training methodology. Finally, we further illustrate Comyco with a multi-agent framework.

4.1. NN Architecture Overview

Motivated by the recent success of on-policy RL-based methods, Comyco’s learning agent is allowed to explore the environment via traditional rollout methods. For each epoch $t$ , the agent selects the next bitrate via a neural network (NN). We now explain the details of the agent’s NN, including its inputs, outputs, network architecture, and implementation.

Inputs. We categorize the NN’s inputs into three parts: network features, video content features and video playback features ( $S_{k}=\{C_{k},M_{k},F_{k}\}$ ). Details are described as follows.

$\triangleright$

Past Network features. The agent feeds the network status vector of the past $t$ chunks, $C_{k}=\{c_{k-t-1},\dots,c_{k}\}$ , into the NN, where $c_{i}$ represents the throughput measured for video chunk $i$ . Specifically, $c_{i}$ is computed by $c_{i}=n_{r,i}/{d_{i}}$ , in which $n_{r,i}$ is the downloaded video size of chunk $i$ with selected bitrate $r$ , and $d_{i}$ is the download time for video chunk $i$ .

$\triangleright$

Video content features. We also add video content features to the NN’s inputs to improve its ability to detect the diversity of video content. In detail, the learning agent uses $M_{k}=\{N_{k+1},V_{k+1}\}$ to represent video content features. Here $N_{k+1}$ is a vector that holds the video size for each bitrate of the next chunk $k+1$ , and $V_{k+1}$ is a vector that holds the perceptual video quality metric for each bitrate of the next chunk.

$\triangleright$

Video playback features. The last essential feature for describing the ABR state is the current video playback status. The status is represented as $F_{k}=\{v_{k-1},B_{k},D_{k},m_{k}\}$ , where $v_{k-1}$ is the perceptual video quality metric of the last selected video chunk, $B_{k}$ and $D_{k}$ are vectors that hold the buffer occupancy and download time of the past $t$ chunks, and $m_{k}$ is the normalized number of video chunks remaining.
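The three input groups can be assembled as in the following minimal sketch. The `AbrState` class, the `build_state` helper and all field names are hypothetical and chosen only for illustration; the default history length `t=8` is taken from the past sequence length reported in §4.7.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class AbrState:
    """ABR state S_k = {C_k, M_k, F_k}; field names are illustrative."""
    throughput: List[float]      # C_k: past throughput samples c_i
    next_sizes: List[float]      # N_{k+1}: size of the next chunk at each bitrate
    next_quality: List[float]    # V_{k+1}: quality of the next chunk at each bitrate
    last_quality: float          # v_{k-1}: quality of the last selected chunk
    buffer: List[float]          # B_k: past buffer occupancy
    download_time: List[float]   # D_k: past download times
    remaining: float             # m_k: normalized number of chunks remaining


def build_state(sizes, times, buffers, next_sizes, next_quality,
                last_quality, chunks_left, total_chunks, t=8):
    """Assemble S_k from per-chunk history, keeping the last t entries."""
    # c_i = n_{r,i} / d_i: downloaded size divided by download time.
    throughput = [n / d for n, d in zip(sizes, times)]
    return AbrState(
        throughput=throughput[-t:],
        next_sizes=list(next_sizes),
        next_quality=list(next_quality),
        last_quality=last_quality,
        buffer=list(buffers[-t:]),
        download_time=list(times[-t:]),
        remaining=chunks_left / total_chunks,
    )
```

In a real pipeline each field would typically be normalized and padded to a fixed length before being fed to the NN; that step is omitted here.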

Outputs. Same as previous work, we consider using discrete action space to describe the output. Note that the output is an n-dim vector indicating the probability of the bitrate being selected under the current ABR state $S_{k}$ .

Implementation. As shown in Figure 4, for each input type, we use a specific method to extract the underlying features. In detail, we first use a single 1D-CNN layer with kernel=4, channels=128, stride=1 to extract network features into a 128-dim layer. We then use two 1D-CNN layers with kernel=1x4, channels=128 to extract the hidden features from the future chunk’s video content matrix. Meanwhile, we use a 1D-CNN or fully connected layer to extract the useful characteristics from each metric of the video playback inputs. The selected features are passed into a GRU layer, which outputs a 128-dim vector. Finally, the output of the NN is a 6-dim vector, which represents the probability of each bitrate being selected. We use ReLU as the activation function for each feature extraction layer and softmax for the last layer.

4.2. Instant Solver

Once the sampling module rolls out an action $a_{t}$ , we need an algorithm that fetches all the optimal actions $\hat{a_{t}}$ with respect to the current state $s_{t}$ . Following this thought, we propose the Instant Solver. The key idea is to choose the future chunk $k$ ’s bitrate $R_{k}$ by looking $N$ steps ahead via an offline virtual player and solving a specific QoE maximization problem with the measured future network throughput $C_{t}$ ; the real future throughput can be collected both in offline environments and in real-world network scenarios. Inspired by recent model-based ABR work (Yin et al., 2015), we formulate the problem as shown in Eq. 2, denoted as $QoE^{N}_{max}$ . In detail, the virtual player consists of a virtual time, a real-world network trace and a video description. At virtual time $t_{k}$ , we first calculate the download time for chunk $k$ via $d_{k}(R_{k})/C_{k}$ , where $d_{k}$ is the video chunk size for bitrate $R_{k}$ , and $C_{k}$ is the average throughput measured. We then update the buffer occupancy $B_{k+1}$ for chunk $k+1$ , in which $\delta t_{k}$ reflects waiting time such as Round-Trip-Time (RTT) and video render time, and $B_{max}$ is the maximum buffer size. Finally, we refresh the virtual time $t_{k+1}$ for the next computation. Note that the problem can be solved with any optimization algorithm, such as memoization, dynamic programming or Hindsight (Huang et al., 2019). There is a trade-off between the computation overhead and the performance. We list the performance comparison of the instant solver with different $N$ in §6.4. In this work, we set $N=8$ .

$$\max_{R_{1},\dots,R_{k},T_{s}} QoE^{N}, \quad \text{s.t.} \quad \left\{ \begin{aligned} t_{k+1} &= t_{k} + \frac{d_{k}(R_{k})}{C_{k}} + \delta t_{k}, \\ C_{k} &= \frac{1}{t_{k+1} - t_{k} - \delta t_{k}} \int_{t_{k}}^{t_{k+1} - \delta t_{k}} C_{t} \, dt, \\ B_{k+1} &= \left[ \left( B_{k} - \frac{d_{k}(R_{k})}{C_{k}} \right)_{+} + L - \delta t_{k} \right]_{+}, \\ B_{1} &= T_{s}, \quad B_{k} \in [0, B_{max}], \quad R_{k} \in R, \quad \forall k = 1, \dots, N. \end{aligned} \right. \tag{2}$$
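As an illustration of the look-ahead idea, the sketch below enumerates every bitrate plan over the horizon, replays each plan through a simplified virtual player, and returns the first action of the best plan. It is a brute-force stand-in, not the authors' C++ solver. The function names, the per-chunk reward (built from the QoE weights of §5.1), the reading of $[x]_{-}$ as the magnitude of a quality drop, and the clamping of the buffer at `buffer_max` are all assumptions.

```python
from itertools import product


def simulate(plan, sizes, quality, throughput, buffer_s, last_q,
             chunk_s=4.0, wait_s=0.0, buffer_max=60.0,
             weights=(0.8469, 28.7959, 0.2979, 1.0610)):
    """Replay a bitrate plan through a simplified virtual player; return its QoE."""
    alpha, beta, gamma, delta = weights
    total = 0.0
    for k, level in enumerate(plan):
        download = sizes[k][level] / throughput[k]      # d_k(R_k) / C_k
        rebuffer = max(download - buffer_s, 0.0)        # stall while the buffer is empty
        # B_{k+1} = [(B_k - d_k(R_k)/C_k)_+ + L - delta t_k]_+
        buffer_s = max(max(buffer_s - download, 0.0) + chunk_s - wait_s, 0.0)
        buffer_s = min(buffer_s, buffer_max)            # assumed way to keep B_k <= B_max
        q = quality[k][level]
        diff = q - last_q
        total += (alpha * q - beta * rebuffer
                  + gamma * max(diff, 0.0) - delta * max(-diff, 0.0))
        last_q = q
    return total


def instant_solver(sizes, quality, throughput, buffer_s, last_q, horizon=8):
    """Return the first bitrate index of the best plan over the look-ahead horizon."""
    n = min(horizon, len(sizes))
    levels = range(len(sizes[0]))
    best_plan = max(
        product(levels, repeat=n),
        key=lambda plan: simulate(plan, sizes, quality, throughput, buffer_s, last_q),
    )
    return best_plan[0]
```

Exhaustive search visits $|R|^{N}$ plans, which is why memoization or dynamic programming is preferable for six bitrates and $N=8$.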

4.3. Choice of Loss Functions for Comyco

In this section, we design the loss function starting from the fundamental RL training methodology. The goal of an RL-based method is to maximize the Bellman Equation, which is equivalent to maximizing the value function $q_{\pi}(s,a)$ (Sutton and Barto, 2018). The equation is listed in Eq. 3, where $q_{*}(s,a)$ stands for the maximum action value function over all policies, $V_{\pi}(s)$ is the value function, and $\pi(s,a;\theta)$ is the rollout policy. Thus, given an expert action $\hat{a}$ with $q_{\pi}(s,\hat{a})=q_{*}(s,a)$ , we can update the model by minimizing the gap between the true action probability $\hat{A}$ and $\pi$ , where $\hat{A}$ is a one-hot encoding of $\hat{a}$ . In this paper, we use the cross entropy error as the loss function. Note that the function can be replaced by any traditional behavioral cloning loss (Osa et al., 2018), such as the quadratic, L1 or hinge loss. In addition, we find that such a loss only maximizes the probability of the selected action, which significantly reduces the aggressiveness of exploration and finally results in sub-optimal performance. Thus, motivated by recent work on RL (Mnih et al., 2016), we add the entropy of the policy $\pi$ to the loss function. It encourages the algorithm to explore more in the early stage and less in the later stage. The loss function for Comyco is described in Eq. 4.

$$\max_{\pi} V_{\pi}(s) = \max_{\pi} \max_{a} q_{\pi}(s, a) = \max_{a} q_{*}(s, a) \tag{3}$$

$$L_{comyco} = -\sum \hat{A} \log \pi(s, a; \theta) - \alpha H(\pi(s; \theta)). \tag{4}$$

Here $\pi(s,a;\theta)$ is the rollout policy selected by the NN, $\hat{A}$ is the real action probability vector generated by the expert action $\hat{a}$ , $H(\pi(s;\theta))$ represents the entropy of the policy, and $\alpha$ is a hyperparameter that controls the encouragement of exploration. In this paper, we set $\alpha=0.001$ and discuss $L_{comyco}$ with different $\alpha$ in §6.4.
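For a single sample, the loss can be evaluated as in the sketch below. Because $\hat{A}$ is one-hot, the cross-entropy sum collapses to the negative log-probability of the expert action. The function name `comyco_loss` and the small `eps` guard against `log(0)` are assumptions added for illustration.

```python
import math


def comyco_loss(probs, expert_action, alpha=0.001, eps=1e-12):
    """Cross entropy against the one-hot expert action minus alpha times the policy entropy."""
    # -sum(A_hat * log pi) with one-hot A_hat keeps only the expert action's term.
    cross_entropy = -math.log(probs[expert_action] + eps)
    # H(pi) = -sum(p * log p); subtracting alpha * H rewards less peaked policies.
    entropy = -sum(p * math.log(p + eps) for p in probs)
    return cross_entropy - alpha * entropy
```

With `alpha=0` this reduces to plain behavioral cloning with a cross-entropy loss; a larger `alpha` puts more weight on keeping the policy spread out.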

4.4. Training Comyco with Experience Replay

Recent off-policy RL-based methods (Mnih et al., 2013) leverage an experience replay buffer to achieve better convergence behavior when training a function approximator. Inspired by the success of these approaches, we also create a sample buffer that stores past expert strategies and allows the algorithm to randomly pick samples from the buffer during the training process. We discuss the effect of experience replay on Comyco in §6.4.
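A minimal sketch of such a buffer is shown below. The class name, the default capacity and the policy of evicting the oldest sample first are assumptions; the text does not specify them.

```python
import random
from collections import deque


class ExpertReplayBuffer:
    """Fixed-capacity store of (state, expert_action) pairs with uniform sampling."""

    def __init__(self, capacity=10000, seed=None):
        self._items = deque(maxlen=capacity)   # oldest samples are evicted first
        self._rng = random.Random(seed)

    def add(self, state, expert_action):
        self._items.append((state, expert_action))

    def sample(self, batch_size):
        """Draw up to batch_size distinct samples uniformly at random."""
        k = min(batch_size, len(self._items))
        return self._rng.sample(list(self._items), k)

    def __len__(self):
        return len(self._items)
```

Sampling uniformly from past expert labels decorrelates consecutive samples from a single rollout, which is the usual motivation for experience replay.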

4.5. Methodology

We summarize the Comyco’s training methodology in Alg. 1.
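Alg. 1 is not reproduced in this text. The sketch below is one plausible reading of the loop described in §3.1 and §4.2 to §4.4, not the authors' algorithm: the learner rolls out its own policy, an expert labels every visited state, the labels go into a replay buffer, and the policy is updated on random batches. To stay self-contained it makes several simplifying assumptions: a tabular softmax policy replaces the NN, the entropy term of $L_{comyco}$ is omitted, and `expert` and `step_env` are hypothetical callables standing in for the instant solver and the virtual player.

```python
import math
import random


def softmax(logits):
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [v / total for v in exps]


def train(expert, step_env, n_states, n_actions,
          steps=500, batch=32, lr=0.5, seed=0):
    """Imitation loop: explore with the learner, label with the expert, replay."""
    rng = random.Random(seed)
    logits = [[0.0] * n_actions for _ in range(n_states)]
    buffer = []
    state = 0
    for _ in range(steps):
        # 1) Roll out the learner's own policy, so training states come from
        #    the distribution the learner itself induces.
        probs = softmax(logits[state])
        action = rng.choices(range(n_actions), weights=probs)[0]
        # 2) Ask the expert for the best action in the visited state and store it.
        buffer.append((state, expert(state)))
        # 3) Cross-entropy gradient step on a random batch from the buffer.
        for s, a_hat in rng.sample(buffer, min(batch, len(buffer))):
            p = softmax(logits[s])
            for j in range(n_actions):
                target = 1.0 if j == a_hat else 0.0
                logits[s][j] -= lr * (p[j] - target) / batch
        # 4) The environment advances with the learner's action, not the expert's.
        state = step_env(state, action, rng)
    return logits
```

The point of step 4 is that the learner sees the states its own mistakes lead to, each paired with an expert label, which is how the loop avoids the compounding errors of pure supervised learning.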

4.6. Parallel Training

Notably, the training process can be designed asynchronously, which suits a multi-agent parallel training framework. Inspired by the multi-agent training method (Mnih et al., 2016; Huang et al., 2018b), we modify Comyco’s framework from single-agent training to multi-agent training. As illustrated in Figure 5, Comyco’s multi-agent training consists of three parts: a central agent with a NN, an experience replay buffer, and a group of agents, each with a virtual player and an instant solver. For any ABR state $s$ , the agents use the virtual player to emulate the ABR process w.r.t. the current states and the actions given by the NN, which is placed on the central agent, and collect the expert action $\hat{a}$ through the instant solver; they then submit the information containing $\{s,\hat{a}\}$ to the experience replay buffer. The central agent trains the NN by picking sample batches from the buffer. Note that this can happen asynchronously among all agents. By default, Comyco uses 12 agents, which equals the number of CPU cores of our PC, to accelerate the training process.

4.7. Implementation

We now explain how Comyco is implemented. We use TensorFlow (Abadi et al., 2016) to implement the training workflow and TFlearn (Tang, 2016) to construct the NN architecture. We use C++ to implement the instant solver and the virtual player, and then use Swig (Beazley et al., 1996) to compile them into a Python class. In more detail, Comyco takes the past sequence length $k=8$ (as suggested by (Mao et al., 2017)) and the features of the future $7$ video chunks (as suggested by (Yin et al., 2015)) into the NN. We set the learning rate to $10^{-4}$ and use the Adam optimizer (Kingma and Ba, 2014) to optimize the model. For more details, please refer to our repository (https://github.com/thu-media/Comyco).

5. QoE Metrics & Video Datasets

Having constructed Comyco’s NN architecture with video content features, we have not yet discussed how to train the NN. Indeed, we lack a video quality-aware QoE model and an ABR video dataset with video quality assessments. In this section, we use VMAF to describe the perceptual video quality. We then propose a video quality-aware QoE metric under the guidance of a real-world ABR QoE dataset (Duanmu et al., 2018). Finally, we collect and publish a DASH video dataset with different VMAF assessments.

5.1. QoE Model Setup

Motivated by the linear QoE metric that is widely used to evaluate ABR schemes (Pereira et al., 2018; Yin et al., 2015; Akhtar and et al., 2018; Mao et al., 2017; Bentaleb et al., 2016; Qin et al., 2018), we formulate our QoE metric $\texttt{QoE}_{v}$ as:

$$QoE_{v} = \alpha \sum_{n=1}^{N} q(R_{n}) - \beta \sum_{n=1}^{N} T_{n} + \gamma \sum_{n=1}^{N-1} \left[ q(R_{n+1}) - q(R_{n}) \right]_{+} - \delta \sum_{n=1}^{N-1} \left[ q(R_{n+1}) - q(R_{n}) \right]_{-},$$

where $N$ is the total number of chunks during the session, $R_{n}$ represents each chunk’s video bitrate, $T_{n}$ reflects the rebuffering time for each chunk $n$ , $q(R_{n})$ is a function that maps the bitrate $R_{n}$ to the video quality perceived by the user, $\left[q(R_{n+1})-q(R_{n})\right]_{+}$ denotes positive video quality smoothness, i.e., switching from a lower-quality chunk to a higher-quality chunk, and $\left[q(R_{n+1})-q(R_{n})\right]_{-}$ denotes negative smoothness. Note that $\alpha$ , $\beta$ , $\gamma$ , $\delta$ are the weights of the respective terms.
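The metric is a weighted sum and can be evaluated per session as in the sketch below. The function name `qoe_v` is hypothetical, the default weights are the fitted values reported in the QoE Parameters Setup paragraph of this section, and $[x]_{-}$ is assumed to denote the magnitude of a quality drop so that the $\delta$ term acts as a penalty.

```python
def qoe_v(quality, rebuffer,
          alpha=0.8469, beta=28.7959, gamma=0.2979, delta=1.0610):
    """Session QoE from per-chunk quality q(R_n) and per-chunk rebuffering time T_n."""
    total = alpha * sum(quality) - beta * sum(rebuffer)
    for prev, cur in zip(quality, quality[1:]):
        diff = cur - prev
        total += gamma * max(diff, 0.0)    # reward for switching up
        total -= delta * max(-diff, 0.0)   # penalty for switching down
    return total
```

For example, a two-chunk session with VMAF scores 40 then 80 and one second of rebuffering scores $0.8469 \cdot 120 - 28.7959 \cdot 1 + 0.2979 \cdot 40 = 84.7481$.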

Choice of $q(R_{n})$ .

To better understand the correlation between video presentation quality and the QoE metric, we test the correlation between mean opinion score (MOS) and video quality assessment (VQA) metrics, including video bitrate, SSIM (Hore and Ziou, 2010) and Video Multimethod Assessment Fusion (VMAF) (Rassool, 2017), on the Waterloo Streaming QoE Database III (SQoE-III) (Duanmu et al., 2018). SQoE-III is the largest and most realistic dataset for dynamic adaptive streaming over HTTP (Duanmu et al., 2018); it consists of a total of 450 streaming videos created from diverse source content and diverse distortion patterns. SSIM is an image quality metric used by D-DASH (Gadaleta et al., 2017), and VMAF is an objective full-reference video quality metric formulated by Netflix to estimate subjective video quality. Results are collected with the Pearson correlation coefficient (Benesty et al., 2009), as suggested by (Abar et al., 2017). Experimental results (Fig. 6) show that VMAF achieves the highest correlation among all candidates, with improvements in the coefficient of 16.39%-43.54%. Besides, VMAF is also a popular scheme with great potential in both academia and industry (Aaron et al., 2015). We therefore set $q(R_{n})=\texttt{VMAF}(R_{n})$ .

QoE Parameters Setup.

Recall that the main goal of our paper is to propose a feasible ABR system rather than a convincing QoE metric. In this work, we use linear regression to find proper parameters. Specifically, we randomly divide the SQoE-III database into two parts, 80% for training and 20% for testing. We follow the idea of (Duanmu et al., 2018) and run the training process 1,000 times to mitigate any bias caused by the division of the data. As a result, we set $\alpha=0.8469$ , $\beta=28.7959$ , $\gamma=0.2979$ , $\delta=1.0610$ . We use the Spearman rank correlation coefficient (SRCC), as suggested by (Duanmu et al., 2018), to compare our QoE model with existing models; the median correlation and its corresponding regression model are shown in Table 1. As shown, the $QoE_{v}$ model outperforms recent work. In conclusion, the proposed QoE model is sufficient to evaluate ABR schemes.
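For reference, SRCC is the Pearson correlation of the rank-transformed scores. The sketch below is a plain-Python version with average ranks for ties. The function names are hypothetical, and this is not the evaluation code behind Table 1.

```python
def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def srcc(predicted, mos):
    """Spearman rank correlation between predicted QoE and mean opinion scores."""
    return pearson(average_ranks(predicted), average_ranks(mos))
```

Because only ranks matter, SRCC measures whether a QoE model orders sessions the same way viewers do, independent of the scale of its scores.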

5.2. Video Datasets

To improve Comyco's generalization ability, we propose a video quality DASH dataset that covers movies, sports, TV shows, games, news and MVs. Specifically, we first collect video clips at the highest available resolution from YouTube, then use FFmpeg (FFmpeg, [n. d.]) to encode the videos with the H.264 codec and MP4Box (GPAC, [n. d.]) to dashify them according to the encoding ladder of video sequences (Duanmu et al., 2018; das, 2019). Each chunk is 4 seconds long. During the transcoding process, for each video, we measure the VMAF, VMAF-4K and VMAF-phone metrics, each with a reference resolution of $1920\times 1080$. In total, the dataset contains 86 complete videos, with 394,551 video chunks and 1,578,204 video quality assessments.

6. Evaluation

6.1. Methodology

Virtual Player. We design a faithful offline ABR virtual player to train Comyco via network traces and video descriptions. The player is written in C++ and Python 3.6 and closely follows several state-of-the-art open-source ABR simulators, including Pensieve, Oboe and Sabre (Spiteri et al., 2018).

Testbed. Our work uses two testbeds. Both server and client run on 12-core Intel i7 3.7 GHz CPUs with 32GB RAM running Windows 10. Comyco can be trained efficiently on both GPU and CPU. The testbeds are as follows:

$\triangleright$ Trace-driven emulation. Following the instructions of recent work (Mao et al., 2017; Akhtar and et al., 2018), we utilize Mahimahi (Netravali et al., 2015) to emulate the network conditions between the client (Chrome V73) and the ABR server (SimpleHTTPServer from Python 2.7) via collected network traces.

$\triangleright$ Real-world deployment. Details are given in §6.5.

Network Trace Datasets.

We collect about 3,000 network traces, totaling 47 hours, from public datasets for training and testing:

$\triangleright$ Chunk-level network traces: HSDPA (Riiser et al., 2013), a well-known 3G/HSDPA network trace dataset, which we upsample with a sliding window as described by Pensieve (1000 traces, 1s granularity); FCC (Report, 2016), a broadband dataset (1000 traces, 1s granularity); and Oboe (Usc-Nsl, 2018), a trace dataset collected from wired, WiFi and cellular network connections (428 traces, 1-5s granularity; used only for validation).

$\triangleright$ Synthetic network traces: generated with a Markovian model in which each state represents an average throughput in the aforementioned range (Mao et al., 2017). We create over 1000 traces with 1s granularity.
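The synthetic traces can be pictured with the following minimal sketch of a Markovian throughput generator. The state values, the probability of staying in a state, the neighbour-only transitions and the Gaussian noise around each state mean are all assumptions chosen for illustration; the text above only states that each state represents an average throughput.

```python
import random


def synthetic_trace(state_means_mbps, duration_s, stay_prob=0.8,
                    noise_ratio=0.1, seed=None):
    """Generate one throughput sample per second from a simple Markov chain.

    Each state holds an average throughput (Mbps). At every step the chain
    stays in its state with probability `stay_prob`, otherwise it moves to
    an adjacent state (hypothetical transition rule).
    """
    rng = random.Random(seed)
    state = rng.randrange(len(state_means_mbps))
    trace = []
    for _ in range(duration_s):
        if rng.random() >= stay_prob:
            neighbours = [s for s in (state - 1, state + 1)
                          if 0 <= s < len(state_means_mbps)]
            if neighbours:
                state = rng.choice(neighbours)
        mean = state_means_mbps[state]
        # Gaussian jitter around the state's average, floored above zero.
        sample = rng.gauss(mean, noise_ratio * mean)
        trace.append(max(sample, 0.01))
    return trace


# Hypothetical state set; the actual throughput range is defined elsewhere.
HYPOTHETICAL_STATES_MBPS = [0.5, 1.0, 2.0, 4.0]
trace = synthetic_trace(HYPOTHETICAL_STATES_MBPS, duration_s=60, seed=7)
print(len(trace), round(min(trace), 3), round(max(trace), 3))
```

Fixing the seed makes a trace reproducible, which is convenient when the same synthetic conditions must be replayed for several ABR schemes.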

ABR Baselines.

In this paper, we select several representative ABR algorithms built on different fundamental principles:

$\triangleright$ Rate-based Approach (RB) (Jiang et al., 2014): uses the harmonic mean of the past five throughput measurements as the predicted future bandwidth.

$\triangleright$ BOLA (Spiteri et al., 2016): a buffer-based approach that turns the ABR problem into a utility maximization problem and solves it using Lyapunov optimization. We use the BOLA implementation provided by the authors (Spiteri et al., 2018).

$\triangleright$ Robust MPC (Yin et al., 2015): takes the buffer occupancy and throughput predictions as input and then maximizes the QoE by solving an optimization problem. We implement RobustMPC in C++ and use $QoE_{v}$ (§5.1) as the objective to optimize.

$\triangleright$ Pensieve (Mao et al., 2017): the state-of-the-art ABR scheme, which utilizes Deep Reinforcement Learning (DRL) to pick the bitrate for the next video chunks. We use the scheme implemented by the authors (Mao, 2017) but retrain the model for our work (§6.2).
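As a concrete illustration of the simplest baseline above, the sketch below predicts bandwidth as the harmonic mean of the last five throughput samples. The selection rule (highest bitrate not exceeding the prediction) and the bitrate ladder are assumptions added for illustration; the description above only specifies the harmonic-mean predictor.

```python
def harmonic_mean(values):
    """Harmonic mean of positive numbers; dominated by the smallest samples."""
    return len(values) / sum(1.0 / v for v in values)


def rate_based_choice(past_throughputs_kbps, bitrate_ladder_kbps, window=5):
    """Pick a bitrate from the ladder using a harmonic-mean bandwidth estimate.

    Assumed rule: choose the highest bitrate that does not exceed the
    prediction, falling back to the lowest bitrate otherwise.
    """
    recent = past_throughputs_kbps[-window:]
    if not recent:
        return min(bitrate_ladder_kbps)  # no history yet: start conservatively
    predicted = harmonic_mean(recent)
    feasible = [b for b in bitrate_ladder_kbps if b <= predicted]
    return max(feasible) if feasible else min(bitrate_ladder_kbps)


# Hypothetical ladder and throughput history, both in kbps.
LADDER_KBPS = [300, 750, 1200, 1850, 2850, 4300]
history = [2500.0, 3100.0, 900.0, 2800.0, 3000.0]
print(rate_based_choice(history, LADDER_KBPS))
```

The harmonic mean is pulled down by the single slow sample in the history, so the estimate is more conservative than an arithmetic mean would be; this is why it is a common choice for throughput prediction.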

6.2. Comyco vs. ABR schemes

In this part, we compare the performance of Comyco with recent ABR schemes under several network traces via the trace-driven virtual player. The details of the selected ABR baselines are described in §6.1. We use EnvivoDash3, a widely used (Mao et al., 2017; Yin et al., 2015; Pereira et al., 2018; Akhtar and et al., 2018) reference video clip (das, 2019), and $QoE_{v}$ to measure the ABR performance.

$\triangleright$ Pensieve Re-training. We retrain Pensieve with our datasets (§6.1), NN architectures (§4.1) and QoE metrics (§5.1). Following recent work (Akhtar and et al., 2018), our experiments use entropy weights in the range of $5.0$ to $1.0$ and dynamically decrease the weight every $1000$ iterations. Training takes about 8 hours, and we show that Pensieve outperforms RobustMPC, with an overall average QoE improvement of 3.5% across all sessions. Note that the same experiments improve $QoE_{lin}$ (Yin et al., 2015) by 10.5%. This indicates that $QoE_{v}$ cannot be easily improved because the metric reflects real-world MOS scores.

Comparison of Learning-based ABR schemes. Figure 8 illustrates the average QoE of learning-based ABR schemes on the HSDPA dataset. We validate the performance of each scheme during training. Results are shown from two perspectives, epoch versus average QoE and training time versus average QoE, and we see about a 1700x improvement in the number of samples required and about a 16x improvement in the training time required. As expected (§3.1), we observe that the supervised learning-based method fails to find a strategy, which leads to its poor performance.

Comyco vs. Existing ABRs. Figure 7 shows the comparison of QoE metrics for existing ABR schemes (§6.1). Comyco outperforms recent ABRs, with improvements in average QoE of 7.5%-17.99% across the HSDPA dataset and 4.85%-16.79% across the FCC dataset. Besides, we also show the CDF of the percentage of improvement in QoE for Comyco over existing schemes. Comyco surpasses the state-of-the-art ABR approach Pensieve for 91% of the sessions across the HSDPA dataset and 78% of the sessions across the FCC dataset. What's more, we also report the performance of the underlying metrics, including average video quality (VMAF), rebuffering time, positive and negative smoothness, as well as QoE. We find that Comyco performs well on the average quality metric, improving it by 6.84%-15.64% compared with other ABRs. Moreover, Comyco is able to avoid rebuffering and bitrate changes, performing on par with state-of-the-art schemes in these respects.

6.3. Comyco with Multiple Videos

To better understand how Comyco performs on various videos, we randomly pick videos from different video types and use the Oboe network traces to evaluate the $QoE_{v}$ performance of the proposed methods. The Oboe network traces cover diverse network conditions, which makes it more challenging to improve performance. Figure 9 illustrates the comparison of QoE metrics for state-of-the-art ABR schemes under various video types. We find that Comyco generalizes well under all considered video scenarios, with improvements in average QoE of 2.7%-23.3% compared with model-based ABR schemes and 2.8%-13.85% compared with Pensieve. Specifically, Comyco provides high-quality ABR services for movies, news, and sports, which are all scenarios with frequent scene switches. We also find that Comyco fails to demonstrate overwhelming performance in serving music videos. This is an interesting topic that we leave to future work.

6.4. Ablation Study

In this section, we set up several experiments that aim to provide a thorough understanding of Comyco, including its hyperparameters and overhead. Note that, before the experiments, we computed the offline-optimal results via dynamic programming with complete knowledge of the network status (Mao et al., 2017) and treat them as a baseline.

Comparison of different future step N. We report the normalized QoE and raw time span of Comyco with different N and experience replay strategies in Table 2. Results are collected under the Oboe dataset. As shown, we find that experience replay helps Comyco learn better. Despite the outstanding performance of Comyco with N=9, this setting lacks algorithmic efficiency and can hardly be deployed in practice. Thus, we choose N=8 to balance performance and cost.

Comyco with different $\alpha$ . Further, we compare the normalized QoE of Comyco with different $\alpha$ under the Oboe dataset. As listed in Table 3, we confirm that $\alpha=0.001$ is the best setting for our work. Meanwhile, the results also demonstrate the effectiveness of utilizing the entropy loss (§4.3).

Comyco Overhead. We calculate (Molchanov et al., 2016) the number of floating-point operations (FLOPs) of Comyco and find that Comyco requires 229 KFLOPs, which is only 0.15% of the lightweight neural network ShuffleNet V2 (Ma et al., 2018) (146 MFLOPs). In short, we believe that Comyco can be successfully deployed on PCs and laptops, or even on mobile devices.

6.5. Comyco in the Real World

We establish a full-system implementation to evaluate Comyco in the wild. The system mainly consists of a video player, an ABR server and an HTTP content server. On the server side, we deploy an HTTP video content server. On the client side, we modify Dash.js (das, 2019) to implement our video player client and use Chrome to watch the video. Moreover, we implement Comyco as a service on the ABR server. We evaluate the performance of the proposed schemes under various network conditions, including a 4G/LTE network, a WiFi network and an international link (from Singapore to Beijing). Figure 10 illustrates the network status, where $\mu$ is the average throughput measured and $\sigma$ represents the standard deviation from the average. For each round, we randomly pick a scheme from the candidates and record the bitrate selected and the rebuffering time for each chunk. Each experiment takes about 2 hours. Figure 10 also shows the average QoE results for each scheme under different network conditions. It is clear that Comyco also outperforms previous state-of-the-art ABR schemes: it improves the average QoE by 4.57%-9.93% compared with Pensieve and by 6.43%-9.46% compared with RobustMPC.

7. Related Work

ABR schemes. Client-based ABR algorithms (Bentaleb et al., 2018) are mainly organized into two types: model-based and learning-based.

Model-based. The development of ABR algorithms begins with the idea of predicting throughput. FESTIVE (Jiang et al., 2014) estimates future throughput via the harmonic mean of the throughput measured for past chunk downloads. Meanwhile, many approaches are designed to select a suitably high bitrate for the next video chunk while avoiding rebuffering events, based on the observed playback buffer size. BBA (Huang et al., 2015) proposes a linear criterion threshold to control the available playback buffer size. Mixed approaches, e.g., MPC (Yin et al., 2015), select the bitrate for the next chunk by adjusting the throughput discount factor based on past prediction errors and by estimating the playback buffer size. What's more, Akhtar et al. (Akhtar and et al., 2018) propose an auto-tuning method to improve the performance of model-based ABR algorithms.

Learning-based. Several attempts have been made to optimize ABR algorithms with RL methods, owing to the difficulty of tuning mixed approaches to handle different network conditions. Pensieve (Mao et al., 2017) is a system that uses DRL to select the bitrate for future video chunks. D-DASH (Gadaleta et al., 2017) uses the Deep Q-learning method to perform a comprehensive evaluation based on state-of-the-art algorithms. Tiyuntsong optimizes itself towards a rule or a specific reward via competition between two agents under the same network condition (Huang et al., 2018a).

Imitation Learning meets Networking. Imitation learning (Hussein et al., 2017) has been widely used in various fields, including networking. Tang et al. (Tang et al., 2018) propose a real-time, deep learning-based intelligent network traffic control method that represents the considered Wireless Mesh Network (WMN) backbone via imitation learning. Indigo (Yan et al., 2018) uses DAgger (Ross et al., 2011) to train a congestion-control NN scheme in an offline network emulator.

8. Conclusion

In this work, we propose Comyco, a learning-based ABR system that aims to thoroughly improve the performance of learning-based algorithms. To overcome the sample inefficiency problem, we leverage imitation learning to guide the algorithm to explore and exploit a better policy rather than relying on stochastic sampling. Moreover, we construct a video quality-based ABR system, including its NN architectures, datasets and QoE metrics. With trace-driven emulation and real-world deployment, we show that Comyco significantly improves the performance and effectively accelerates the training process.

Acknowledgement. We thank the anonymous reviewer for the valuable feedback. Special thanks to Huang's wife Yuyan Chen, also known as Comyco, for her great support, and happy Chinese Valentine's Day. This work was supported by the National Key R&D Program of China (No. 2018YFB1003703), NSFC under Grant 61521002, Beijing Key Lab of Networked Multimedia, and Kuaishou-Tsinghua Joint Project (No. 20192000456).

Bibliography

das (2019). 2019. DASH Industry Forum: Catalyzing the adoption of MPEG-DASH. https://dashif.org/
HLS (2019). 2019. HTTP Live Streaming. https://developer.apple.com/streaming/
Aaron et al. (2015). Anne Aaron, Zhi Li, Megha Manohara, Joe Yuchieh Lin, Eddy Chi-Hao Wu, and C-C Jay Kuo. 2015. Challenges in cloud based ingest and encoding for high quality streaming media. In 2015 IEEE International Conference on Image Processing (ICIP). IEEE, 1732–1736.
Abadi et al. (2016). Martín Abadi, Paul Barham, Jianmin Chen, Zhifeng Chen, Andy Davis, Jeffrey Dean, Matthieu Devin, Sanjay Ghemawat, Geoffrey Irving, Michael Isard, et al. 2016. TensorFlow: A System for Large-Scale Machine Learning. In OSDI, Vol. 16. 265–283.
Abar et al. (2017). Tasnim Abar, Asma Ben Letaifa, and Sadok El Asmi. 2017. Machine learning based QoE prediction in SDN networks. In 2017 13th International Wireless Communications and Mobile Computing Conference (IWCMC). IEEE, 1395–1400.
Akhtar and et al. (2018). Zahaib Akhtar and et al. 2018. Oboe: auto-tuning video ABR algorithms to network conditions. In SIGCOMM 2018. ACM, 44–58.
Beazley et al. (1996). David M Beazley et al. 1996. SWIG: An Easy to Use Tool for Integrating Scripting Languages with C and C++. In Tcl/Tk Workshop. 43.
