# Fault and Performance Management in Multi-Cloud Based NFV using Shallow and Deep Predictive Structures

**Authors:** Lav Gupta, M. Samaka, Raj Jain, Aiman Erbad, Deval Bhamare, H. Anthony Chan

arXiv: 1903.11993 · 2019-03-29

## TL;DR

This paper proposes a hybrid of shallow and deep learning structures for fault detection and localization in multi-cloud NFV deployments, motivated by the absence of a standards-based FCAPS framework for virtual network services.

## Contribution

It proposes a combined approach in which shallow structures such as Support Vector Machines (SVM) handle fault detection, while a stacked autoencoder handles root-cause localization in complex virtual networks.

## Key findings

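- Shallow structures such as SVM effectively handle the simpler detection tasks: 'fault' versus 'no-fault' and 'manifest' versus 'impending' faults.
- A stacked autoencoder is found useful for the more complex task of localizing the root cause.
- Evaluation uses a dataset adapted from an operator's live-network disruption logs (available on Kaggle) and a second dataset based on multivariate kernel density estimation and Markov sampling.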
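
To make the shallow detection step concrete, here is a minimal sketch of a linear 'fault' versus 'no-fault' classifier trained on the hinge loss, the loss that a linear SVM minimises. It is not the paper's implementation: it omits the regularisation term and any kernel, and the function names, the two normalised KPI features, and the sample values are all hypothetical.

```python
def score(w, b, x):
    """Signed distance-like score of x under the linear model (w, b)."""
    return sum(wj * xj for wj, xj in zip(w, x)) + b

def train_hinge_classifier(X, y, lr=0.1, max_epochs=1000):
    """Fit w, b so that y * (w.x + b) >= 1 on every training row.

    Labels are +1 (fault) or -1 (no fault). Each pass takes a subgradient
    step on the hinge loss for every row that violates the margin, and
    training stops after the first pass with no violations.
    """
    w = [0.0] * len(X[0])
    b = 0.0
    for _ in range(max_epochs):
        updated = False
        for xi, yi in zip(X, y):
            if yi * score(w, b, xi) < 1.0:
                w = [wj + lr * yi * xj for wj, xj in zip(w, xi)]
                b += lr * yi
                updated = True
        if not updated:
            break
    return w, b

def predict(w, b, x):
    """Return +1 (fault) or -1 (no fault)."""
    return 1 if score(w, b, x) >= 0.0 else -1

# Hypothetical records: (normalised CPU load, normalised packet loss).
X = [[0.20, 0.10], [0.30, 0.20], [0.25, 0.15], [0.35, 0.10],
     [0.80, 0.90], [0.90, 0.80], [0.85, 0.95], [0.95, 0.85]]
y = [-1, -1, -1, -1, 1, 1, 1, 1]

w, b = train_hinge_classifier(X, y)
print([predict(w, b, xi) for xi in X])
```

On linearly separable data like this toy set the loop stops once every row clears the margin. Real fault logs are unlikely to be separable, which is where a regularised or kernelised SVM from an established library would be the more natural choice.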

## Abstract

Deployment of Network Function Virtualization (NFV) over multiple clouds accentuates its advantages, such as the flexibility of virtualization, proximity to customers and lower total cost of operation. However, NFV over multiple clouds has not yet attained the level of performance needed to be a viable replacement for traditional networks. One of the reasons is the absence of a standards-based Fault, Configuration, Accounting, Performance and Security (FCAPS) framework for virtual network services. In NFV, faults and performance issues can have complex geneses within virtual resources as well as virtual networks, and cannot be handled effectively by traditional rule-based systems. To tackle this problem, we propose a fault detection and localization model based on a combination of shallow and deep learning structures. The relatively simple detection of 'fault' and 'no-fault' conditions, or of 'manifest' and 'impending' faults, is shown to be handled effectively by shallow machine learning structures like the Support Vector Machine (SVM). A deeper structure, the stacked autoencoder, has been found useful for the more complex localization function, where a large amount of information needs to be worked through, in different layers, to get to the root cause of the problem. We provide evaluation results using one dataset adapted from logs of disruptions in an operator's live network, available on Kaggle, and another based on multivariate kernel density estimation and Markov sampling.
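
The abstract mentions a second dataset based on multivariate kernel density estimation. As a rough illustration of that idea only, the sketch below draws synthetic records from a Gaussian KDE by picking a stored observation at random and perturbing each feature with Gaussian noise. The per-feature bandwidths, the feature names, and the sample values are assumptions made for this example, and the Markov sampling step is not covered.

```python
import random

def kde_sample(data, bandwidth, n_samples, seed=0):
    """Draw samples from a Gaussian KDE centred on the rows of `data`.

    Sampling from a Gaussian KDE is equivalent to choosing one stored
    observation uniformly at random and adding Gaussian noise whose
    standard deviation is the bandwidth (one value per feature here,
    i.e. a diagonal bandwidth matrix).
    """
    rng = random.Random(seed)
    samples = []
    for _ in range(n_samples):
        centre = rng.choice(data)
        samples.append([rng.gauss(x, h) for x, h in zip(centre, bandwidth)])
    return samples

# Hypothetical KPI records: (CPU utilisation %, packet loss %).
observed = [[35.0, 0.1], [40.0, 0.2], [90.0, 4.5], [95.0, 5.0]]

synthetic = kde_sample(observed, bandwidth=[2.0, 0.1], n_samples=1000)
print(len(synthetic), len(synthetic[0]))
```

Because the noise is unbounded, some synthetic values can fall outside a feature's physical range (for example, a slightly negative packet loss), so a practical pipeline would probably clip or resample such rows.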

---
Source: https://tomesphere.com/paper/1903.11993