# Big Data for Traffic Monitoring and Management

**Authors:** Martino Trevisan

arXiv: 1902.11095 · 2019-03-01

## TL;DR

This paper explores how big data and machine learning can be applied to analyze Internet traffic for improved monitoring and management, focusing on ISP networks and addressing data complexity challenges.

## Contribution

The thesis proposes and evaluates machine learning techniques tailored to complex, multi-dimensional network traffic data collected through passive measurements of ISP and university campus networks.

## Key findings

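- Big data and machine learning algorithms can be applied to passive measurements of ISP and university campus networks
- The complexity of machine learning algorithms and the intrinsic multi-dimensionality and variability of traffic data complicate such analyses
- Novel techniques, inspired by popular machine learning approaches and tailored to network traffic, are proposed and evaluated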

## Abstract

The last two decades witnessed tremendous advances in Information and Communications Technologies. Besides improvements in computational power and storage capacity, communication networks nowadays carry an amount of data that was not envisaged only a few years ago. Together with their pervasiveness, network complexity increased at the same pace, leaving operators and researchers with few instruments to understand what happens in the networks and, on the global scale, on the Internet. Fortunately, recent advances in data science and machine learning come to the rescue of network analysts, and allow analyses with a level of complexity and spatial/temporal scope not possible only 10 years ago. In my thesis, I take the perspective of an Internet Service Provider (ISP), and illustrate challenges and possibilities of analyzing the traffic coming from modern operational networks. I make use of big data and machine learning algorithms, and apply them to datasets coming from passive measurements of ISP and University Campus networks. The marriage between data science and network measurements is complicated by the complexity of machine learning algorithms, and by the intrinsic multi-dimensionality and variability of this kind of data. As such, my work proposes and evaluates novel techniques, inspired by popular machine learning approaches, but carefully tailored to operate with network traffic.

---
Source: https://tomesphere.com/paper/1902.11095