# Anomaly Detection for Network Connection Logs

**Authors:** Swapneel Mehta, Prasanth Kothuri, Daniel Lanza Garcia

arXiv: 1812.01941 · 2018-12-06

## TL;DR

This paper presents a streaming architecture built on ELK, Spark, and Hadoop that collects, stores, and analyzes connection logs in near real-time. It detects anomalies with unsupervised learning, uses visualization to inspect the resulting outliers, and is designed to scale to large infrastructures.

## Contribution

The paper introduces a novel approach for evaluating untagged, unfiltered connection logs with unsupervised learning, one that can be extended to network infrastructures comprising thousands of nodes.

## Key findings

- Effective anomaly detection in large-scale logs
- Visualization aids in understanding outliers
- Scalable system for real-time log analysis
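Near real-time analysis of connection logs depends on turning raw log lines into per-source features over short time windows. The stdlib sketch below illustrates that step under stated assumptions: a hypothetical whitespace-separated line format `epoch_seconds host user status`, hypothetical features (connection count, failure ratio, distinct users), and tumbling windows. The paper's actual log schema and features are not given in this summary, and in the described architecture this role would fall to Spark rather than plain Python.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class WindowStats:
    """Running counters for one (window, host) bucket."""
    connections: int = 0
    failures: int = 0
    users: set = field(default_factory=set)


def parse_line(line):
    """Parse a hypothetical 'epoch_seconds host user status' log line."""
    ts, host, user, status = line.split()
    return int(ts), host, user, status


def aggregate(lines, window_seconds=60):
    """Bucket connection events into tumbling windows keyed by (window_start, host).

    `lines` can be any iterable, e.g. a generator reading from a live stream.
    Returns {key: [connections, failure_ratio, distinct_users]}.
    """
    stats = defaultdict(WindowStats)
    for line in lines:
        ts, host, user, status = parse_line(line)
        window_start = ts - ts % window_seconds  # align to the window boundary
        bucket = stats[(window_start, host)]
        bucket.connections += 1
        if status != "OK":
            bucket.failures += 1
        bucket.users.add(user)
    return {
        key: [b.connections, b.failures / b.connections, len(b.users)]
        for key, b in stats.items()
    }


logs = [
    "0 host-a alice OK",
    "10 host-a bob FAIL",
    "70 host-b alice OK",
]
print(aggregate(logs, window_seconds=60))
# → {(0, 'host-a'): [2, 0.5, 2], (60, 'host-b'): [1, 0.0, 1]}
```

In a Spark deployment, a comparable aggregation could be expressed as a `groupBy` over `pyspark.sql.functions.window`; the window length and feature set here are illustrative assumptions.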

## Abstract

We leverage a streaming architecture based on ELK, Spark and Hadoop in order to collect, store, and analyse database connection logs in near real-time. The proposed system investigates outliers using unsupervised learning, applying widely adopted clustering and classification algorithms to log data and highlighting the subtle differences between the models through visualisation of outliers. Arriving at a novel solution for evaluating untagged, unfiltered connection logs, we propose an approach that can be extended to a generalised system for analysing connection logs across a large infrastructure comprising thousands of individual nodes and generating hundreds of log lines per second.
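The abstract names only "clustering and classification algorithms" without further detail. As one illustration of clustering-based outlier detection on unlabeled data, the sketch below runs plain k-means with deterministic farthest-first seeding and flags points that land in unusually small clusters. The function names, `k`, `min_fraction`, and the toy feature vectors are all hypothetical; this is not the authors' method.

```python
import math


def farthest_first_init(points, k):
    """Deterministic seeding: start at the first point, then repeatedly add the
    point farthest from every centroid chosen so far."""
    centroids = [list(points[0])]
    while len(centroids) < k:
        farthest = max(points, key=lambda p: min(math.dist(p, c) for c in centroids))
        centroids.append(list(farthest))
    return centroids


def assign(points, centroids):
    """Label each point with the index of its nearest centroid."""
    return [min(range(len(centroids)), key=lambda i: math.dist(p, centroids[i]))
            for p in points]


def kmeans(points, k, max_iter=100):
    """Plain Lloyd's k-means; returns (centroids, labels)."""
    centroids = farthest_first_init(points, k)
    for _ in range(max_iter):
        labels = assign(points, centroids)
        new_centroids = []
        for i in range(k):
            members = [p for p, lbl in zip(points, labels) if lbl == i]
            if members:
                # Update step: move the centroid to the mean of its members.
                new_centroids.append([sum(dim) / len(members) for dim in zip(*members)])
            else:
                new_centroids.append(centroids[i])  # leave an empty cluster in place
        if new_centroids == centroids:
            break  # converged
        centroids = new_centroids
    return centroids, assign(points, centroids)


def small_cluster_outliers(points, k=3, min_fraction=0.2):
    """Return indices of points whose cluster holds fewer than
    min_fraction * len(points) members."""
    _, labels = kmeans(points, k)
    sizes = {lbl: labels.count(lbl) for lbl in set(labels)}
    threshold = min_fraction * len(points)
    return [idx for idx, lbl in enumerate(labels) if sizes[lbl] < threshold]


# Toy feature vectors (hypothetical): two dense groups plus one isolated point.
points = [
    [1.0, 1.0], [1.2, 0.9], [0.8, 1.1], [1.0, 1.2],
    [10.0, 10.0], [10.2, 9.9], [9.8, 10.1], [10.0, 10.2],
    [100.0, 100.0],
]
print(small_cluster_outliers(points))  # → [8]
```

The small-cluster rule is one of several possible scoring choices. Distance to the nearest centroid is a common alternative, but with farthest-first seeding an isolated point often becomes its own centroid at distance zero, which is why cluster size is used here.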

---
Source: https://tomesphere.com/paper/1812.01941