# On Observability and Monitoring of Distributed Systems: An Industry Interview Study

**Authors:** Sina Niedermaier, Falko Koetter, Andreas Freymann, Stefan Wagner

arXiv: 1907.12240 · 2021-02-10

## TL;DR

This study explores the challenges and best practices in observing and monitoring distributed systems, highlighting increasing complexity, organizational needs, and the importance of observability for system stability.

## Contribution

The paper provides qualitative insights from 28 semi-structured interviews with software professionals, covering technical challenges, organizational aspects, and the need for a systematic approach to observability and monitoring of distributed systems.

## Key findings

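- Increasing complexity and dynamics in the observability and monitoring of distributed systems
- Discrepancy in awareness of the topic's importance, from both the management and the developer perspective
- Strong need for an organizational concept covering strategy, roles, and responsibilities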

## Abstract

Business success of companies heavily depends on the availability and performance of their client applications. Due to modern development paradigms such as DevOps and microservice architectural styles, applications are decoupled into services with complex interactions and dependencies. Although these paradigms enable individual development cycles with reduced delivery times, they cause several challenges to manage the services in distributed systems. One major challenge is to observe and monitor such distributed systems. This paper provides a qualitative study to understand the challenges and good practices in the field of observability and monitoring of distributed systems. In 28 semi-structured interviews with software professionals we discovered increasing complexity and dynamics in that field. Especially observability becomes an essential prerequisite to ensure stable services and further development of client applications. However, the participants mentioned a discrepancy in the awareness regarding the importance of the topic, both from the management as well as from the developer perspective. Besides technical challenges, we identified a strong need for an organizational concept including strategy, roles and responsibilities. Our results support practitioners in developing and implementing systematic observability and monitoring for distributed systems.

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/1907.12240/full.md

## Figures

2 figures with captions in the complete paper: https://tomesphere.com/paper/1907.12240/full.md

## References

24 references — full list in the complete paper: https://tomesphere.com/paper/1907.12240/full.md

---
Source: https://tomesphere.com/paper/1907.12240