Threats of Human Error in a High-Performance Storage System: Problem Statement and Case Study
Elizabeth Haubert

TL;DR
This paper discusses the risks of human error in high-performance storage systems, emphasizing the importance of balancing automation with operator skill and system awareness to prevent failures.
Contribution
It highlights the potential pitfalls of automation in system administration and stresses the need for maintaining operator expertise alongside automated tools.
Findings
Automation can reduce errors but may decrease operator skill.
Maintaining system awareness is crucial to prevent errors.
A balance between automation and operator skill is essential for system reliability.
Abstract
System administration is a difficult, often tedious job requiring many skilled laborers. The data protected by system administrators is often worth as much as, or more than, the institution that maintains it. A number of ethnographic studies have confirmed the skill of these operators and the difficulty of providing them with adequate tools. To minimize maintenance costs, an increasing portion of system administration is being automated, particularly simple, routine tasks such as data backup. While such tools reduce the risk of errors from carelessness, the same tools may reduce skill and system familiarity in experienced workers. Care should be taken to ensure that operators maintain system awareness without being placed in a passive, monitoring role.
Taxonomy
Topics: Risk and Safety Analysis
