# Dataset on resource allocation and usage for a private cloud

**Authors:** Paola Marques, Mariana Mendes, Thiago Emmanuel Pereira, Giovanni Farias

PMC · DOI: 10.1016/j.dib.2026.112514 · 2026-01-29

## TL;DR

This paper introduces a large dataset capturing resource usage in a private cloud, offering insights for cloud computing research and repatriation strategies.

## Contribution

The paper provides a novel, time-stamped dataset from a private cloud, addressing the scarcity of such data for non-commercial environments.

## Key findings

- The dataset includes over 64 million records collected over nearly twelve months (May 23, 2024 to May 16, 2025) from an OpenStack-based private cloud.
- It captures resource allocation, user-project associations, and utilization metrics while preserving privacy through anonymization.
- Entries are timestamped, which enables temporal analysis; the authors position the dataset as useful both to academic institutions and to companies considering cloud repatriation.
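
As a minimal sketch of the kind of temporal analysis that timestamped entries make possible, the snippet below averages a utilization metric per project and per hour. The field names (`timestamp`, `project_id`, `vcpus_used`) and the sample values are hypothetical and are not taken from the dataset's actual schema.

```python
from collections import defaultdict
from datetime import datetime

# Hypothetical records: field names and values are illustrative only,
# not the dataset's real schema.
records = [
    {"timestamp": "2024-05-23T00:00:00", "project_id": "p-1", "vcpus_used": 4},
    {"timestamp": "2024-05-23T00:05:00", "project_id": "p-1", "vcpus_used": 6},
    {"timestamp": "2024-05-23T01:00:00", "project_id": "p-1", "vcpus_used": 8},
]


def hourly_mean(rows, value_key):
    """Average value_key per (project, hour) bucket."""
    buckets = defaultdict(list)
    for row in rows:
        ts = datetime.fromisoformat(row["timestamp"])
        # Truncate the timestamp to the start of its hour.
        hour = ts.replace(minute=0, second=0, microsecond=0)
        buckets[(row["project_id"], hour)].append(row[value_key])
    return {key: sum(vals) / len(vals) for key, vals in buckets.items()}


result = hourly_mean(records, "vcpus_used")
```

Grouping on a truncated timestamp keeps the sketch dependency-free; with the real files, a dataframe library's resampling facilities would likely be more convenient.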

## Abstract

While public cloud providers dominate the commercial landscape, private clouds are widely adopted by academic and research institutions to meet specific governance and operational requirements. Multiple datasets on resource usage in public clouds are available; however, datasets capturing usage patterns in private clouds remain scarce, which limits research in this area. This work presents a dataset comprising over 64 million records collected from a private OpenStack-based cloud operated by the Distributed Systems Laboratory at the Federal University of Campina Grande, Brazil. Data was gathered continuously over nearly twelve months (May 23, 2024 to May 16, 2025) by querying OpenStack APIs and monitoring services every five minutes. The dataset captures different aspects of the infrastructure, including allocation quotas, user-to-project associations (OpenStack groups users into projects), server (virtual machine) specifications, and resource utilization for users and projects. Entries are timestamped, enabling temporal analyses of system dynamics. Sensitive attributes, such as user names, project names, IP addresses, and server names, were protected, leaving only system-generated UUIDs. By offering a detailed, time-stamped view of a private cloud, this dataset provides a valuable resource for cloud computing research, helping to bridge the gap in publicly available datasets from non-commercial cloud environments. The dataset is valuable not only for academic institutions but also for companies considering cloud repatriation.
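
To put the five-minute sampling interval in perspective, the following sketch works out how many polling cycles fit into the stated collection window. It assumes uninterrupted collection and treats the end date as exclusive; neither assumption is confirmed by the abstract, so the result is an upper-bound estimate and not a reported figure.

```python
from datetime import date, timedelta

# Collection window and polling interval as stated in the abstract.
start = date(2024, 5, 23)
end = date(2025, 5, 16)  # assumption: treated as exclusive
interval = timedelta(minutes=5)

span_days = (end - start).days                    # length of the window in days
polls_per_day = timedelta(days=1) // interval     # five-minute slots per day
max_polls = span_days * polls_per_day             # upper bound, assuming no gaps
```

Under these assumptions the window spans 358 days and at most 103,104 polling cycles, so a total of over 64 million records would correspond to roughly 600 records per cycle on average. Any gaps in collection would change that figure.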

## Figures

9 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12907864/full.md

---
Source: https://tomesphere.com/paper/PMC12907864