GHTraffic: A Dataset for Reproducible Research in Service-Oriented Computing
Thilini Bhagya, Jens Dietrich, Hans Guesgen, Steve Versteeg

TL;DR
GHTraffic is a large dataset of HTTP transactions extracted from GitHub data and augmented with synthetic transactions. It is designed to enable reproducible research in service-oriented computing, and the paper also describes the methods and tool used to construct it.
Contribution
This paper introduces GHTraffic, derives its requirements from research use cases, describes its design and the methods and tool used to construct it, and reports selected metrics that characterise the dataset.
Findings
GHTraffic combines HTTP transactions extracted from GitHub data with synthetic transaction data.
The dataset's requirements are derived from research use cases in service-oriented computing.
Selected metrics characterise the resulting dataset.
Abstract
We present GHTraffic, a dataset of significant size comprising HTTP transactions extracted from GitHub data and augmented with synthetic transaction data. The dataset facilitates reproducible research on many aspects of service-oriented computing. This paper discusses use cases for such a dataset and extracts a set of requirements from these use cases. We then discuss the design of GHTraffic, and the methods and tool used to construct it. We conclude our contribution with some selective metrics that characterise GHTraffic.
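As a rough illustration of the kind of metrics a dataset of HTTP transactions lends itself to, the sketch below counts transactions by request method and by response status code. The `Transaction` record, the `summarise` helper, and the sample records are hypothetical and chosen only for this example; they do not reflect GHTraffic's actual schema, tooling, or contents.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Transaction:
    """One HTTP request/response pair (hypothetical shape, not the GHTraffic schema)."""
    method: str   # request method, e.g. "GET"
    path: str     # request target
    status: int   # response status code


def summarise(transactions):
    """Count transactions per request method and per response status code."""
    by_method = Counter(t.method for t in transactions)
    by_status = Counter(t.status for t in transactions)
    return by_method, by_status


# Made-up sample records, for illustration only.
sample = [
    Transaction("GET", "/repos/octocat/hello/issues/1", 200),
    Transaction("POST", "/repos/octocat/hello/issues", 201),
    Transaction("GET", "/repos/octocat/hello/issues/999", 404),
]

by_method, by_status = summarise(sample)
```

Summaries of this kind are one simple way to characterise a transaction dataset; the metrics actually reported in the paper may differ.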
Taxonomy
Topics: Distributed systems and fault tolerance · Scientific Computing and Data Management · Cloud Computing and Resource Management
