Atys: An Efficient Profiling Framework for Identifying Hotspot Functions in Large-scale Cloud Microservices
Jiaqi Sun, Dingyu Yang, Shiyou Qian, Jian Cao, Guangtao Xue

TL;DR
Atys is a profiling framework designed for large-scale cloud microservices that efficiently identifies performance hotspots across diverse languages and distributed environments, reducing profiling costs while maintaining accuracy.
Contribution
It introduces a language-agnostic profiling framework for large-scale microservices that combines two-level aggregation, function selective pruning, and dynamic adjustment of the sampling frequency.
Findings
Function selective pruning (FSP) reduces profiling time by 6.8% with a mean absolute percentage error (MAPE) of 0.58%.
The FDA scheme cuts profiling cost by 87.6% while preserving accuracy.
The framework effectively identifies hotspot functions in distributed microservices.
Abstract
To handle the high volume of requests, large-scale services are composed of thousands of instances deployed in clouds. These services utilize diverse programming languages and are distributed across various nodes as encapsulated containers. Given their vast scale, even minor performance enhancements can lead to significant cost reductions. In this paper, we introduce Atys, an efficient profiling framework specifically designed to identify hotspot functions within large-scale distributed services. Atys presents four key features. First, it implements a language-agnostic adaptation mechanism for multilingual microservices. Second, a two-level aggregation method is introduced to provide a comprehensive overview of flamegraphs. Third, we propose a function selective pruning (FSP) strategy to enhance the efficiency of aggregating profiling results. Finally, we develop a frequency dynamic…
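The aggregation, pruning, and accuracy terms used above can be illustrated with a small sketch. This is not the Atys implementation: the folded-stack input format, the function names (`parse_folded`, `merge`, `prune`, `leaf_share`, `mape`), the sample data, and the pruning rule (drop stacks below a fixed share of samples) are all assumptions made for illustration. The merge is shown at a single level (across instances); the abstract does not say what the two aggregation levels are.

```python
from collections import Counter


def parse_folded(lines):
    """Parse folded-stack lines such as "main;handle;parse 60" into stack -> samples."""
    stacks = Counter()
    for line in lines:
        stack, _, count = line.rpartition(" ")
        stacks[tuple(stack.split(";"))] += int(count)
    return stacks


def merge(profiles):
    """Sum sample counts of identical stacks across several profiles."""
    total = Counter()
    for profile in profiles:
        total.update(profile)
    return total


def prune(stacks, min_share):
    """Hypothetical pruning rule: drop stacks holding less than min_share of all samples."""
    total = sum(stacks.values())
    return Counter({s: n for s, n in stacks.items() if n / total >= min_share})


def leaf_share(stacks):
    """Fraction of samples attributed to each leaf (on-CPU) function."""
    total = sum(stacks.values())
    share = Counter()
    for stack, n in stacks.items():
        share[stack[-1]] += n
    return {func: n / total for func, n in share.items()}


def mape(reference, estimate, keys):
    """Mean absolute percentage error (in percent) of estimate vs. reference over keys."""
    errors = [abs(reference[k] - estimate.get(k, 0.0)) / reference[k] for k in keys]
    return 100.0 * sum(errors) / len(errors)


# Made-up profiles from two instances of the same service.
instance_a = parse_folded(["main;handle;parse 60", "main;handle;db_query 30", "main;log 1"])
instance_b = parse_folded(["main;handle;parse 40", "main;handle;db_query 68", "main;log 1"])

merged = merge([instance_a, instance_b])
pruned = prune(merged, min_share=0.02)

full_share = leaf_share(merged)
pruned_share = leaf_share(pruned)

# Compare the two hottest functions before and after pruning.
hotspots = sorted(full_share, key=full_share.get, reverse=True)[:2]
error_pct = mape(full_share, pruned_share, hotspots)
print(hotspots, round(error_pct, 2))
```

In this toy data, pruning removes the rarely sampled `log` stack, which slightly inflates the share of the remaining functions; MAPE quantifies that distortion for the hotspots of interest.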
Taxonomy
Topics: Software System Performance and Reliability · Cloud Computing and Resource Management · IoT and Edge/Fog Computing
