Understanding the Challenges and Assisting Developers with Developing Spark Applications
Zehao Wang

TL;DR
This paper investigates common challenges faced by developers using Apache Spark through an empirical study and proposes a sampling-based approach to assist in understanding and debugging Spark applications, showing promising preliminary results.
Contribution
It presents an empirical analysis of 1,000 Spark-related questions on Stack Overflow and introduces a debugging aid that uses statistical sampling to keep performance overhead low.
Findings
Most challenges relate to data transformation and API usage.
In a preliminary evaluation, the proposed approach showed low performance overhead.
Developers provided positive feedback on the approach.
Abstract
To process data more efficiently, big data frameworks provide data abstractions to developers. However, due to the abstraction, there may be many challenges for developers to understand and debug the data processing code. To uncover the challenges in using big data frameworks, we first conduct an empirical study on 1,000 Apache Spark-related questions on Stack Overflow. We find that most of the challenges are related to data transformation and API usage. To solve these challenges, we design an approach, which assists developers with understanding and debugging data processing in Spark. Our approach leverages statistical sampling to minimize performance overhead, and provides intermediate information and hint messages for each data processing step of a chained method pipeline. The preliminary evaluation of our approach shows that it has low performance overhead and we receive good…
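To make the idea of sampled, per-step inspection concrete, here is a minimal sketch in plain Python. It is not the paper's implementation and does not use Spark: the function name `trace_pipeline`, the `(label, kind, function)` step format, and the wording of the hint are all assumptions made for illustration. It draws a small random sample, runs each step of a chained pipeline on that sample only, and records the intermediate records plus a hint when a step drops every sampled record.

```python
import random
from typing import Any, Callable, Dict, Iterable, List, Tuple

# Hypothetical step format (an assumption, not from the paper):
# (label, kind, function), where kind is "map" or "filter".
Step = Tuple[str, str, Callable[[Any], Any]]


def trace_pipeline(
    records: Iterable[Any],
    steps: List[Step],
    sample_size: int = 5,
    seed: int = 0,
) -> List[Dict[str, Any]]:
    """Run each step on a small random sample and record what happened."""
    rng = random.Random(seed)
    pool = list(records)
    # Only the sample is traced, so tracing cost does not grow with the data.
    current = rng.sample(pool, min(sample_size, len(pool)))

    trace = []
    for label, kind, fn in steps:
        count_in = len(current)
        if kind == "map":
            current = [fn(r) for r in current]
        elif kind == "filter":
            current = [r for r in current if fn(r)]
        else:
            raise ValueError(f"unknown step kind: {kind!r}")

        # Illustrative hint: flag a step that removes every sampled record.
        hint = ""
        if count_in > 0 and not current:
            hint = "all sampled records were dropped at this step"

        trace.append({
            "step": label,
            "in": count_in,
            "out": len(current),
            "preview": current[:3],
            "hint": hint,
        })
    return trace


if __name__ == "__main__":
    steps = [
        ("double", "map", lambda x: x * 2),
        # Doubled integers are always even, so this filter keeps nothing.
        ("keep odd", "filter", lambda x: x % 2 == 1),
    ]
    for entry in trace_pipeline(range(100), steps):
        print(entry)
```

In this example the second step filters for odd values after every value has been doubled, so the trace for that step reports zero output records and carries the hint. A real Spark tool would have to hook into the framework's own transformations rather than take a list of labelled functions.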
Taxonomy
Topics: Software System Performance and Reliability · Cloud Computing and Resource Management · Software Engineering Research
