The CESAW dataset: a conversation
Derek M. Jones, William R. Nichols

TL;DR
This paper introduces the CESAW dataset, covering 61,817 developer tasks from 45 projects (five of them safety critical), and analyzes it in a conversational format to explore task organization and effort estimation accuracy.
Contribution
The paper presents the CESAW dataset, including detailed task and project data, and demonstrates its use in analyzing task sequencing and effort estimation factors.
Findings
Effort estimation accuracy varies by estimator and project.
Task sequencing can be analyzed using hierarchical WBS data.
Round number bias affects effort estimates.
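As a rough illustration of how a round-number tendency in estimates might be checked, the sketch below computes the share of estimates that are exact multiples of a chosen base. The function name, the 30-minute base, and the sample values are assumptions made for this example; they are not taken from the CESAW data or from the authors' analysis.

```python
def round_number_share(estimates_minutes, base=30):
    """Return the fraction of estimates that are exact multiples of `base` minutes.

    Treating "a multiple of `base`" as "round" is an assumption for this sketch.
    """
    if not estimates_minutes:
        raise ValueError("no estimates given")
    hits = sum(1 for e in estimates_minutes if e % base == 0)
    return hits / len(estimates_minutes)


# Made-up values for illustration only; not CESAW data.
sample = [30, 60, 45, 120, 60, 17, 90, 240, 33, 60]
share = round_number_share(sample)
print(f"share of round estimates: {share:.2f}")
```

A share well above what a smooth distribution of task sizes would produce is one possible sign of a preference for round numbers; what counts as "well above" depends on the data and is left open here.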
Abstract
An analysis of the 61,817 tasks performed by developers working on 45 projects, implemented using Team Software Process, is documented via a conversation between a data analyst and the person who collected, compiled, and originally analyzed the data. Five projects were safety critical, containing a total of 28,899 tasks. Projects were broken down using a Work Breakdown Structure to create a hierarchical organization, with tasks at the leaf nodes. The WBS information enables task organization within a project to be investigated, e.g., how related tasks are sequenced together. Task data includes: kind of task, anonymous developer id, start/end time/date, as well as interruption and break times; a total of 203,621 time facts. Task effort estimation accuracy was found to be influenced by factors such as the person making the estimate, the project involved, and the propensity to use…
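The idea of tasks sitting at the leaf nodes of a Work Breakdown Structure can be sketched as follows: group leaf tasks by their parent WBS node and order each group by start time, which gives a simple view of how related tasks are sequenced. The `Task` fields, the path representation, and the sample records are hypothetical and do not reflect the actual CESAW schema.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Task:
    wbs_path: tuple  # hypothetical: path from the WBS root down to the leaf task
    kind: str        # kind of task, e.g. "design" or "code"
    start: int       # hypothetical: start time in minutes since project start


def sequence_by_parent(tasks):
    """Map each parent WBS node to its leaf-task kinds, ordered by start time."""
    groups = defaultdict(list)
    for task in tasks:
        groups[task.wbs_path[:-1]].append(task)
    return {
        parent: [t.kind for t in sorted(members, key=lambda t: t.start)]
        for parent, members in groups.items()
    }


# Made-up records for illustration only; not CESAW data.
tasks = [
    Task(("proj", "comp_a", "t2"), "code", 50),
    Task(("proj", "comp_a", "t1"), "design", 10),
    Task(("proj", "comp_b", "t3"), "design", 30),
    Task(("proj", "comp_a", "t3"), "test", 90),
]
order = sequence_by_parent(tasks)
print(order[("proj", "comp_a")])
```

Keying on the parent path keeps sibling tasks together, so the same grouping could be repeated one level up to compare sequencing across components.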
Taxonomy
Topics: Software Engineering Techniques and Practices · Software Engineering Research · Business Process Modeling and Analysis
