Enron versus EUSES: A Comparison of Two Spreadsheet Corpora
Bas Jansen

TL;DR
This study compares two large datasets of business spreadsheets, Enron and EUSES, revealing their similarities in size, complexity, and structure, and providing insights into spreadsheet use in corporate environments.
Contribution
It offers a detailed comparison of Enron and EUSES spreadsheet corpora, enhancing understanding of spreadsheet characteristics in corporate contexts and validating the EUSES dataset's representativeness.
Findings
Most spreadsheets are small with simple formulas.
EUSES and Enron spreadsheets have similar structural characteristics.
Spreadsheets generally exhibit low coupling and complexity.
Abstract
Spreadsheets are widely used within companies and often form the basis for business decisions. Numerous cases are known where incorrect information in spreadsheets has led to incorrect decisions. Such cases underline the relevance of research on the professional use of spreadsheets. Recently a new dataset became available for research, containing over 15,000 business spreadsheets that were extracted from the Enron E-mail Archive. With this dataset, we 1) aim to obtain a thorough understanding of the characteristics of spreadsheets used within companies, and 2) compare the characteristics of the Enron spreadsheets with the EUSES corpus, which is the existing state-of-the-art set of spreadsheets that is frequently used in spreadsheet studies. Our analysis shows that 1) the majority of spreadsheets are not large in terms of worksheets and formulas, do not have a high degree of…
Taxonomy
Topics: Spreadsheets and End-User Computing
