An Email Attachment is Worth a Thousand Words, or Is It?
Gregory Tsipenyuk, Jon Crowcroft

TL;DR
This paper proposes a novel social network analysis method based on shared email attachments, revealing stronger social ties and organizational structures within email archives, exemplified by the Enron dataset.
Contribution
The paper introduces shared email attachments as the edges in social network analysis, argues for their significance, and shows that the resulting network yields insights aligned with the organizational hierarchy.
Findings
Attachments constitute 80-90% of email archive disk space.
The shared-attachment network reveals organizational structure.
Different centrality measures highlight distinct social roles.
Abstract
There is an extensive body of research on Social Network Analysis (SNA) based on email archives. The network used in the analysis is generally extracted either by capturing the email communication in the From, To, Cc and Bcc email header fields or from the entities contained in the email message. In the latter case, the entities could be, for instance, the bag of words, URLs, names, phone numbers, etc. They could also include the textual content of attachments, for instance Microsoft Word documents, Excel spreadsheets, or Adobe PDFs. The nodes in this network represent users and entities. The edges represent communication between users and relations to the entities. We suggest taking a different approach to the network extraction and using attachments shared between users as the edges. The motivation for this is two-fold. First, attachments represent the "intimacy" manifestation of the relation's…
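The idea of using shared attachments as edges can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes that an attachment is identified by a SHA-256 hash of its bytes, that every sender and recipient of a message carrying that attachment "shares" it, and that edge weight is the number of distinct attachments two users have in common. The function names and the toy messages are hypothetical.

```python
import hashlib
from collections import defaultdict
from itertools import combinations


def attachment_key(payload: bytes) -> str:
    # Assumption: identical bytes mean "the same attachment".
    return hashlib.sha256(payload).hexdigest()


def build_shared_attachment_edges(messages):
    """Return {(user_a, user_b): number_of_shared_attachments}.

    `messages` is an iterable of (sender, recipients, attachments),
    where `attachments` is a list of raw byte payloads.
    """
    # Map each attachment hash to the set of users who sent or received it.
    holders = defaultdict(set)
    for sender, recipients, attachments in messages:
        participants = {sender, *recipients}
        for payload in attachments:
            holders[attachment_key(payload)].update(participants)

    # Every pair of users holding the same attachment gets an undirected edge.
    edges = defaultdict(int)
    for users in holders.values():
        for u, v in combinations(sorted(users), 2):
            edges[(u, v)] += 1
    return dict(edges)


# Hypothetical toy archive, not taken from the Enron dataset.
messages = [
    ("alice", ["bob"], [b"q3-report"]),
    ("bob", ["carol"], [b"q3-report"]),   # forwarded copy of the same file
    ("alice", ["carol"], []),             # no attachment: contributes no edge
    ("dave", ["alice"], [b"budget"]),
]
edges = build_shared_attachment_edges(messages)
```

In this sketch alice and carol end up connected through the forwarded report even though their only direct message carries no attachment, which is the kind of tie a header-only network would weight differently.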
