
TL;DR
This paper discusses methods for exploring large collections of digital textual traces, such as emails and social media posts, to uncover evidence of real-world activities, weighing privacy concerns against the insights such traces can provide.
Contribution
It introduces computational techniques at the intersection of IR, NLP, and ML to support discovery and sense-making in large digital trace datasets.
Findings
Proposes new algorithms for digital trace exploration
Demonstrates effectiveness on social media and email datasets
Enhances understanding of online activity patterns
Abstract
In the era of big data, we continuously, and at times unknowingly, leave behind digital traces by browsing, sharing, posting, liking, searching, watching, and listening to online content. When aggregated, these digital traces can provide powerful insights into the behavior, preferences, activities, and traits of people. While many have raised privacy concerns around the use of aggregated digital traces, it has undisputedly brought us many advances, from the search engines that learn from their users and enable our access to unforeseen amounts of data, knowledge, and information, to, e.g., the discovery of previously unknown adverse drug reactions from search engine logs. Whether in online services, journalism, digital forensics, law, or research, we increasingly set out to explore large amounts of digital traces to discover new information. Consider, for instance, the Enron…
Taxonomy
Topics: Web Data Mining and Analysis · Data Quality and Management · Personal Information Management and User Behavior
