Entities of Interest

David Graus

arXiv:2102.10962·cs.IR·February 23, 2021

TL;DR

This paper discusses methods for exploring large collections of digital textual traces, like emails and social media, to uncover evidence of real-world activities, balancing privacy concerns with valuable insights.

Contribution

It introduces computational techniques at the intersection of IR, NLP, and ML to support discovery and sense-making in large digital trace datasets.

Findings

01. Proposes new algorithms for digital trace exploration
02. Demonstrates effectiveness on social media and email datasets
03. Enhances understanding of online activity patterns

Abstract

In the era of big data, we continuously - and at times unknowingly - leave behind digital traces by browsing, sharing, posting, liking, searching, watching, and listening to online content. When aggregated, these digital traces can provide powerful insights into the behavior, preferences, activities, and traits of people. While many have raised privacy concerns around the use of aggregated digital traces, it has undisputedly brought us many advances, from the search engines that learn from their users and enable our access to unforeseen amounts of data, knowledge, and information, to, e.g., the discovery of previously unknown adverse drug reactions from search engine logs. Whether in online services, journalism, digital forensics, law, or research, we increasingly set out to explore large amounts of digital traces to discover new information. Consider, for instance, the Enron…

Taxonomy

Topics: Web Data Mining and Analysis · Data Quality and Management · Personal Information Management and User Behavior