A Frequency-Based Learning-To-Rank Approach for Personal Digital Traces
Daniela Vianna, Amélie Marian

TL;DR
This paper introduces a frequency-based learning-to-rank method that combines LambdaMART with a multidimensional data model to improve the accuracy of search over personal digital traces, addressing both data heterogeneity and the lack of training data.
Contribution
It proposes a novel frequency-based learning-to-rank approach using LambdaMART and a multidimensional data model for personal digital traces, with a new training data generation method.
Findings
Improved search accuracy over traditional tools.
Effective handling of heterogeneous digital traces.
Successful application on real user data and email collections.
Abstract
Personal digital traces are constantly produced by connected devices, internet services and interactions. These digital traces are typically small, heterogeneous and stored in various locations in the cloud or on local devices, making it a challenge for users to interact with and search their own data. By adopting a multidimensional data model based on the six natural questions -- what, when, where, who, why and how -- to represent and unify heterogeneous personal digital traces, we propose a learning-to-rank approach using the state-of-the-art LambdaMART algorithm and frequency-based features that leverage the correlation between content (what), users (who), time (when), location (where) and data source (how) to improve the accuracy of search results. Due to the lack of publicly available personal training data, a combination of known-item query generation techniques and an…
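To make the idea of frequency-based features over a multidimensional model more concrete, the following is a minimal Python sketch. It is not the authors' implementation: the `Trace` class, the `dimension_counts` and `frequency_features` helpers, the sample data, and the exact feature definition (relative frequency of matched values per dimension) are all assumptions made for illustration. The sketch covers only the five dimensions the abstract names as feature sources (what, who, when, where, how).

```python
from collections import Counter
from dataclasses import dataclass, field

# Dimensions used as feature sources in this sketch (assumption: "why" is omitted).
DIMENSIONS = ("what", "who", "when", "where", "how")


@dataclass
class Trace:
    """Hypothetical unified record for one personal digital trace."""
    what: list = field(default_factory=list)
    who: list = field(default_factory=list)
    when: list = field(default_factory=list)
    where: list = field(default_factory=list)
    how: list = field(default_factory=list)


def dimension_counts(traces):
    """For each dimension, count how many traces contain each value."""
    counts = {dim: Counter() for dim in DIMENSIONS}
    for trace in traces:
        for dim in DIMENSIONS:
            # set() so a value repeated inside one trace is counted once
            counts[dim].update(set(getattr(trace, dim)))
    return counts


def frequency_features(query, trace, counts, n_traces):
    """One feature per dimension: summed relative frequency of the
    query values that also appear in the trace (illustrative definition)."""
    features = []
    for dim in DIMENSIONS:
        matched = set(query.get(dim, [])) & set(getattr(trace, dim))
        features.append(sum(counts[dim][v] for v in matched) / n_traces)
    return features


# Made-up sample data, for illustration only.
traces = [
    Trace(what=["dinner", "reservation"], who=["alice"], when=["2016-05"],
          where=["new york"], how=["email"]),
    Trace(what=["dinner"], who=["alice", "bob"], when=["2016-05"],
          how=["calendar"]),
    Trace(what=["flight"], who=["bob"], when=["2016-06"],
          where=["boston"], how=["email"]),
]
counts = dimension_counts(traces)
query = {"what": ["dinner"], "who": ["alice"], "how": ["email"]}
rows = [frequency_features(query, t, counts, len(traces)) for t in traces]
```

Each row in `rows` is a feature vector for one (query, trace) pair. In a learning-to-rank setup, such vectors, grouped by query and paired with relevance labels, would be the input to a LambdaMART implementation; that training step is left out here.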
Taxonomy
Topics: Data Quality and Management · Personal Information Management and User Behavior · Web Data Mining and Analysis
