Inferring individual attributes from search engine queries and auxiliary information

Luca Soldaini; Elad Yom-Tov

arXiv:1610.08442·cs.IR·May 16, 2018

TL;DR

This paper presents a method to infer individual traits from anonymized search data using limited labeled examples and population statistics, with applications in identifying health conditions and disease distribution.

Contribution

The paper introduces an algorithm that combines a small set of labeled examples with population-level statistics to identify a trait of interest among anonymized search engine users, with the aim of supporting medical research.

Findings

01

Successfully identified users with potential cancer indicators.

02

Predicted disease distribution from partial epidemiological data.

03

Validated the approach on labeled data from the political domain, then applied it to the medical domain.

Abstract

Internet data has surfaced as a primary source for investigation of different aspects of human behavior. A crucial step in such studies is finding a suitable cohort (i.e., a set of users) that shares a common trait of interest to researchers. However, direct identification of users sharing this trait is often impossible, as the data available to researchers is usually anonymized to preserve user privacy. To facilitate research on specific topics of interest, especially in medicine, we introduce an algorithm for identifying a trait of interest in anonymous users. We illustrate how a small set of labeled examples, together with statistical information about the entire population, can be aggregated to obtain labels on unseen examples. We validate our approach using labeled data from the political domain. We provide two applications of the proposed algorithm to the medical domain. In the…
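
The abstract's central idea, aggregating a few labeled examples with population-level statistics to label unseen users, can be illustrated with a toy sketch. This is not the paper's algorithm, whose details are not given here. It assumes a classifier trained on the small labeled set has already produced a score per user, and that each user belongs to a group (for example, a region) whose trait prevalence is known. The function name `label_by_group_prevalence` and all data values are hypothetical.

```python
from collections import defaultdict


def label_by_group_prevalence(scores, groups, prevalence):
    """Label the top-scoring users in each group as positive so that the
    positive fraction in each group approximates its known prevalence.

    Illustrative assumption only: this is a simple prevalence-matching
    heuristic, not the method described in the paper.
    """
    by_group = defaultdict(list)
    for idx, (score, group) in enumerate(zip(scores, groups)):
        by_group[group].append((score, idx))

    labels = [0] * len(scores)
    for group, members in by_group.items():
        # Number of positives implied by the population statistic.
        n_positive = round(prevalence[group] * len(members))
        # Highest scores first; ties broken by original index.
        members.sort(key=lambda pair: (-pair[0], pair[1]))
        for _, idx in members[:n_positive]:
            labels[idx] = 1
    return labels


# Hypothetical classifier scores for eight unlabeled users in two groups.
scores = [0.9, 0.2, 0.6, 0.4, 0.8, 0.1, 0.3, 0.7]
groups = ["A", "A", "A", "A", "B", "B", "B", "B"]
# Hypothetical known trait prevalence per group.
prevalence = {"A": 0.5, "B": 0.25}

labels = label_by_group_prevalence(scores, groups, prevalence)
print(labels)
```

In this sketch the population statistic fixes how many users per group receive the label, while the classifier scores decide which users those are, so a weak classifier trained on few examples is constrained by information about the whole population.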
