Demographic differences in search engine use with implications for cohort selection
Elad Yom-Tov

TL;DR
This study analyzes how demographic factors (age, gender, and income) correlate with the text of search engine queries, revealing biases that affect cohort selection in health research and fairness in data collection.
Contribution
It provides novel insights into demographic biases in search queries, emphasizing the need for careful cohort selection in health and fairness studies.
Findings
Females and younger users write longer queries.
Females make approximately 25% more queries of 10 or more words than males.
Gender and age biases skew user cohort representation.
Abstract
The correlation between the demographics of users and the text they write has been investigated through literary texts and, more recently, social media. However, differences in language use in search engines have not been thoroughly analyzed, especially with respect to age and gender. Such differences matter because of the growing use of search engine data in the study of human health, where queries are used to identify patient populations. Using data from multiple general-purpose Internet search engines gathered over a period of one month, we investigate the correlation between demography (age, gender, and income) and the text of queries submitted to search engines. Our results show that females and younger people use longer queries. This difference is such that females make approximately 25% more queries with 10 or more words. In the case of queries which…
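To make the long-query comparison concrete, the following is a minimal sketch of how one might compute, per demographic group, the share of queries with 10 or more words and the relative difference between two groups. It is not the paper's method: the function names, the whitespace tokenization, and the choice to normalize by each group's total query count are all assumptions, and the abstract does not say how the 25% figure was normalized.

```python
from collections import defaultdict

# Threshold quoted in the abstract: a "long" query has 10 or more words.
LONG_QUERY_MIN_WORDS = 10


def long_query_share(records, min_words=LONG_QUERY_MIN_WORDS):
    """Return {group: fraction of that group's queries with >= min_words words}.

    `records` is an iterable of (group, query_text) pairs (hypothetical format).
    Words are counted by splitting on whitespace, which is an assumption.
    """
    totals = defaultdict(int)
    longs = defaultdict(int)
    for group, query in records:
        totals[group] += 1
        if len(query.split()) >= min_words:
            longs[group] += 1
    return {group: longs[group] / totals[group] for group in totals}


def relative_excess(share_a, share_b):
    """Relative difference of share_a over share_b (0.25 means 25% more)."""
    return share_a / share_b - 1.0
```

With per-group shares in hand, `relative_excess(shares["female"], shares["male"])` would give the kind of ratio the abstract reports, provided the group labels and normalization match those assumptions.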
