Towards Data Quality Assessment in Online Advertising
Sahin Cem Geyik, Jianqiang Shen, Shahriar Shariat, Ali Dasdan, Santanu Kolay

TL;DR
This paper introduces a scalable framework for assessing data quality in online advertising, focusing on evaluating data source relevance when user-level ground truth is unavailable or anonymized.
Contribution
It proposes multiple methodologies for large-scale data quality assessment and demonstrates their application in targeted advertising and audience forecasting.
Findings
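The framework evaluates how similar a data source's categorization is to the ground truth, including cases where the ground truth is available only in aggregate.
The proposed methodologies are compared in terms of accuracy and scalability.
Preliminary results suggest the framework is promising for targeted advertising.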
Abstract
In online advertising, our aim is to match the advertisers with the most relevant users to optimize the campaign performance. In the pursuit of achieving this goal, multiple data sources provided by the advertisers or third-party data providers are utilized to choose the set of users according to the advertisers' targeting criteria. In this paper, we present a framework that can be applied to assess the quality of such data sources at large scale. This framework efficiently evaluates the similarity of a specific data source categorization to that of the ground truth, especially for those cases when the ground truth is accessible only in aggregate, and the user-level information is anonymized or unavailable due to privacy reasons. We propose multiple methodologies within this framework, present some preliminary assessment results, and evaluate how the methodologies compare to each other.…
Taxonomy
Topics: Data Quality and Management · Data Mining Algorithms and Applications · Spam and Phishing Detection
