Bootstrapping Large-Scale Fine-Grained Contextual Advertising Classifier from Wikipedia
Yiping Jin, Vishakha Kadam, Dittaya Wanvarie

TL;DR
This paper introduces wiki2cat, a scalable method for fine-grained contextual advertising classification that leverages Wikipedia's category graph to generate training data without manual labeling, improving large-scale categorization accuracy.
Contribution
The paper presents wiki2cat, a novel approach that uses Wikipedia categories to bootstrap large-scale fine-grained classifiers without manual annotation or expert rules.
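The general idea of turning a category graph into labeled training data can be sketched as follows. This is a minimal illustration only, not the authors' wiki2cat algorithm: the toy graph, the function names `collect_articles` and `build_training_set`, the depth limit, and the mapping from taxonomy labels to seed categories are all assumptions made for the example.

```python
from collections import deque

# Toy stand-in for a category graph (hypothetical data, not a real Wikipedia dump).
SUBCATEGORIES = {
    "Automobiles": ["Coupes", "Hatchbacks"],
    "Coupes": [],
    "Hatchbacks": ["Hot hatches"],
    "Hot hatches": [],
}
ARTICLES = {
    "Automobiles": ["Car"],
    "Coupes": ["Coupé"],
    "Hatchbacks": ["Hatchback"],
    "Hot hatches": ["Hot hatch"],
}


def collect_articles(seed_category, max_depth):
    """Breadth-first walk from a seed category; returns {article: depth found}."""
    seen = {seed_category}
    queue = deque([(seed_category, 0)])
    found = {}
    while queue:
        category, depth = queue.popleft()
        for article in ARTICLES.get(category, []):
            # Keep the shallowest depth at which each article was reached.
            found.setdefault(article, depth)
        if depth < max_depth:
            for child in SUBCATEGORIES.get(category, []):
                if child not in seen:
                    seen.add(child)
                    queue.append((child, depth + 1))
    return found


def build_training_set(label_to_seed, max_depth=1):
    """Pair every reachable article with the taxonomy label of its seed category."""
    return [
        (article, label)
        for label, seed in label_to_seed.items()
        for article in sorted(collect_articles(seed, max_depth))
    ]


if __name__ == "__main__":
    # Hypothetical mapping from fine-grained ad categories to seed categories.
    print(build_training_set({"hatchback": "Hatchbacks", "coupe": "Coupes"}))
```

In this sketch the depth limit is the main design choice: a deeper walk yields more documents per label, but it also pulls in articles that are only loosely related to the seed category.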
Findings
Achieves competitive performance on multiple datasets
Handles over 300 fine-grained categories effectively
Outperforms keyword-based baselines
Abstract
Contextual advertising provides advertisers with the opportunity to target the context which is most relevant to their ads. However, its power cannot be fully utilized unless we can target the page content using fine-grained categories, e.g., "coupe" vs. "hatchback" instead of "automotive" vs. "sport". The widely used advertising content taxonomy (IAB taxonomy) consists of 23 coarse-grained categories and 355 fine-grained categories. With the large number of categories, it becomes very challenging either to collect training documents to build a supervised classification model, or to compose expert-written rules in a rule-based classification system. Besides, in fine-grained classification, different categories often overlap or co-occur, making it harder to classify accurately. In this work, we propose wiki2cat, a method to tackle the problem of large-scaled fine-grained text…
Taxonomy
Topics: Text and Document Classification Technologies · Wikis in Education and Collaboration · Spam and Phishing Detection
