Stack Exchange Tagger
Sanket Mehta, Shagun Sodhani

TL;DR
This paper presents a multilabel classification approach for tagging Stack Exchange questions. It compares several Support Vector Classification (SVC) configurations and finds that linear SVC with the Crammer-Singer technique performs best.
Contribution
The paper introduces a multilabel classifier for Stack Exchange questions and evaluates different SVC configurations to identify the best-performing one.
Findings
Linear SVC with the Crammer-Singer technique yields the best results among the configurations compared
Support Vector Classification is effective for large-scale multilabel text classification
The paper compares SVC performance across different kernel functions and loss functions
Abstract
The goal of our project is to develop an accurate tagger for questions posted on Stack Exchange. Our problem is an instance of the more general problem of developing accurate classifiers for large-scale text datasets. We tackle the multilabel classification problem, where each item (in this case, a question) can belong to multiple classes (in this case, tags). We predict the tags (or keywords) for a particular Stack Exchange post given only the question text and the title of the post. In the process, we compare the performance of Support Vector Classification (SVC) for different kernel functions, loss functions, etc. We found that linear SVC with the Crammer-Singer technique produces the best results.
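The setup described in the abstract can be illustrated with a minimal sketch. It assumes scikit-learn, TF-IDF features, and a toy dataset invented for illustration; none of these are confirmed by the source. Because the Crammer-Singer formulation is a single-label multiclass method, the sketch also assumes one possible adaptation to multilabel data: each (question, tag) pair becomes its own training row, and the top-k scoring tags are returned at prediction time. This is not necessarily how the authors handled multiple tags. The names `posts`, `to_text`, and `predict_tags` are hypothetical.

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.svm import LinearSVC

# Toy data invented for illustration: (title, body, tags).
posts = [
    ("How to reverse a list in Python", "I have a list and want it reversed.", ["python", "list"]),
    ("Python dictionary iteration", "How do I loop over dict keys and values?", ["python", "dictionary"]),
    ("Segfault when freeing pointer in C", "My C program crashes on free().", ["c", "pointers"]),
    ("Pointer arithmetic in C arrays", "How does pointer arithmetic work with arrays?", ["c", "pointers", "arrays"]),
    ("Sort a list of tuples in Python", "I need to sort a list by the second item.", ["python", "list", "sorting"]),
    ("Sorting arrays in C with qsort", "How do I use qsort on an int array?", ["c", "arrays", "sorting"]),
]


def to_text(title, body):
    # The abstract says only the title and question text are used as input.
    return title + " " + body


# Assumed adaptation to multilabel data: one training row per (post, tag) pair.
texts, labels = [], []
for title, body, tags in posts:
    for tag in tags:
        texts.append(to_text(title, body))
        labels.append(tag)

vectorizer = TfidfVectorizer()
X = vectorizer.fit_transform(texts)

# Linear SVC trained with the joint Crammer-Singer multiclass objective.
clf = LinearSVC(multi_class="crammer_singer", C=1.0, random_state=0)
clf.fit(X, labels)


def predict_tags(title, body, k=2):
    # Score every tag for the post and return the k highest-scoring ones.
    features = vectorizer.transform([to_text(title, body)])
    scores = clf.decision_function(features)[0]
    top = np.argsort(scores)[::-1][:k]
    return [str(clf.classes_[i]) for i in top]


print(predict_tags("Reverse a Python list", "What is the idiomatic way to reverse a list?"))
```

The value of `k` is a free choice in this sketch. A real tagger would more likely pick the number of tags per question from a score threshold or a validation set.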
Taxonomy
Topics: Text and Document Classification Technologies · Advanced Text Analysis Techniques · Topic Modeling
