Robust Logistic Regression using Shift Parameters (Long Version)
Julie Tibshirani, Christopher D. Manning

TL;DR
This paper introduces a robust logistic regression model that explicitly accounts for label noise, improving classification accuracy in noisy datasets like those from crowdsourcing or distant supervision.
Contribution
It proposes an extension of logistic regression that adds shift parameters to model possible mislabeling, while retaining the efficiency of standard logistic regression on high-dimensional data.
Findings
Significant improvement over standard logistic regression with noisy labels
Effective in named entity recognition tasks with annotation errors
Maintains computational efficiency on large, high-dimensional datasets
Abstract
Annotation errors can significantly hurt classifier performance, yet datasets are only growing noisier with the increased use of Amazon Mechanical Turk and techniques like distant supervision that automatically generate labels. In this paper, we present a robust extension of logistic regression that incorporates the possibility of mislabelling directly into the objective. Our model can be trained through nearly the same means as logistic regression, and retains its efficiency on high-dimensional datasets. Through named entity recognition experiments, we demonstrate that our approach can provide a significant improvement over the standard model when annotation errors are present.
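The idea of building mislabelling into the objective can be sketched in a few lines. The summary above does not give the exact form of the model, so everything specific here is an assumption made for illustration: each training example i gets its own shift parameter gamma_i added to its logit, the shifts carry an L1 penalty of strength lam so that most stay at zero, and training uses plain proximal gradient descent. The function names, step sizes, and synthetic data are hypothetical and are not taken from the paper.

```python
import numpy as np


def sigmoid(z):
    """Logistic function."""
    return 1.0 / (1.0 + np.exp(-z))


def soft_threshold(v, t):
    """Proximal operator of t * |v|: shrink each entry toward zero by t."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)


def fit_shift_logreg(X, y, lam=0.3, lr=0.1, n_iter=2000):
    """Minimise sum_i NLL(y_i, sigmoid(x_i . theta + gamma_i)) + lam * sum_i |gamma_i|.

    Assumed formulation (not confirmed by the summary): one shift per example,
    L1-penalised, fitted by proximal gradient descent.
    """
    n, d = X.shape
    theta = np.zeros(d)   # ordinary logistic regression weights
    gamma = np.zeros(n)   # per-example shifts; nonzero suggests a suspect label
    for _ in range(n_iter):
        p = sigmoid(X @ theta + gamma)
        resid = p - y                      # gradient of the NLL w.r.t. each logit
        theta -= lr * (X.T @ resid) / n    # gradient step on the weights
        # Gradient step on the shifts, followed by the L1 proximal step.
        gamma = soft_threshold(gamma - lr * resid, lr * lam)
    return theta, gamma


# Hypothetical synthetic data: linearly separable labels with 10% flipped.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
true_theta = np.array([2.0, -1.0, 0.5, 0.0, 0.0])
y = (X @ true_theta > 0).astype(float)
flipped = rng.choice(200, size=20, replace=False)
y[flipped] = 1.0 - y[flipped]

theta, gamma = fit_shift_logreg(X, y, lam=0.3)
print("examples with a nonzero shift:", np.count_nonzero(gamma))
```

In this sketch the residual p - y always lies strictly between -1 and 1, so any lam of 1 or more keeps every shift at exactly zero and the procedure reduces to ordinary logistic regression. Smaller values of lam let the shifts absorb examples the linear model cannot explain, which is the behaviour one would want for mislabelled points.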
Taxonomy
Topics: Machine Learning and Data Classification · Topic Modeling · Imbalanced Data Classification Techniques
Methods: Logistic Regression
