Robust Logistic Regression using Shift Parameters (Long Version)

Julie Tibshirani; Christopher D. Manning

arXiv:1305.4987 · cs.AI · April 30, 2014 · 5 cites

TL;DR

This paper introduces a robust logistic regression model that explicitly accounts for label noise, improving classification accuracy in noisy datasets like those from crowdsourcing or distant supervision.

Contribution

It proposes a novel extension of logistic regression that incorporates shift parameters for mislabeling, maintaining efficiency on high-dimensional data.

Findings

01. Significant improvement over standard logistic regression with noisy labels

02. Effective in named entity recognition tasks with annotation errors

03. Maintains computational efficiency on large, high-dimensional datasets

Abstract

Annotation errors can significantly hurt classifier performance, yet datasets are only growing noisier with the increased use of Amazon Mechanical Turk and techniques like distant supervision that automatically generate labels. In this paper, we present a robust extension of logistic regression that incorporates the possibility of mislabelling directly into the objective. Our model can be trained through nearly the same means as logistic regression, and retains its efficiency on high-dimensional datasets. Through named entity recognition experiments, we demonstrate that our approach can provide a significant improvement over the standard model when annotation errors are present.
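
The abstract does not spell out the form of the robust objective, so the following is only a minimal sketch under stated assumptions: each training example i gets its own shift parameter gamma_i added to its logit, and an L1 penalty with weight lam keeps most shifts at exactly zero so that only suspected mislabelled examples are adjusted. The function names, the proximal-gradient optimizer, and the default values of lam, lr, and n_iter are illustrative choices, not taken from the paper.

```python
import numpy as np


def sigmoid(z):
    """Logistic function."""
    return 1.0 / (1.0 + np.exp(-z))


def soft_threshold(v, t):
    """Proximal operator of t * |v|: shrink toward zero, clip small values to 0."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)


def fit_shifted_logreg(X, y, lam=0.3, lr=0.1, n_iter=2000):
    """Hypothetical helper: logistic regression with per-example shifts.

    Assumed objective (an illustration, not the authors' exact formulation):
        sum_i NLL(y_i, sigmoid(x_i . theta + gamma_i)) + lam * sum_i |gamma_i|
    """
    n, d = X.shape
    theta = np.zeros(d)   # shared feature weights
    gamma = np.zeros(n)   # one shift per training example
    for _ in range(n_iter):
        p = sigmoid(X @ theta + gamma)
        resid = p - y  # gradient of the NLL with respect to each logit
        # Plain gradient step on theta (gradient averaged over examples).
        theta -= lr * (X.T @ resid) / n
        # Proximal gradient step on gamma: gradient step, then L1 shrinkage.
        gamma = soft_threshold(gamma - lr * resid, lr * lam)
    return theta, gamma
```

Under these assumptions, a nonzero gamma_i marks an example whose observed label the shared weights cannot explain, and at prediction time only theta is used. Because each residual lies in [-1, 1], any lam of 1 or more keeps every shift at zero, which recovers ordinary unregularized logistic regression.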

Taxonomy

Topics: Machine Learning and Data Classification · Topic Modeling · Imbalanced Data Classification Techniques

Methods: Logistic Regression