reproducing "ner and pos when nothing is capitalized"
Andreas Kuster, Jakub Filipek, Viswa Virinchi Muppirala

TL;DR
This paper reproduces a study on how lowercasing half of the dataset can mitigate performance drops in NER and POS tasks caused by casing mismatches, confirming original claims but with slightly lower results.
Contribution
The authors successfully reproduce the original findings on casing effects in NLP tasks and provide a publicly available implementation for further research.
Findings
Lowercasing 50% of the dataset yields the best performance.
Reproduced results are slightly lower than those of the original paper in almost all experiments.
Public GitHub repository available for transparency.
Abstract
Capitalization is an important feature in many NLP tasks such as Named Entity Recognition (NER) and Part-of-Speech (POS) tagging. We attempt to reproduce the results of a paper that shows how to mitigate the significant performance drop that occurs when casing is mismatched between training and test data. In particular, we show that lowercasing 50% of the dataset provides the best performance, matching the claims of the original paper. We also obtain slightly lower performance in almost all of the experiments we tried to reproduce, suggesting that some hidden factors may be affecting our results. Lastly, we make all of our work available in a public GitHub repository.
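The core idea in the abstract, lowercasing 50% of the dataset, can be illustrated with a minimal sketch. This is not the authors' implementation. The function name `lowercase_fraction`, the `(token, tag)` sentence representation, the fixed seed, and the choice to pick whole sentences at random are all assumptions made for illustration; the abstract does not say at what granularity the lowercasing is applied.

```python
import random
from typing import List, Sequence, Tuple

# Assumed representation: a sentence is a list of (token, tag) pairs.
Sentence = List[Tuple[str, str]]


def lowercase_fraction(
    sentences: Sequence[Sentence],
    fraction: float = 0.5,
    seed: int = 0,
) -> List[Sentence]:
    """Return a copy of the dataset with a random share of sentences lowercased.

    Hypothetical helper: sentences are selected at random (an assumption),
    tokens in selected sentences are lowercased, and tags are left untouched.
    """
    if not 0.0 <= fraction <= 1.0:
        raise ValueError("fraction must be between 0 and 1")

    rng = random.Random(seed)  # local generator so results are repeatable
    n_lower = round(len(sentences) * fraction)
    chosen = set(rng.sample(range(len(sentences)), n_lower))

    return [
        [(token.lower(), tag) for token, tag in sentence]
        if index in chosen
        else list(sentence)
        for index, sentence in enumerate(sentences)
    ]


if __name__ == "__main__":
    toy_data = [
        [("Paris", "B-LOC"), ("is", "O"), ("large", "O")],
        [("John", "B-PER"), ("Smith", "I-PER"), ("left", "O")],
    ]
    for sentence in lowercase_fraction(toy_data, fraction=0.5):
        print(sentence)
```

Setting `fraction` to 0.0 or 1.0 gives the fully cased and fully lowercased variants of the same data, which is one way to set up the casing-mismatch comparisons the abstract describes.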
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Speech and dialogue systems
