An Empirical Study on the Bugs Found while Reusing Pre-trained Natural Language Processing Models
Rangeet Pan, Sumon Biswas, Mohna Chakraborty, Breno Dantas Cruz, Hridesh Rajan

TL;DR
This study analyzes 984 bugs from 11 popular NLP pre-trained models to understand their types, causes, and impacts, revealing challenges like robustness issues, bias propagation, and resource consumption.
Contribution
It provides a comprehensive taxonomy of bugs in NLP model reuse, highlighting key issues and patterns to guide future bug reduction efforts.
Findings
Limited access to model internals affects robustness.
Input validation issues propagate bias.
High resource use leads to crashes.
Abstract
In NLP, reusing pre-trained models instead of training from scratch has gained popularity; however, NLP models are mostly black boxes, very large, and often require significant resources. To ease this burden, models trained on large corpora are made available, and developers reuse them for different problems. In contrast, for traditional DL-related problems, developers mostly build their models from scratch. By doing so, they keep control over the choice of algorithms, data processing, model structure, hyperparameter tuning, etc. NLP developers who reuse pre-trained models, on the other hand, have little to no control over such design decisions. They either apply tuning or transfer learning on pre-trained models to meet their requirements. Also, NLP models and their corresponding datasets are significantly larger than traditional DL models and require heavy computation.…
Taxonomy
Topics: Software Engineering Research · Topic Modeling · Natural Language Processing Techniques
