Wiki to Automotive: Understanding the Distribution Shift and its impact on Named Entity Recognition
Anmol Nayak, Hari Prasad Timmapathini

TL;DR
This paper investigates the distribution shift in automotive domain text and its impact on Named Entity Recognition (NER), identifying challenges such as out-of-vocabulary (OOV) words and context sparsity that hinder transfer learning performance.
Contribution
It provides an analysis of distribution shift characteristics in automotive text and evaluates the performance of BERT and SciBERT models on NER tasks in this domain.
Findings
SciBERT outperforms BERT in automotive NER
Fine-tuning with domain data yields limited improvements
Distribution shift features include OOV words and context sparsity
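To make the OOV finding concrete, the following is a minimal sketch of how one might measure the OOV rate and the subword fragmentation of domain words against a WordPiece vocabulary. It is not the authors' procedure. The vocabulary `TOY_VOCAB` and the helper names `wordpiece_tokenize` and `oov_stats` are hypothetical and chosen for illustration; a real measurement would load the vocabulary file shipped with the encoder being studied.

```python
def wordpiece_tokenize(word, vocab, unk="[UNK]"):
    """Greedy longest-match-first WordPiece split of one lowercase word."""
    pieces, start = [], 0
    while start < len(word):
        end, piece = len(word), None
        # Shrink the candidate from the right until it is found in the vocab.
        while start < end:
            candidate = word[start:end]
            if start > 0:
                candidate = "##" + candidate  # continuation-piece prefix
            if candidate in vocab:
                piece = candidate
                break
            end -= 1
        if piece is None:
            # No piece matches at this position: the whole word is unknown.
            return [unk]
        pieces.append(piece)
        start = end
    return pieces


def oov_stats(words, vocab):
    """Share of words missing from the vocab, and mean pieces per word."""
    oov_words = [w for w in words if w not in vocab]
    total_pieces = sum(len(wordpiece_tokenize(w, vocab)) for w in words)
    return {
        "oov_rate": len(oov_words) / len(words),
        "pieces_per_word": total_pieces / len(words),
        "oov_words": oov_words,
    }


# Hypothetical toy vocabulary, for illustration only (not a real model vocab).
TOY_VOCAB = {"the", "cruise", "control", "de", "##cel", "##erate", "##s"}

if __name__ == "__main__":
    sample = ["the", "cruise", "control", "decelerates"]
    print(wordpiece_tokenize("decelerates", TOY_VOCAB))
    print(oov_stats(sample, TOY_VOCAB))
```

A higher OOV rate or more pieces per word for automotive text than for general text would be one rough indicator of the vocabulary mismatch described in the findings.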
Abstract
While transfer learning has become a ubiquitous technique used across Natural Language Processing (NLP) tasks, it is often unable to replicate the performance of pre-trained models on text of niche domains like Automotive. In this paper we aim to understand the main characteristics of the distribution shift with automotive domain text (describing technical functionalities such as Cruise Control) and attempt to explain the potential reasons for the gap in performance. We focus on performing the Named Entity Recognition (NER) task as it requires strong lexical, syntactic and semantic understanding by the model. Our experiments with two different encoders, namely BERT-Base-Uncased and SciBERT-Base-Scivocab-Uncased, have led to interesting findings that showed: 1) The performance of SciBERT is better than BERT when used for the automotive domain, 2) Fine-tuning the language models with automotive…
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Text and Document Classification Technologies
Methods: Multi-Head Attention · Attention Is All You Need · Linear Layer · Dropout · Linear Warmup With Linear Decay · WordPiece · Layer Normalization · Weight Decay · Dense Connections
