Improving QA Efficiency with DistilBERT: Fine-Tuning and Inference on Mobile Intel CPUs
Ngeyen Yinkfu

TL;DR
This paper develops a DistilBERT-based question-answering model optimized for real-time inference on Intel CPUs, balancing accuracy and efficiency for resource-limited environments.
Contribution
It introduces a fine-tuning and inference approach for DistilBERT tailored for CPU deployment, with systematic evaluation of data augmentation and hyperparameters.
Findings
Validation F1 score of 0.6536 on SQuAD v1.1
Inference time of 0.1208 seconds per question
Outperforms rule-based baseline in accuracy
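A per-question inference time such as the one reported above is typically an average of wall-clock durations over a set of questions. The summary does not say how the authors measured it, so the following is only a minimal sketch of one common approach. The names `mean_latency_seconds`, `answer_fn`, and `warmup` are hypothetical and stand in for whatever model call and protocol the paper actually used.

```python
import time
from statistics import mean
from typing import Callable, Sequence


def mean_latency_seconds(
    answer_fn: Callable[[str], object],
    questions: Sequence[str],
    warmup: int = 1,
) -> float:
    """Average wall-clock seconds per call of answer_fn over questions.

    answer_fn is a hypothetical stand-in for the QA model's predict call.
    The warm-up calls are an assumption: they keep one-off start-up costs
    (lazy initialization, cache fills) out of the reported average.
    """
    for question in questions[:warmup]:
        answer_fn(question)  # untimed warm-up call

    durations = []
    for question in questions:
        start = time.perf_counter()
        answer_fn(question)
        durations.append(time.perf_counter() - start)
    return mean(durations)
```

For example, `mean_latency_seconds(model_predict, validation_questions)` would return the mean number of seconds per question for a hypothetical `model_predict` function. Reporting the median or a high percentile alongside the mean is often useful on a laptop CPU, where background load and thermal throttling can skew individual runs.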
Abstract
This study presents an efficient transformer-based question-answering (QA) model optimized for deployment on a 13th Gen Intel i7-1355U CPU, using the Stanford Question Answering Dataset (SQuAD) v1.1. Leveraging exploratory data analysis, data augmentation, and fine-tuning of a DistilBERT architecture, the model achieves a validation F1 score of 0.6536 with an average inference time of 0.1208 seconds per question. Compared to a rule-based baseline (F1: 0.3124) and full BERT-based models, our approach offers a favorable trade-off between accuracy and computational efficiency. This makes it well-suited for real-time applications on resource-constrained systems. The study includes systematic evaluation of data augmentation strategies and hyperparameter configurations, providing practical insights into optimizing transformer models for CPU-based inference.
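The F1 scores quoted above are most naturally read as the token-overlap F1 that is standard for SQuAD v1.1, although the abstract does not state this explicitly. As a sketch under that assumption, the metric can be computed as follows. The function names are illustrative and are not taken from the paper.

```python
import re
import string
from collections import Counter


def normalize_answer(text: str) -> str:
    """Lowercase, strip punctuation and English articles, collapse whitespace."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())


def token_f1(prediction: str, reference: str) -> float:
    """Harmonic mean of token-level precision and recall for one answer pair."""
    pred_tokens = normalize_answer(prediction).split()
    ref_tokens = normalize_answer(reference).split()
    if not pred_tokens or not ref_tokens:
        # If either side is empty, score 1.0 only when both are empty.
        return float(pred_tokens == ref_tokens)

    # Multiset intersection counts each shared token at most as often
    # as it appears in both the prediction and the reference.
    overlap = sum((Counter(pred_tokens) & Counter(ref_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)


def squad_f1(prediction: str, references: list[str]) -> float:
    """Best token F1 against any of the reference answers for a question."""
    return max(token_f1(prediction, ref) for ref in references)
```

A dataset-level score is then the mean of `squad_f1` over all validation questions. Under this metric a partially correct span still earns credit, which is why F1 is usually reported alongside the stricter exact-match score.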
Taxonomy
Topics: Advanced Neural Network Applications · Big Data and Digital Economy · Topic Modeling
Methods: Linear Layer · Attention Dropout · Softmax · WordPiece · Weight Decay · Multi-Head Attention · Dropout · Attention Is All You Need · Residual Connection
