Distributed Deep Learning for Question Answering
Minwei Feng, Bing Xiang, Bowen Zhou

TL;DR
This paper empirically evaluates distributed deep learning methods for question answering tasks, demonstrating significant speedups and efficiency improvements using various optimization algorithms and a message passing interface framework.
Contribution
It provides a comprehensive comparison of distributed training algorithms and highlights the effectiveness of distributed frameworks in accelerating question answering models.
Findings
Distributed training accelerates convergence speed.
A 24x speedup was achieved with 48 workers on the answer selection task.
Running time for that task was reduced from 138.2 hours to 5.81 hours.
Abstract
This paper is an empirical study of distributed deep learning for two question answering subtasks: answer selection and question classification. Comparison studies of the SGD, MSGD, ADADELTA, ADAGRAD, ADAM/ADAMAX, RMSPROP, DOWNPOUR and EASGD/EAMSGD algorithms are presented. Experimental results show that the distributed framework based on the message passing interface can accelerate the convergence speed at a sublinear scale. This paper demonstrates the importance of distributed training. For example, with 48 workers, a 24x speedup is achievable for the answer selection task and running time is decreased from 138.2 hours to 5.81 hours, which increases productivity significantly.
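The speedup figure in the abstract can be checked with simple arithmetic. The sketch below is not from the paper: the function names are hypothetical, and "parallel efficiency" (speedup divided by worker count) is a standard metric assumed here to illustrate what "sublinear" means. Only the three input numbers (138.2 hours, 5.81 hours, 48 workers) come from the abstract.

```python
def speedup(serial_hours: float, parallel_hours: float) -> float:
    """Ratio of single-worker running time to distributed running time."""
    return serial_hours / parallel_hours


def parallel_efficiency(serial_hours: float, parallel_hours: float, workers: int) -> float:
    """Speedup per worker; 1.0 would be perfectly linear scaling."""
    return speedup(serial_hours, parallel_hours) / workers


# Figures reported in the abstract for the answer selection task.
SERIAL_HOURS = 138.2
PARALLEL_HOURS = 5.81
WORKERS = 48

s = speedup(SERIAL_HOURS, PARALLEL_HOURS)
e = parallel_efficiency(SERIAL_HOURS, PARALLEL_HOURS, WORKERS)

print(f"speedup: {s:.1f}x")      # speedup: 23.8x
print(f"efficiency: {e:.1%}")    # efficiency: 49.6%
```

The computed speedup of about 23.8x matches the reported "24x" after rounding. With 48 workers, that is roughly half of the ideal linear speedup, which is consistent with the abstract's description of sublinear scaling.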
Taxonomy
Methods: SPEED: Separable Pyramidal Pooling EncodEr-Decoder for Real-Time Monocular Depth Estimation on Low-Resource Settings · Stochastic Gradient Descent
