Training IBM Watson using Automatically Generated Question-Answer Pairs
Jangho Lee, Gyuwan Kim, Jaeyoon Yoo, Changwoo Jung, Minseok Kim, Sungroh Yoon

TL;DR
This paper explores the use of a large-scale automatically generated question-answer dataset to efficiently train IBM Watson, demonstrating its effectiveness and complementarity to manual data.
Contribution
First to investigate large-scale auto-generated question-answer pairs for training IBM Watson, improving training efficiency and accuracy.
Findings
Auto-generated dataset effectively trains Watson
Complementary to manual question-answer pairs
Enhances training efficiency and accuracy
Abstract
IBM Watson is a cognitive computing system capable of question answering in natural languages. It is believed that IBM Watson can understand large corpora and answer relevant questions more effectively than any other question-answering system currently available. To unleash the full power of Watson, however, we need to train its instance with a large number of well-prepared question-answer pairs. Obviously, manually generating such pairs in a large quantity is prohibitively time consuming and significantly limits the efficiency of Watson's training. Recently, a large-scale dataset of over 30 million question-answer pairs was reported. Under the assumption that using such an automatically generated dataset could relieve the burden of manual question-answer generation, we tried to use this dataset to train an instance of Watson and checked the training efficiency and accuracy. According…
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Multimodal Machine Learning Applications
