UIT-ViCoV19QA: A Dataset for COVID-19 Community-based Question Answering on Vietnamese Language
Triet Minh Thai, Ngan Ha-Thao Chu, Anh Tuan Vo, Son T. Luu

TL;DR
This paper introduces UIT-ViCoV19QA, a Vietnamese COVID-19 question answering dataset with 4,500 pairs, and evaluates deep learning models to establish benchmarks for future research in community-based health information retrieval.
Contribution
It presents the first Vietnamese COVID-19 QA dataset and provides baseline deep learning models with benchmark results for future research.
Findings
Multiple paraphrased answers improve model performance.
Transformer models outperform other architectures.
Benchmark metrics establish a standard for future work.
Abstract
Over the two years from 2020 to 2021, COVID-19 overwhelmed disease prevention measures in many countries, including Vietnam, and negatively affected many aspects of human life and the community. Misleading information and fake news about the pandemic are also serious problems. We therefore present UIT-ViCoV19QA, the first Vietnamese community-based question answering dataset for developing COVID-19 question answering systems. The dataset comprises 4,500 question-answer pairs collected from trusted medical sources, with at least one answer and at most four unique paraphrased answers per question. Along with the dataset, we set up various deep learning models as baselines to assess the quality of our dataset and establish benchmark results for further research, using commonly used metrics such as BLEU, METEOR, and ROUGE-L. We…
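To make the evaluation setup concrete, the following is a minimal sketch of how ROUGE-L could be scored when a question has between one and four reference answers. It is not the authors' evaluation code. The function names, the whitespace tokenization, the use of a balanced F1 score, and the choice to keep the best score across references are all assumptions made for illustration. Real Vietnamese text would likely need a proper word segmenter, and the toy strings are invented English examples, not dataset entries.

```python
from typing import List


def lcs_length(a: List[str], b: List[str]) -> int:
    """Length of the longest common subsequence of two token lists."""
    prev = [0] * (len(b) + 1)
    for tok_a in a:
        curr = [0]
        for j, tok_b in enumerate(b, start=1):
            if tok_a == tok_b:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def rouge_l_f1(candidate: str, reference: str) -> float:
    """ROUGE-L as a balanced F1 over LCS precision and recall.

    Assumption: lowercase + whitespace tokenization, beta = 1.
    """
    cand, ref = candidate.lower().split(), reference.lower().split()
    if not cand or not ref:
        return 0.0
    lcs = lcs_length(cand, ref)
    if lcs == 0:
        return 0.0
    precision = lcs / len(cand)
    recall = lcs / len(ref)
    return 2 * precision * recall / (precision + recall)


def best_rouge_l(candidate: str, references: List[str]) -> float:
    """Score against every reference answer and keep the best match.

    Assumption: max-over-references is one common convention for
    multi-reference evaluation; the paper may aggregate differently.
    """
    return max(rouge_l_f1(candidate, ref) for ref in references)


if __name__ == "__main__":
    # Hypothetical toy example: one generated answer, two paraphrased references.
    generated = "wash your hands often"
    refs = ["wash hands often with soap", "clean your hands regularly"]
    print(round(best_rouge_l(generated, refs), 4))
```

Taking the maximum over references means a generated answer is not penalized for matching one paraphrase rather than another, which is one plausible reason multiple paraphrased answers can help during evaluation.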
Taxonomy
Topics: Topic Modeling · Expert finding and Q&A systems
Methods: Multi-Head Attention · Attention Is All You Need · Linear Layer · Position-Wise Feed-Forward Layer · Byte Pair Encoding · Adam · Softmax · Dropout · Dense Connections · Residual Connection
