SPBERTQA: A Two-Stage Question Answering System Based on Sentence Transformers for Medical Texts
Nhung Thi-Hong Nguyen, Phuong Phan-Dieu Ha, Luan Thanh Nguyen, Kiet Van Nguyen, Ngan Luu-Thuy Nguyen

TL;DR
This paper introduces SPBERTQA, a two-stage question answering system for Vietnamese medical texts, utilizing Sentence-BERT and BM25, and presents a new healthcare QA dataset for Vietnamese.
Contribution
The paper develops a two-stage QA system based on Sentence-BERT and builds ViHealthQA, a Vietnamese healthcare QA dataset of 10,015 question-answer passage pairs.
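As a rough illustration of what a two-stage retrieval system of this kind can look like, the sketch below shortlists passages with BM25 and then reranks the shortlist by embedding similarity. It is not the authors' implementation. The stage order (lexical first, dense second), the whitespace tokenizer, the `k1` and `b` values, and the `toy_embed` function are all assumptions made for illustration; `toy_embed` is a hypothetical stand-in for a Sentence-BERT encoder.

```python
import math
from collections import Counter


def bm25_scores(query_tokens, docs_tokens, k1=1.5, b=0.75):
    """Okapi BM25 score of every tokenized document against the query."""
    n_docs = len(docs_tokens)
    avg_len = sum(len(d) for d in docs_tokens) / n_docs
    # Document frequency: in how many documents each term occurs.
    df = Counter(term for d in docs_tokens for term in set(d))
    scores = []
    for doc in docs_tokens:
        tf = Counter(doc)
        score = 0.0
        for term in query_tokens:
            if term not in tf:
                continue
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            length_norm = k1 * (1 - b + b * len(doc) / avg_len)
            score += idf * tf[term] * (k1 + 1) / (tf[term] + length_norm)
        scores.append(score)
    return scores


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(y * y for y in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def two_stage_search(question, passages, embed, top_k=3):
    """Return passage indices: BM25 shortlist, then embedding-based rerank."""
    # Stage 1 (assumed order): lexical shortlist of the top_k passages.
    q_tokens = question.lower().split()
    docs_tokens = [p.lower().split() for p in passages]
    lexical = bm25_scores(q_tokens, docs_tokens)
    shortlist = sorted(range(len(passages)),
                       key=lambda i: lexical[i], reverse=True)[:top_k]
    # Stage 2: rerank the shortlist by cosine similarity of embeddings.
    q_vec = embed(question)
    return sorted(shortlist,
                  key=lambda i: cosine(q_vec, embed(passages[i])),
                  reverse=True)


# Hypothetical stand-in for a sentence encoder: keyword counts over a tiny
# hand-picked vocabulary. A real system would call an SBERT model here.
TOY_VOCAB = ("fever", "children", "doctor", "water", "rest")


def toy_embed(text):
    tokens = text.lower().split()
    return [float(tokens.count(word)) for word in TOY_VOCAB]
```

Calling `two_stage_search(question, passages, toy_embed, top_k=2)` returns passage indices ordered from most to least relevant. Swapping `toy_embed` for a trained sentence encoder leaves the rest of the function unchanged, which is the main appeal of keeping the two stages separate.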
Findings
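The SBERT-based system, trained with multiple negatives ranking (MNR) loss and combined with BM25, outperforms traditional methods.
ViHealthQA provides a dedicated benchmark for evaluating Vietnamese medical QA.
Performance was assessed through experiments against several bag-of-words models.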
Abstract
Question answering (QA) systems have gained explosive attention in recent years. However, few datasets exist for QA tasks in Vietnamese, and almost none in the medical domain. Therefore, we built a Vietnamese Healthcare Question Answering dataset (ViHealthQA) of 10,015 question-answer passage pairs, in which the questions were asked by health-interested users on prestigious health websites and the answers were written by highly qualified experts. This paper proposes a two-stage QA system based on Sentence-BERT (SBERT) using multiple negatives ranking (MNR) loss combined with BM25. We then conduct diverse experiments with many bag-of-words models to assess our system's performance. The results show that this system achieves better performance than traditional methods.
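The multiple negatives ranking (MNR) loss named in the abstract can be sketched in a few lines. For a batch of question-answer pairs, each question treats its own answer as the positive and every other answer in the batch as a negative, and the loss is the cross-entropy over scaled cosine similarities. The version below is a minimal pure-Python sketch under stated assumptions: the function name `mnr_loss`, the plain-list vectors, and `scale=20.0` are illustrative choices (the scale mirrors a commonly used default, not a value taken from the paper).

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(y * y for y in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def mnr_loss(question_vecs, answer_vecs, scale=20.0):
    """Mean cross-entropy where answer i is the positive for question i and
    all other answers in the batch serve as in-batch negatives."""
    losses = []
    for i, q in enumerate(question_vecs):
        logits = [scale * cosine(q, a) for a in answer_vecs]
        # Numerically stable log-sum-exp over the whole batch of answers.
        peak = max(logits)
        log_z = peak + math.log(sum(math.exp(x - peak) for x in logits))
        # Negative log-probability assigned to the matching answer.
        losses.append(log_z - logits[i])
    return sum(losses) / len(losses)
```

The loss is close to zero when every question embedding is far more similar to its own answer than to the other answers in the batch, and it grows when the pairs are mismatched. Larger batches supply more negatives per question without any extra labeling.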
Taxonomy
Topics: Topic Modeling · Expert finding and Q&A systems
