Classifying Vietnamese Disease Outbreak Reports with Important Sentences and Rich Features

Son Doan; Nguyen Thi Ngoc Vinh; Tu Minh Phuong

arXiv:1911.09883·cs.CL·November 25, 2019

TL;DR

This paper improves Vietnamese disease outbreak report classification by selecting important sentences and adding rich features, reaching a best F-score of 86.67%, which is 0.38% higher than using all raw text.

Contribution

It introduces a method that combines important sentences (those naming an outbreak disease, plus their neighboring sentences) with rich features such as locations, and it outperforms a baseline that uses all raw text.

Findings

01

Best F-score of 86.67% using sentences that contain outbreak disease names, their preceding/following sentences, and location features

02

Using important sentences improves classification performance

03

Rich features enhance Vietnamese disease outbreak report classification

Abstract

Text classification has been an important field of research since the mid-1990s. It has many applications, one of which is Web-based biosurveillance systems that identify and summarize online disease outbreak reports. In this paper we focus on classifying Vietnamese disease outbreak reports. We investigate important properties of disease outbreak reports, e.g., sentences containing names of outbreak diseases and locations. Evaluation with 10 times 10-fold cross-validation using the Support Vector Machine algorithm shows that using sentences containing outbreak disease names together with their preceding/following sentences, in combination with location features, achieves the best F-score of 86.67%, an improvement of 0.38% over using all raw text. Our results suggest that using important sentences and rich features can improve the performance of Vietnamese disease outbreak text classification.
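
The sentence selection described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not say how sentences are split or how disease and location names are matched, so the punctuation-based splitter, the case-insensitive substring matching, and the names `split_sentences`, `important_sentences`, and `location_features` are all assumptions made for illustration. The example report is invented.

```python
import re
import unicodedata
from typing import List, Sequence


def _norm(text: str) -> str:
    """Normalize to NFC and lowercase so accented Vietnamese strings compare reliably."""
    return unicodedata.normalize("NFC", text).lower()


def split_sentences(text: str) -> List[str]:
    """Naive splitter (an assumption): break after '.', '!' or '?' followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]


def important_sentences(text: str, disease_names: Sequence[str], window: int = 1) -> List[str]:
    """Keep sentences that mention a disease name, plus `window` neighbors on each side."""
    sentences = split_sentences(text)
    names = [_norm(n) for n in disease_names]
    keep = set()
    for i, sentence in enumerate(sentences):
        if any(name in _norm(sentence) for name in names):
            start = max(0, i - window)
            stop = min(len(sentences), i + window + 1)
            keep.update(range(start, stop))
    # Return the kept sentences in their original order.
    return [sentences[i] for i in sorted(keep)]


def location_features(text: str, locations: Sequence[str]) -> List[str]:
    """Emit one hypothetical 'LOC=<name>' feature per known location found in the text."""
    low = _norm(text)
    return [f"LOC={loc}" for loc in locations if _norm(loc) in low]


# Invented example report (not from the paper's data set).
report = (
    "Sáng nay trời mưa lớn. Bộ Y tế họp khẩn. "
    "Dịch tả đã xuất hiện tại Hà Nội. Nhiều bệnh nhân phải nhập viện. "
    "Giá vàng tăng nhẹ. Thị trường ổn định."
)
selected = important_sentences(report, ["dịch tả"])
features = location_features(" ".join(selected), ["Hà Nội", "Đà Nẵng"])
```

Here `selected` holds the sentence mentioning the disease and one sentence on each side, and `features` holds the location tokens found in that reduced text. In a full pipeline these would be turned into a feature vector and passed to an SVM classifier, which this sketch leaves out.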
