Headline-Guided Extractive Summarization for Thai News Articles
Pimpitchaya Kositcharoensuk, Nakarin Sritrakool, Ploy N. Pratanwanich

TL;DR
This paper introduces CHIMA, an extractive summarization model for Thai news articles that uses the headline to guide the selection of key sentences and outperforms baseline models on multiple evaluation metrics.
Contribution
The study proposes a headline-guided extractive summarization approach for Thai texts, incorporating headline-body similarity strategies and utilizing pre-trained language models for better semantic understanding.
Findings
CHIMA outperforms baseline models on ROUGE, BLEU, and F1 scores.
Incorporating headline information improves the recall of critical sentences.
Headline-body similarity strategies enhance sentence selection flexibility.
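For readers unfamiliar with overlap-based metrics, the following is a minimal sketch of a ROUGE-1-style score: unigram precision, recall, and F1 between a candidate summary and a reference. It is a simplified illustration only, not the evaluation script used in the paper. The function name `rouge1` is hypothetical, and the inputs are assumed to be already tokenized (Thai text needs a word segmenter, which is left out here).

```python
from collections import Counter


def rouge1(candidate, reference):
    """Unigram-overlap precision, recall, and F1 (simplified ROUGE-1).

    Both arguments are lists of tokens. Tokenization is assumed to have
    been done upstream; this sketch does no stemming or normalization.
    """
    # Clipped overlap: each token counts at most as often as it
    # appears in both the candidate and the reference.
    overlap = sum((Counter(candidate) & Counter(reference)).values())
    precision = overlap / len(candidate) if candidate else 0.0
    recall = overlap / len(reference) if reference else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1
```

Recall is the component most directly tied to the second finding: a summary that misses a critical sentence loses the reference tokens that sentence would have covered.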
Abstract
Text summarization is a process of condensing lengthy texts while preserving their essential information. Previous studies have predominantly focused on high-resource languages, while low-resource languages like Thai have received less attention. Furthermore, earlier extractive summarization models for Thai texts have primarily relied on the article's body, without considering the headline. This omission can result in the exclusion of key sentences from the summary. To address these limitations, we propose CHIMA, an extractive summarization model that incorporates the contextual information of the headline for Thai news articles. Our model utilizes a pre-trained language model to capture complex language semantics and assigns a probability to each sentence to be included in the summary. By leveraging the headline to guide sentence selection, CHIMA enhances the model's ability to recover…
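The idea of letting the headline guide sentence selection can be sketched as follows. This is not CHIMA itself: the abstract does not specify how headline and body are combined, so the sketch assumes a simple linear blend of a per-sentence probability (which in the paper would come from a pre-trained language model) with a bag-of-words cosine similarity to the headline. The names `cosine`, `headline_guided_select`, `alpha`, and `k` are all hypothetical, and inputs are assumed to be pre-tokenized.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def headline_guided_select(headline, sentences, base_probs, alpha=0.5, k=2):
    """Pick k sentence indices using probability plus headline similarity.

    headline:   list of tokens
    sentences:  list of token lists, one per body sentence
    base_probs: per-sentence inclusion probabilities (assumed to come
                from some upstream model; supplied by the caller here)
    alpha:      weight on headline similarity (assumption, not from the paper)
    """
    if len(sentences) != len(base_probs):
        raise ValueError("sentences and base_probs must have the same length")
    head_vec = Counter(headline)
    scores = [
        (1 - alpha) * p + alpha * cosine(head_vec, Counter(s))
        for s, p in zip(sentences, base_probs)
    ]
    # Take the k best-scoring sentences, then restore document order.
    top = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)[:k]
    return sorted(top)
```

With `alpha=0` the selection falls back to the body-only probabilities, which makes it easy to compare against a headline-free baseline on the same inputs.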
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Advanced Text Analysis Techniques
