Uzbek text summarization based on TF-IDF

Khabibulla Madatov; Shukurla Bekchanov; Jernej Vičič

arXiv:2303.00461 · cs.CL · March 2, 2023 · 1 citation

Open Access

TL;DR

This paper introduces a TF-IDF based method for Uzbek text summarization, demonstrating its effectiveness in extracting key information from texts in an under-resourced language.

Contribution

The study presents a novel application of TF-IDF and n-gram techniques for Uzbek text summarization, addressing the lack of existing methods for this language.

Findings

01. Effective extraction of important text segments using TF-IDF

02. Summarization approach performs well on the Uzbek 'School corpus'

03. Potential applications in information retrieval and NLP for Uzbek

Abstract

The volume of information is increasing at an incredible rate with the rapid development of the Internet and electronic information services. Due to time constraints, we do not have the opportunity to read all of this information. Even analyzing the textual data of a single field requires a lot of work. Text summarization helps to solve these problems. This article presents an experiment on the summarization task for the Uzbek language; the methodology is based on text abstracting with the TF-IDF algorithm. Using TF-IDF as a weighting function, semantically important parts of the text are extracted. We summarize the given text by applying the n-gram method to the important parts of the whole text. The authors used a specially handcrafted corpus called "School corpus" to evaluate the performance of the proposed method. The results show that the proposed approach is effective in extracting…
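To make the TF-IDF idea in the abstract concrete, here is a minimal Python sketch of extractive summarization by sentence scoring. It is not the authors' implementation: the abstract does not give their exact weighting, tokenization, or n-gram step, so everything below is an assumption made for illustration. The sketch treats each sentence as a "document", scores a sentence by the mean TF-IDF weight of its distinct terms, and keeps the top `k` sentences in their original order. The function names (`tokenize`, `tfidf_sentence_scores`, `summarize`) are hypothetical, and the n-gram step is omitted.

```python
import math
import re
from collections import Counter


def tokenize(sentence):
    # Assumed tokenizer: lowercase runs of word characters, keeping the
    # apostrophe variants used in Uzbek Latin script (as in o'zbek, tog').
    return re.findall(r"[\w'ʻ’]+", sentence.lower())


def tfidf_sentence_scores(sentences):
    """Return one score per sentence: the mean TF-IDF weight of its distinct terms.

    Each sentence is treated as a separate document (an assumption of this sketch).
    """
    tokenized = [tokenize(s) for s in sentences]
    n = len(tokenized)

    # Document frequency: number of sentences that contain each term.
    df = Counter()
    for tokens in tokenized:
        df.update(set(tokens))

    scores = []
    for tokens in tokenized:
        if not tokens:
            scores.append(0.0)
            continue
        tf = Counter(tokens)
        # tf = relative frequency in the sentence, idf = log(N / df).
        total = sum(
            (count / len(tokens)) * math.log(n / df[term])
            for term, count in tf.items()
        )
        scores.append(total / len(tf))
    return scores


def summarize(sentences, k=1):
    """Keep the k highest-scoring sentences, preserving their original order."""
    scores = tfidf_sentence_scores(sentences)
    top = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)[:k]
    return [sentences[i] for i in sorted(top)]
```

With this weighting, a term that occurs in every sentence has an IDF of zero and contributes nothing, so sentences made up of rarer terms rank higher. A real system for Uzbek would likely also need stop-word removal and stemming, which this sketch leaves out.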

Taxonomy

Topics: Natural Language Processing Techniques · Topic Modeling · Educational Technology and Assessment