Probabilistic Latent Semantic Analysis (PLSA) for Indonesian-Language Text Document Classification

Derwin Suhartono

arXiv:1512.00576 · cs.CL · December 3, 2015 · 2 citations

TL;DR

This paper explores the application of Probabilistic Latent Semantic Analysis (PLSA) with Expectation Maximization for Indonesian text document classification, focusing on keyword-based document representation and retrieval.
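For reference, the standard PLSA model (Hofmann's asymmetric formulation) and its EM updates are written out below. The notation is ours and may differ from the paper's: n(d, w) is the number of times word w occurs in document d, n(d) is the total word count of d, and z ranges over the latent topics.

```latex
P(d, w) = P(d) \sum_{z} P(w \mid z)\, P(z \mid d)

\text{E-step:}\quad
P(z \mid d, w) = \frac{P(w \mid z)\, P(z \mid d)}{\sum_{z'} P(w \mid z')\, P(z' \mid d)}

\text{M-step:}\quad
P(w \mid z) = \frac{\sum_{d} n(d, w)\, P(z \mid d, w)}{\sum_{d} \sum_{w'} n(d, w')\, P(z \mid d, w')},
\qquad
P(z \mid d) = \frac{\sum_{w} n(d, w)\, P(z \mid d, w)}{n(d)}

\mathcal{L} = \sum_{d} \sum_{w} n(d, w) \log P(d, w)
```

An EM iteration never decreases the log-likelihood L, so tracking L across iterations is the usual convergence check.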

Contribution

It presents a detailed explanation of the PLSA mechanism, its training and testing procedures, and accuracy measurement, specifically for Indonesian-language document classification.
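The training step can be sketched as a plain EM loop over a document-term count matrix. This is a minimal NumPy sketch assuming dense counts and random initialisation; the function name train_plsa, its parameters, and the toy counts are hypothetical and not taken from the paper. Preprocessing of Indonesian text (tokenisation, stop-word removal, stemming) is assumed to have already produced the counts.

```python
import numpy as np


def train_plsa(counts, n_topics, n_iter=100, seed=0):
    """Fit PLSA by EM on a document-term count matrix (illustrative sketch).

    counts: shape (n_docs, n_words), counts[d, w] = n(d, w).
    Returns p_w_z (n_topics, n_words), p_z_d (n_docs, n_topics), and the
    log-likelihood recorded at the start of each iteration.
    """
    rng = np.random.default_rng(seed)
    counts = np.asarray(counts, dtype=float)
    n_docs, n_words = counts.shape

    # Random initialisation; each distribution is normalised to sum to 1.
    p_w_z = rng.random((n_topics, n_words))
    p_w_z /= p_w_z.sum(axis=1, keepdims=True)
    p_z_d = rng.random((n_docs, n_topics))
    p_z_d /= p_z_d.sum(axis=1, keepdims=True)

    log_likelihoods = []
    for _ in range(n_iter):
        # E-step: P(z | d, w) is proportional to P(w | z) P(z | d).
        joint = p_z_d[:, :, None] * p_w_z[None, :, :]  # (docs, topics, words)
        p_w_d = joint.sum(axis=1)                      # P(w | d), (docs, words)
        post = joint / np.maximum(p_w_d[:, None, :], 1e-300)

        # Log-likelihood, omitting the constant P(d) term.
        log_likelihoods.append(float((counts * np.log(np.maximum(p_w_d, 1e-300))).sum()))

        # M-step: re-estimate both distributions from expected counts.
        expected = counts[:, None, :] * post           # n(d, w) P(z | d, w)
        p_w_z = expected.sum(axis=0)
        p_w_z /= np.maximum(p_w_z.sum(axis=1, keepdims=True), 1e-300)
        p_z_d = expected.sum(axis=2)
        p_z_d /= np.maximum(p_z_d.sum(axis=1, keepdims=True), 1e-300)

    return p_w_z, p_z_d, log_likelihoods


# Hypothetical toy counts: 4 documents x 4 words.
counts = np.array([
    [4, 3, 0, 0],
    [3, 5, 1, 0],
    [0, 0, 4, 3],
    [0, 1, 3, 5],
])
p_w_z, p_z_d, ll = train_plsa(counts, n_topics=2, n_iter=50)
print(ll[0], ll[-1])  # the log-likelihood should not decrease
```

The dense (docs × topics × words) arrays keep the code close to the update equations, but their memory grows with that product; a real corpus would call for sparse counts.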

Findings

01

PLSA effectively extracts meaningful keywords for Indonesian texts.

02

The EM algorithm successfully trains the PLSA model.

03

The approach achieves promising accuracy in document classification.
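Keyword extraction, as in finding 01, is commonly done by ranking the words of each topic by P(w | z). The sketch below assumes that approach; the function top_keywords, the Indonesian vocabulary, and the probability values are hypothetical and hand-made for illustration, not learned from the paper's data.

```python
import numpy as np


def top_keywords(p_w_given_z, vocabulary, n=3):
    """Return the n most probable words for each topic.

    p_w_given_z: shape (n_topics, n_words), each row is P(w | z).
    vocabulary: list of n_words strings aligned with the columns.
    """
    keywords = []
    for row in np.asarray(p_w_given_z):
        # argsort is ascending, so reverse it to put the most probable words first.
        best = np.argsort(row)[::-1][:n]
        keywords.append([vocabulary[i] for i in best])
    return keywords


# Hypothetical vocabulary and hand-made P(w | z) for two topics.
vocabulary = ["sepak", "bola", "pemain", "saham", "harga", "pasar"]
p_w_given_z = np.array([
    [0.30, 0.35, 0.25, 0.02, 0.05, 0.03],
    [0.02, 0.03, 0.05, 0.35, 0.25, 0.30],
])
print(top_keywords(p_w_given_z, vocabulary, n=3))
# [['bola', 'sepak', 'pemain'], ['saham', 'pasar', 'harga']]
```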

Abstract

One task in document management is finding the substantial information that documents contain. Topic modeling is a technique developed to represent documents in the form of keywords, which are then used for indexing and for retrieving documents as users need them. In this research, we discuss Probabilistic Latent Semantic Analysis (PLSA) specifically. We cover the PLSA mechanism, which uses Expectation Maximization (EM) as its training algorithm, how to conduct testing, and how to obtain the accuracy results.
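The abstract does not specify how test documents are classified. One common approach, assumed here, is folding-in: hold the trained P(w | z) fixed, run EM only on P(z | d) for each unseen document, and assign the class mapped to its most probable topic. Accuracy is then the fraction of correctly labelled test documents. The functions fold_in and accuracy, the one-topic-per-class mapping, and all numbers are hypothetical.

```python
import numpy as np


def fold_in(counts_new, p_w_given_z, n_iter=50):
    """Estimate P(z | d) for unseen documents with P(w | z) held fixed."""
    counts_new = np.asarray(counts_new, dtype=float)
    p_w_given_z = np.asarray(p_w_given_z, dtype=float)
    n_docs, n_topics = counts_new.shape[0], p_w_given_z.shape[0]
    p_z_d = np.full((n_docs, n_topics), 1.0 / n_topics)  # uniform start
    for _ in range(n_iter):
        # E-step: P(z | d, w) is proportional to P(w | z) P(z | d).
        joint = p_z_d[:, :, None] * p_w_given_z[None, :, :]
        post = joint / np.maximum(joint.sum(axis=1, keepdims=True), 1e-300)
        # Partial M-step: only P(z | d) is updated; P(w | z) stays fixed.
        p_z_d = (counts_new[:, None, :] * post).sum(axis=2)
        p_z_d /= np.maximum(p_z_d.sum(axis=1, keepdims=True), 1e-300)
    return p_z_d


def accuracy(predicted, actual):
    """Fraction of documents whose predicted label matches the true label."""
    return float((np.asarray(predicted) == np.asarray(actual)).mean())


# Hypothetical trained P(w | z) over the vocabulary
# ["sepak", "bola", "pemain", "saham", "harga", "pasar"].
p_w_given_z = np.array([
    [0.30, 0.35, 0.25, 0.02, 0.05, 0.03],
    [0.02, 0.03, 0.05, 0.35, 0.25, 0.30],
])
topic_to_class = ["olahraga", "ekonomi"]  # hypothetical map: "sport", "economy"

test_counts = np.array([
    [2, 3, 1, 0, 0, 0],  # football report
    [0, 0, 0, 2, 1, 2],  # stock-market report
    [0, 0, 1, 0, 2, 1],  # sports story about a player's transfer price
])
true_labels = ["olahraga", "ekonomi", "olahraga"]

p_z_d = fold_in(test_counts, p_w_given_z)
predicted = [topic_to_class[k] for k in p_z_d.argmax(axis=1)]
print(predicted, accuracy(predicted, true_labels))
# predicted: ['olahraga', 'ekonomi', 'ekonomi'], accuracy 2/3
```

The third test document is dominated by market vocabulary and is misclassified, giving an accuracy of 2/3. Mapping each topic to exactly one class is a simplifying assumption; a real setup might instead train a separate classifier on the P(z | d) values.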

Taxonomy

Topics: Advanced Text Analysis Techniques · Educational Technology Systems · Text and Document Classification Technologies