Automatic Text Summarization Approaches to Speed up Topic Model Learning Process

Mohamed Morchid; Juan-Manuel Torres-Moreno; Richard Dufour; Javier Ramírez-Rodríguez; Georges Linarès

arXiv:1703.06630·cs.IR·March 21, 2017·1 cites

TL;DR

This paper investigates using summarized documents to build topic spaces, significantly reducing processing time while maintaining relevance across multiple languages in big data scenarios.

Contribution

It introduces a method to create topic models from summaries, demonstrating substantial time savings without sacrificing accuracy in multilingual contexts.

Findings

01. Summaries produce topic representations comparable to those built from full texts.

02. Processing time is reduced by over 60% when using summaries.

03. Effectiveness is consistent across different languages.

Abstract

The number of documents available on the Internet grows every day. For this reason, processing this amount of information effectively and expressibly becomes a major concern for companies and scientists. Methods that represent a textual document by a topic representation are widely used in Information Retrieval (IR) to process big data such as Wikipedia articles. One of the main difficulties in using topic models on huge data collections is the material resources (CPU time and memory) required for model estimation. To deal with this issue, we propose to build topic spaces from summarized documents. In this paper, we present a study of topic space representation in the context of big data. The behavior of the topic space representation is analyzed across different languages. Experiments show that topic spaces estimated from text summaries are as relevant as those estimated from the complete…
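
The idea of estimating a topic space from summaries instead of full documents can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the abstract does not say which summarizers or which topic model the authors use, so `summarize` is a simple frequency-based extractive stand-in, `lda_gibbs` is a plain collapsed Gibbs sampler for LDA, and all function and parameter names are hypothetical.

```python
import random
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and return its alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def summarize(text, ratio=0.3):
    """Stand-in extractive summarizer (NOT the paper's method): keep the
    sentences with the highest average word frequency, in original order."""
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    freq = Counter(tokenize(text))

    def score(sentence):
        words = tokenize(sentence)
        return sum(freq[w] for w in words) / max(len(words), 1)

    keep = max(1, round(len(sentences) * ratio))
    ranked = sorted(range(len(sentences)),
                    key=lambda i: score(sentences[i]), reverse=True)
    return " ".join(sentences[i] for i in sorted(ranked[:keep]))


def lda_gibbs(docs, n_topics=2, n_iter=50, alpha=0.1, beta=0.01, seed=0):
    """Collapsed Gibbs sampling for LDA over tokenized documents.
    Returns (vocab, topic_word counts, doc_topic counts)."""
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    index = {w: i for i, w in enumerate(vocab)}
    n_words = len(vocab)
    doc_topic = [[0] * n_topics for _ in docs]
    topic_word = [[0] * n_words for _ in range(n_topics)]
    topic_total = [0] * n_topics

    # Random initial topic assignment for every token.
    assign = []
    for d, doc in enumerate(docs):
        zs = []
        for w in doc:
            z = rng.randrange(n_topics)
            zs.append(z)
            doc_topic[d][z] += 1
            topic_word[z][index[w]] += 1
            topic_total[z] += 1
        assign.append(zs)

    # Each sweep resamples every token once, so its cost grows with the
    # total number of tokens; shorter inputs mean cheaper sweeps.
    for _ in range(n_iter):
        for d, doc in enumerate(docs):
            for n, w in enumerate(doc):
                wi = index[w]
                z = assign[d][n]
                doc_topic[d][z] -= 1
                topic_word[z][wi] -= 1
                topic_total[z] -= 1
                weights = [
                    (doc_topic[d][k] + alpha)
                    * (topic_word[k][wi] + beta)
                    / (topic_total[k] + n_words * beta)
                    for k in range(n_topics)
                ]
                z = rng.choices(range(n_topics), weights=weights)[0]
                assign[d][n] = z
                doc_topic[d][z] += 1
                topic_word[z][wi] += 1
                topic_total[z] += 1
    return vocab, topic_word, doc_topic
```

Under these assumptions, a corpus would be processed as `lda_gibbs([tokenize(summarize(t)) for t in texts])` rather than `lda_gibbs([tokenize(t) for t in texts])`. Because each sampling sweep in this sketch touches every token once, keeping roughly 30% of the sentences shrinks the per-sweep work accordingly. Whether the resulting topic space stays as relevant as the full-text one is the empirical question the paper studies.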

Taxonomy

Topics: Advanced Text Analysis Techniques · Topic Modeling · Web Data Mining and Analysis