Iterative Residual Rescaling: An Analysis and Generalization of LSI
Rie Kubota Ando, Lillian Lee

TL;DR
This paper introduces a subspace-based framework for analyzing document representation methods such as LSI. The framework explains why IRR improves semantic similarity measurement (it compensates for non-uniform distributions of documents over topics) and yields a method for automatically choosing IRR's rescaling factor, which the experiments validate.
Contribution
It provides a formal subspace-based framework for analyzing LSI, uses that framework to explain why IRR is effective, and introduces a method for automatically determining IRR's rescaling factor, leading to better document representations.
Findings
IRR compensates for non-uniform distributions of documents over topics
The framework links LSI performance to the uniformity of the underlying document distribution over topics
Automatically determining IRR's rescaling factor further improves semantic similarity measurement
Abstract
We consider the problem of creating document representations in which inter-document similarity measurements correspond to semantic similarity. We first present a novel subspace-based framework for formalizing this task. Using this framework, we derive a new analysis of Latent Semantic Indexing (LSI), showing a precise relationship between its performance and the uniformity of the underlying distribution of documents over topics. This analysis helps explain the improvements gained by Ando's (2000) Iterative Residual Rescaling (IRR) algorithm: IRR can compensate for distributional non-uniformity. A further benefit of our framework is that it provides a well-motivated, effective method for automatically determining the rescaling factor IRR depends on, leading to further improvements. A series of experiments over various settings and with several evaluation metrics validates our claims.
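To make the rescaling idea concrete, here is a minimal sketch of an IRR-style basis construction. It is not taken from this page: it follows the commonly described form of the algorithm, in which each residual document vector is rescaled by its own length raised to a power q before the next basis vector is extracted. The function name `irr_basis`, its parameters, and the toy data are illustrative assumptions, and q is supplied by hand here; the paper's automatic method for choosing it is not reproduced.

```python
import numpy as np


def irr_basis(D, k, q):
    """Sketch of an IRR-style basis (assumed form, not the authors' code).

    D: term-by-document matrix, one column per document.
    k: number of basis vectors to extract.
    q: rescaling exponent (hypothetical parameter name).
    """
    R = np.array(D, dtype=float)  # residuals start as the documents themselves
    basis = []
    for _ in range(k):
        # Rescale each residual column by its length raised to the power q,
        # so poorly represented documents (long residuals) gain influence.
        lengths = np.linalg.norm(R, axis=0)
        R_scaled = R * lengths**q
        # Next basis vector: dominant left singular vector of the rescaled residuals.
        U, _, _ = np.linalg.svd(R_scaled, full_matrices=False)
        b = U[:, 0]
        basis.append(b)
        # Remove the component along b from the (unscaled) residuals.
        R = R - np.outer(b, b @ R)
    return np.column_stack(basis)


# Toy usage with random data (illustrative only).
rng = np.random.default_rng(0)
D = rng.random((6, 8))
B_lsi = irr_basis(D, k=3, q=0.0)   # no rescaling
B_irr = irr_basis(D, k=3, q=2.0)   # residual-length rescaling
doc_vectors = B_irr.T @ D          # documents projected onto the new basis
```

In this sketch, setting q = 0 makes the rescaling a no-op, so the extracted vectors span the same subspace as the top k left singular vectors of D, which is the LSI subspace. Larger q shifts weight toward documents that the current basis represents poorly, which is the sense in which rescaling can compensate for a non-uniform distribution of documents over topics.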
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Text and Document Classification Technologies
