fLSA: Learning Semantic Structures in Document Collections Using Foundation Models
Weijia Xu, Nebojsa Jojic, Nicolas Le Roux

TL;DR
fLSA uses foundation models to induce high-level semantic structure in document collections by iteratively clustering and tagging document segments. The resulting tags improve text reconstruction and support hierarchical sampling for reasoning tasks.
Contribution
The paper introduces fLSA, a new approach leveraging foundation models for semantic structure induction and hierarchical sampling in document collections.
Findings
fLSA tags are more informative than existing tagging methods for reconstructing the original texts.
Hierarchical sampling with fLSA tags increases the likelihood of reaching correct solutions.
fLSA models latent document structure across story writing, math, and multi-step reasoning datasets.
Abstract
Humans can learn to solve new tasks by inducing high-level strategies from example solutions to similar problems and then adapting these strategies to solve unseen problems. Can we use large language models to induce such high-level structure from example documents or solutions? We introduce fLSA, a foundation-model-based Latent Semantic Analysis method that iteratively clusters and tags document segments based on document-level contexts. These tags can be used to model the latent structure of given documents and for hierarchical sampling of new texts. Our experiments on story writing, math, and multi-step reasoning datasets demonstrate that fLSA tags are more informative in reconstructing the original texts than existing tagging methods. Moreover, when used for hierarchical sampling, fLSA tags help expand the output space in the right directions that lead to correct solutions more…
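The alternating cluster-and-tag loop described in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: the function names `assign_tag`, `describe_cluster`, and `iterative_tagging` are hypothetical, and both foundation-model calls are replaced by simple word-overlap and word-frequency stand-ins so that the example runs without any model. It also ignores the document-level context that fLSA conditions on.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Tuple


def assign_tag(segment: str, descriptions: Dict[int, str]) -> int:
    """Stand-in for a foundation-model call (assumption): pick the tag
    whose description shares the most words with the segment.
    Ties go to the lowest tag id."""
    words = set(segment.lower().split())
    return max(
        descriptions,
        key=lambda t: (len(words & set(descriptions[t].lower().split())), -t),
    )


def describe_cluster(segments: List[str]) -> str:
    """Stand-in for a model-written tag description (assumption):
    the three most frequent words in the cluster."""
    counts = Counter(w for s in segments for w in s.lower().split())
    return " ".join(w for w, _ in counts.most_common(3))


def iterative_tagging(
    segments: List[str],
    initial_descriptions: Dict[int, str],
    n_iters: int = 3,
) -> Tuple[List[int], Dict[int, str]]:
    """Alternate between assigning segments to tags and rewriting
    each tag description from the segments assigned to it."""
    descriptions = dict(initial_descriptions)
    assignments: List[int] = []
    for _ in range(n_iters):
        # Step 1: assign every segment to its best-matching tag.
        assignments = [assign_tag(s, descriptions) for s in segments]
        # Step 2: regroup segments and refresh each tag description.
        clusters = defaultdict(list)
        for seg, tag in zip(segments, assignments):
            clusters[tag].append(seg)
        for tag, members in clusters.items():
            descriptions[tag] = describe_cluster(members)
    return assignments, descriptions


if __name__ == "__main__":
    segments = [
        "solve the equation for x",
        "substitute x into the equation",
        "the hero enters the forest",
        "the hero fights the dragon",
    ]
    tags, descs = iterative_tagging(segments, {0: "equation", 1: "hero"})
    for seg, tag in zip(segments, tags):
        print(tag, seg)
```

In a real system, each stand-in would presumably be a prompt to a foundation model, and the tag descriptions would be natural-language summaries rather than word lists.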
Taxonomy
Topics: Natural Language Processing Techniques · Semantic Web and Ontologies · Topic Modeling
