Expedition: A System for the Unsupervised Learning of a Hierarchy of Concepts
Omid Madani

TL;DR
Expedition is a self-supervised system that learns a hierarchy of concepts from raw character strings and uses them to segment text without labeled data. In early experiments, the resulting segmentations respect word and phrase boundaries.
Contribution
The paper introduces a bottom-up, unsupervised framework that learns layered concepts, and the part-whole hierarchy among them, directly from raw text, with the aim of better segmentation and interpretability.
Findings
Learns tens of thousands of concepts in small-scale experiments
Achieves segmentation that respects word and phrase boundaries
Demonstrates promising results with binary input and minimal initial concepts
Abstract
We present a system for bottom-up cumulative learning of myriad concepts corresponding to meaningful character strings, and their part-related and prediction edges. The learning is self-supervised in that the concepts discovered are used as predictors as well as targets of prediction. We devise an objective for segmenting with the learned concepts, derived from comparing to a baseline prediction system, that promotes making and using larger concepts, which in turn allows for predicting larger spans of text, and we describe a simple technique to promote exploration, i.e. trying out newly generated concepts in the segmentation process. We motivate and explain a layering of the concepts, to help separate the (conditional) distributions learnt among concepts. The layering of the concepts roughly corresponds to a part-whole concept hierarchy. With rudimentary segmentation and learning…
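As an illustration of what segmenting against a baseline predictor could look like, here is a minimal Python sketch. It is not the paper's objective, which the abstract does not spell out. It assumes a hypothetical score in which each multi-character concept earns the log-ratio of its own probability to the probability a character-level baseline assigns to the same span, and single characters earn zero. The names `segment`, `CONCEPT_LOGP`, and `CHAR_LOGP`, and all probabilities, are made up for the example.

```python
import math
import string

def segment(text, concept_logp, char_logp):
    """Split text into pieces maximizing total gain over a character baseline.

    Assumed score (illustrative only): a known multi-character concept gains
    log P(concept) minus the sum of baseline log-probabilities of its
    characters; a single character gains 0, so any text can be segmented.
    """
    n = len(text)
    max_len = max((len(c) for c in concept_logp), default=1)
    best = [float("-inf")] * (n + 1)  # best[i]: top score for text[:i]
    back = [0] * (n + 1)              # back[i]: start index of the last piece
    best[0] = 0.0
    for end in range(1, n + 1):
        for start in range(max(0, end - max_len), end):
            piece = text[start:end]
            if len(piece) == 1:
                gain = 0.0
            elif piece in concept_logp:
                baseline = sum(char_logp[ch] for ch in piece)
                gain = concept_logp[piece] - baseline
            else:
                continue  # not a known concept
            score = best[start] + gain
            if score > best[end]:
                best[end] = score
                back[end] = start
    # Walk the back-pointers to recover the chosen pieces.
    pieces = []
    end = n
    while end > 0:
        start = back[end]
        pieces.append(text[start:end])
        end = start
    return pieces[::-1], best[n]

# Hypothetical toy models: uniform baseline over lowercase letters,
# and two made-up concept probabilities.
CHAR_LOGP = {ch: math.log(1 / 26) for ch in string.ascii_lowercase}
CONCEPT_LOGP = {"the": math.log(0.05), "cat": math.log(0.01)}

pieces, score = segment("thexcat", CONCEPT_LOGP, CHAR_LOGP)
print(pieces, round(score, 2))
```

Under this assumed score, a longer concept is preferred whenever it is more probable than the baseline's character-by-character prediction of the same span, which is one simple way an objective can favor larger concepts.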
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Advanced Text Analysis Techniques
