Efficient Induction of Language Models Via Probabilistic Concept Formation

Christopher J. MacLellan; Peter Matsakis; Pat Langley

arXiv:2212.11937 · cs.CL · December 23, 2022 · 1 cites

Open Access

TL;DR

This paper introduces three extensions to the Cobweb system for incremental, probabilistic language model induction, enabling online processing of sequential language data and improving synonym grouping and homonym separation.

Contribution

The paper develops Word, Leaf, and Path variants of Cobweb that encode language context and update hierarchies incrementally, adapting a taxonomic approach to sequential language learning.

Findings

01. Effective synonym grouping demonstrated

02. Homonyms are kept apart successfully

03. Training efficiency is improved

Abstract

This paper presents a novel approach to the acquisition of language models from corpora. The framework builds on Cobweb, an early system for constructing taxonomic hierarchies of probabilistic concepts that used a tabular, attribute-value encoding of training cases and concepts, making it unsuitable for sequential input like language. In response, we explore three new extensions to Cobweb -- the Word, Leaf, and Path variants. These systems encode each training case as an anchor word and surrounding context words, and they store probabilistic descriptions of concepts as distributions over anchor and context information. As in the original Cobweb, a performance element sorts a new instance downward through the hierarchy and uses the final node to predict missing features. Learning is interleaved with performance, updating concept probabilities and hierarchy structure as classification…
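The encoding described in the abstract, an anchor word plus its surrounding context words, with concepts stored as distributions over both, can be illustrated with a minimal Python sketch. This is not the authors' implementation: the names `make_instances` and `ConceptNode`, the symmetric `window` parameter, and the count-based probabilities are all assumptions made for illustration, and the sketch omits the hierarchy construction and sorting that Cobweb performs.

```python
from collections import Counter


def make_instances(tokens, window=2):
    """Turn a token list into (anchor, context) training cases.

    Assumption: the context is every word within `window` positions
    of the anchor; the paper may define context differently.
    """
    instances = []
    for i, anchor in enumerate(tokens):
        left = tokens[max(0, i - window):i]
        right = tokens[i + 1:i + 1 + window]
        instances.append((anchor, left + right))
    return instances


class ConceptNode:
    """Hypothetical single concept: counts over anchor and context words."""

    def __init__(self):
        self.count = 0                   # number of instances absorbed
        self.anchor_counts = Counter()   # how often each word was the anchor
        self.context_counts = Counter()  # how often each word appeared in context

    def update(self, anchor, context):
        """Incrementally fold one instance into the concept's counts."""
        self.count += 1
        self.anchor_counts[anchor] += 1
        self.context_counts.update(context)

    def anchor_prob(self, word):
        """Relative frequency of `word` as anchor among absorbed instances."""
        return self.anchor_counts[word] / self.count if self.count else 0.0


tokens = "the cat sat on the mat".split()
instances = make_instances(tokens, window=1)

node = ConceptNode()
for anchor, context in instances:
    node.update(anchor, context)
```

Because each instance only increments counters, the summary can be updated one training case at a time, which is the property that makes interleaving learning with performance possible. A full system would keep one such summary at every node of a hierarchy and decide where each instance belongs.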

Taxonomy

Topics: Natural Language Processing Techniques · Topic Modeling · Advanced Text Analysis Techniques