Learning Concept Hierarchies from Text Corpora using Formal Concept Analysis
P. Cimiano, A. Hotho, S. Staab

TL;DR
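This paper introduces a method based on Formal Concept Analysis (FCA) for automatically deriving concept hierarchies from text corpora. Each term is represented by a vector of syntactic dependencies extracted with a parser, and the resulting hierarchies are evaluated against hand-crafted taxonomies for the tourism and finance domains.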
Contribution
It presents a new FCA-based approach to taxonomy induction from text that models each term's context through syntactic dependencies, and it compares this approach with hierarchical agglomerative clustering, among other methods.
Findings
FCA effectively produces meaningful concept hierarchies.
The approach outperforms hierarchical clustering methods.
Different weighting and smoothing techniques impact hierarchy quality.
Abstract
We present a novel approach to the automatic acquisition of taxonomies or concept hierarchies from a text corpus. The approach is based on Formal Concept Analysis (FCA), a method mainly used for the analysis of data, i.e. for investigating and processing explicitly given information. We follow Harris' distributional hypothesis and model the context of a certain term as a vector representing syntactic dependencies which are automatically acquired from the text corpus with a linguistic parser. On the basis of this context information, FCA produces a lattice that we convert into a special kind of partial order constituting a concept hierarchy. The approach is evaluated by comparing the resulting concept hierarchies with hand-crafted taxonomies for two domains: tourism and finance. We also directly compare our approach with hierarchical agglomerative clustering as well as with…
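To make the lattice-building step more concrete, here is a minimal Python sketch of how formal concepts can be derived from a term-by-attribute context. It is an illustration under stated assumptions, not the authors' implementation. The toy `context` (terms mapped to verb-derived attributes such as "bookable") is hypothetical data chosen for the tourism domain, and the brute-force enumeration over object subsets only suits very small inputs. The helper names `common_attributes`, `objects_sharing`, `formal_concepts`, and `is_subconcept` are likewise made up for this example.

```python
from itertools import combinations

# Hypothetical toy context: each term maps to the set of attributes
# it was (supposedly) observed with as a syntactic dependency.
context = {
    "hotel": {"bookable"},
    "apartment": {"bookable", "rentable"},
    "car": {"bookable", "rentable", "driveable"},
    "bike": {"bookable", "rentable", "driveable", "rideable"},
    "excursion": {"bookable", "joinable"},
}


def common_attributes(objects, context, all_attributes):
    """Attributes shared by every object in `objects` (all attributes if empty)."""
    shared = set(all_attributes)
    for obj in objects:
        shared &= context[obj]
    return frozenset(shared)


def objects_sharing(attributes, context):
    """Objects that carry every attribute in `attributes`."""
    return frozenset(o for o, attrs in context.items() if attributes <= attrs)


def formal_concepts(context):
    """Enumerate all (extent, intent) pairs by closing every subset of objects.

    Exponential in the number of objects; fine for a toy example only.
    """
    all_attributes = set().union(*context.values())
    objects = list(context)
    concepts = set()
    for size in range(len(objects) + 1):
        for subset in combinations(objects, size):
            intent = common_attributes(subset, context, all_attributes)
            extent = objects_sharing(intent, context)
            concepts.add((extent, intent))
    return concepts


def is_subconcept(lower, upper):
    """Lattice order: a concept is below another if its extent is a subset."""
    return lower[0] <= upper[0]


concepts = formal_concepts(context)

# Print concepts from the most general (largest extent) to the most specific.
for extent, intent in sorted(concepts, key=lambda c: -len(c[0])):
    print(sorted(extent), "<->", sorted(intent))
```

Each printed pair is one node of the concept lattice: the extent is a set of terms and the intent is the set of attributes they all share. Ordering the nodes with `is_subconcept` gives the partial order from which a concept hierarchy could then be read off. How the paper itself compacts the lattice into a hierarchy is not shown here.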
