# CN-Probase: A Data-driven Approach for Large-scale Chinese Taxonomy Construction

**Authors:** Jindong Chen, Ao Wang, Jiangjie Chen, Yanghua Xiao, Zhendong Chu, Jingping Liu, Jiaqing Liang, Wei Wang

arXiv: 1902.10326 · 2019-02-28

## TL;DR

This paper presents CN-Probase, a large-scale Chinese taxonomy with about 95% precision. It is built by extracting isA relations from Chinese encyclopedias and filtering them with heuristic verification, and it has been deployed in practice on Aliyun (Alibaba Cloud).

## Contribution

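It introduces a generation-and-verification framework for automatic Chinese taxonomy construction. The generation module extracts isA relations from multiple Chinese encyclopedia sources for coverage, and the verification module applies three heuristic approaches for precision. The result is the largest high-quality Chinese taxonomy reported at the time.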

## Key findings

- Constructed CN-Probase with a precision of about 95%
- Deployed on Aliyun, serving over 82 million API calls in six months
- Reported as the largest Chinese taxonomy at the time of publication

## Abstract

Taxonomies play an important role in machine intelligence. However, most well-known taxonomies are in English, and non-English taxonomies, especially Chinese ones, are still very rare. In this paper, we focus on automatic Chinese taxonomy construction and propose an effective generation and verification framework to build a large-scale, high-quality Chinese taxonomy. In the generation module, we extract isA relations from multiple Chinese encyclopedia sources, which ensures coverage. To further improve the precision of the taxonomy, we apply three heuristic approaches in the verification module. As a result, we construct CN-Probase, the largest Chinese taxonomy, with a precision of about 95%. Our taxonomy has been deployed on Aliyun, with over 82 million API calls in six months.
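The abstract describes a two-stage design: a generation step that pools isA candidates from several encyclopedia sources, and a verification step that filters them with heuristics. The Python sketch below illustrates only that overall generate-then-verify shape, under stated assumptions. The source names (`infobox`, `category`), the helpers `generate` and `verify`, and both filter heuristics are hypothetical placeholders. They are not the paper's extraction rules or its three verification approaches, which are described only in the full text.

```python
from collections import defaultdict
from typing import Callable, Iterable

Pair = tuple[str, str]  # (entity, concept), read as "entity isA concept"
Check = Callable[[str, str, set[str]], bool]


def generate(sources: dict[str, Iterable[Pair]]) -> dict[Pair, set[str]]:
    """Generation (sketch): pool isA candidates from several sources,
    recording which sources support each (entity, concept) pair."""
    support: dict[Pair, set[str]] = defaultdict(set)
    for source_name, pairs in sources.items():
        for entity, concept in pairs:
            support[(entity, concept)].add(source_name)
    return dict(support)


def verify(support: dict[Pair, set[str]], checks: list[Check]) -> dict[str, set[str]]:
    """Verification (sketch): keep a candidate only if every check accepts it."""
    taxonomy: dict[str, set[str]] = defaultdict(set)
    for (entity, concept), sources in support.items():
        if all(check(entity, concept, sources) for check in checks):
            taxonomy[entity].add(concept)
    return dict(taxonomy)


# Hypothetical placeholder heuristics, NOT the paper's verification approaches.
def not_self_loop(entity: str, concept: str, sources: set[str]) -> bool:
    # An entity should not be its own hypernym.
    return entity != concept


NON_TAXONOMIC_TAGS = {"热门"}  # hypothetical blacklist; "热门" means "popular"


def not_topic_tag(entity: str, concept: str, sources: set[str]) -> bool:
    # Reject tags that describe popularity or topics rather than types.
    return concept not in NON_TAXONOMIC_TAGS


# Toy input: 刘德华 = Andy Lau, 演员 = actor, 歌手 = singer, 北京 = Beijing.
sources = {
    "infobox": [("刘德华", "演员"), ("刘德华", "歌手")],
    "category": [("刘德华", "演员"), ("刘德华", "热门"), ("北京", "北京")],
}
support = generate(sources)
taxonomy = verify(support, [not_self_loop, not_topic_tag])
print(taxonomy)  # {'刘德华': {'演员', '歌手'}} (set order may vary)
```

Keeping the per-pair source sets from the generation step is a design choice in this sketch: it lets a verification check use cross-source agreement as evidence, which is one plausible reason to extract from multiple sources beyond coverage alone.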

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/1902.10326/full.md

## Figures

3 figures with captions in the complete paper: https://tomesphere.com/paper/1902.10326/full.md

## References

10 references — full list in the complete paper: https://tomesphere.com/paper/1902.10326/full.md

---
Source: https://tomesphere.com/paper/1902.10326