Every child should have parents: a taxonomy refinement algorithm based on hyperbolic term embeddings

Rami Aly, Shantanu Acharya, Alexander Ossa, Arne Köhn, Chris Biemann, and Alexander Panchenko

arXiv:1906.02002·cs.CL·June 6, 2019

TL;DR

This paper presents a novel taxonomy refinement algorithm using Poincaré embeddings, significantly enhancing hierarchical taxonomy induction from text by better capturing semantic relationships than Euclidean embeddings.

Contribution

It introduces Poincaré embeddings for taxonomy refinement, improving the accuracy of hierarchical term placement and attachment in taxonomy induction tasks.

Findings

01. Outperforms the previous state of the art on SemEval-2016 Task 13

02. Poincaré embeddings capture hierarchical relationships better than Euclidean embeddings

03. Improves taxonomy accuracy by relocating misplaced terms and attaching disconnected ones

Abstract

We introduce the use of Poincaré embeddings to improve existing state-of-the-art approaches to domain-specific taxonomy induction from text as a signal for both relocating wrong hyponym terms within a (pre-induced) taxonomy as well as for attaching disconnected terms in a taxonomy. This method substantially improves previous state-of-the-art results on the SemEval-2016 Task 13 on taxonomy extraction. We demonstrate the superiority of Poincaré embeddings over distributional semantic representations, supporting the hypothesis that they can better capture hierarchical lexical-semantic relationships than embeddings in the Euclidean space.

Tables

Table 1: Example words with respective parent(s) in the input taxonomy and after refinement using our domain-specific Poincaré embeddings, as well as the word's closest three neighbors (incl. orphans) in embeddings.

Word	Parent in TAXI	Parent after refinement	Gold parent	Closest neighbors
second language acquisition	—	linguistics	linguistics	applied linguistics, semantics, linguistics
botany	—	genetics	plant science, ecology	genetics, evolutionary ecology, animal science
sweet potatoes	—	vegetables	vegetables	vegetables, side dishes, fruit
wastewater	water	waste	waste	marine pollution, waste, pollutant
water	waste, natural resources	natural resources	aquatic environment	continental shelf, management of resources
international relations	sociology, analysis, humanities	humanities	political science	economics, economic theory, geography

Table 2: Number of attached orphans in taxonomies created by TAXI using different embeddings.

Domain	word2vec	P. WordNet	P. domain-specific	# orphans
Environment	25	18	34	113
Science	56	39	48	158
Food	347	181	267	775

Table 3: F1 comparison between original (TAXI) and refined taxonomy using domain-specific embeddings.

Language	Domain	Original	Refined	# rel. data	# rel. gold
English	Environment	26.9	30.9	657	261
	Science	36.7	41.4	451	465
	Food	27.9	34.1	1898	1587
French	Environment	23.7	28.3	114	266
	Science	31.8	33.1	118	451
	Food	22.4	28.9	598	1441
Italian	Environment	31.0	30.8	2	266
	Science	32.0	34.2	4	444
	Food	16.9	18.5	57	1304
Dutch	Environment	28.4	27.1	7	267
	Science	29.8	30.5	15	449
	Food	19.4	21.8	61	1446

Equations

d(\mathbf{u},\mathbf{v})=\operatorname{arcosh}\left(1+2\,\frac{\|\mathbf{u}-\mathbf{v}\|^{2}}{(1-\|\mathbf{u}\|^{2})(1-\|\mathbf{v}\|^{2})}\right)
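
The distance above can be evaluated directly for points in the open unit ball. The following is a minimal sketch in plain Python, not the authors' implementation; the function name `poincare_distance` and the explicit unit-ball check are assumptions made for illustration.

```python
import math


def poincare_distance(u, v):
    """Poincaré distance between two points strictly inside the unit ball.

    Hypothetical helper for illustration; u and v are equal-length
    sequences of floats.
    """
    sq_norm_u = sum(x * x for x in u)
    sq_norm_v = sum(x * x for x in v)
    # The formula is only defined for ||u|| < 1 and ||v|| < 1.
    if sq_norm_u >= 1.0 or sq_norm_v >= 1.0:
        raise ValueError("points must lie strictly inside the unit ball")
    sq_diff = sum((a - b) ** 2 for a, b in zip(u, v))
    # arcosh(1 + 2 * ||u - v||^2 / ((1 - ||u||^2) * (1 - ||v||^2)))
    arg = 1.0 + 2.0 * sq_diff / ((1.0 - sq_norm_u) * (1.0 - sq_norm_v))
    return math.acosh(arg)


if __name__ == "__main__":
    origin = [0.0, 0.0]
    print(poincare_distance(origin, [0.5, 0.0]))
    # Same Euclidean gap, but closer to the boundary: the distance grows.
    print(poincare_distance([0.4, 0.0], [0.9, 0.0]))
```

For a point at Euclidean radius r from the origin, the formula reduces to ln((1 + r) / (1 - r)), so the first call returns ln 3 (about 1.0986). Distances grow rapidly as points approach the boundary of the ball.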

Code & Models

Repositories

uhh-lt/Taxonomy_Refinement_Embeddings (official)

Full text

Every child should have parents: a taxonomy refinement algorithm based on hyperbolic term embeddings

Rami Aly