Tree Edit Distance Learning via Adaptive Symbol Embeddings: Supplementary Materials and Results
Benjamin Paaßen

TL;DR
This paper introduces a novel metric learning method for trees that learns node embeddings to improve classification, outperforming existing approaches across diverse datasets including biomedical and natural language data.
Contribution
It proposes a new approach to learn tree edit distances indirectly through node embeddings, ensuring metric properties and better interpretability.
Findings
Outperforms state-of-the-art on six benchmark datasets
Effective across diverse domains including computer science and biomedical data
Scales to large datasets with over 300,000 nodes
Abstract
Metric learning aims to improve classification accuracy by learning a distance measure that brings data points from the same class closer together and pushes data points from different classes further apart. Recent research has demonstrated that metric learning approaches can also be applied to trees, such as molecular structures, abstract syntax trees of computer programs, or syntax trees of natural language, by learning the cost function of an edit distance, i.e. the costs of replacing, deleting, or inserting nodes in a tree. However, learning such costs directly may yield an edit distance that violates metric axioms, is challenging to interpret, and may not generalize well. In this contribution, we propose a novel metric learning approach for trees which learns an edit distance indirectly by embedding the tree nodes as vectors, such that the Euclidean distance between those…
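As a rough illustration of the idea in the abstract, the sketch below derives edit costs from node embeddings instead of storing them directly. The embedding values, the names `embedding`, `embed`, `cost`, and `GAP`, and the choice to place the gap symbol at the origin are assumptions made for illustration and are not taken from the paper. It covers only the cost function, not the tree edit distance computation or the learning step.

```python
import math

# Hypothetical 2-D embeddings for three node labels (values are made up).
embedding = {
    "a": (1.0, 0.0),
    "b": (0.0, 1.0),
    "c": (1.0, 1.0),
}

# Placeholder for "no node", used when a node is deleted or inserted.
GAP = None


def embed(symbol):
    """Return the vector for a symbol.

    Assumption: the gap is embedded at the origin, so deleting or
    inserting a node costs the norm of that node's vector.
    """
    if symbol is GAP:
        return (0.0, 0.0)
    return embedding[symbol]


def cost(x, y):
    """Edit cost of turning x into y: Euclidean distance of the embeddings."""
    return math.dist(embed(x), embed(y))


replace_cost = cost("a", "b")   # replace label a with label b
delete_cost = cost("a", GAP)    # delete a node labelled a
insert_cost = cost(GAP, "c")    # insert a node labelled c
```

Because every cost here is a Euclidean distance between points, the costs are non-negative, symmetric, and satisfy the triangle inequality whatever values the embedding takes, which is one way to read the abstract's concern about metric axioms.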
Taxonomy
Topics: Data Quality and Management · Artificial Intelligence in Healthcare · Data Mining Algorithms and Applications
