Quantifying syntax similarity with a polynomial representation of dependency trees
Pengyu Liu, Tinghao Feng, Rui Liu

TL;DR
This paper introduces a graph polynomial that represents dependency trees, together with a measure built on that representation to quantify syntax similarity, enabling cross-linguistic comparison and measurement of syntax diversity.
Contribution
It presents a novel polynomial representation for dependency trees that captures detailed syntactic information and facilitates syntax similarity and diversity analysis.
Findings
Effective differentiation of tree structures using the polynomial method
Application to the Parallel Universal Dependencies treebanks reveals syntactic similarities and differences between sentences and their translations
Potential for measuring syntax diversity across corpora
Abstract
We introduce a graph polynomial that distinguishes tree structures to represent dependency grammar and a measure based on the polynomial representation to quantify syntax similarity. The polynomial encodes accurate and comprehensive information about the dependency structure and dependency relations of words in a sentence. We apply the polynomial-based methods to analyze sentences in the Parallel Universal Dependencies treebanks. Specifically, we compare the syntax of sentences and their translations in different languages, and we perform a syntactic typology study of available languages in the Parallel Universal Dependencies treebanks. We also demonstrate and discuss the potential of the methods in measuring syntax diversity of corpora.
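To make the idea of a tree-distinguishing polynomial concrete, here is a minimal sketch. This summary does not give the paper's definition, so the recursion below is an assumption: a simplified, unlabeled version in which a leaf maps to x and an internal node maps to y plus the product of its children's polynomials. Dependency relation labels are ignored. The names `tree_polynomial`, `poly_mul`, and `coefficient_distance` are hypothetical, and the distance shown (sum of absolute coefficient differences) is only one plausible choice, not necessarily the measure used by the authors.

```python
from collections import Counter

# A polynomial in x and y is stored as {(deg_x, deg_y): coefficient}.

def poly_mul(p, q):
    """Multiply two polynomials stored as exponent-pair -> coefficient maps."""
    out = Counter()
    for (a, b), c in p.items():
        for (d, e), f in q.items():
            out[(a + d, b + e)] += c * f
    return out

def tree_polynomial(children, node):
    """Polynomial of the subtree rooted at `node`.

    Assumed recursion (illustrative, unlabeled trees only):
      leaf          -> x
      internal node -> y + product of the children's polynomials
    `children` maps a node to the list of its dependents.
    """
    kids = children.get(node, [])
    if not kids:
        return Counter({(1, 0): 1})          # the monomial x
    prod = Counter({(0, 0): 1})              # the constant 1
    for k in kids:
        prod = poly_mul(prod, tree_polynomial(children, k))
    prod[(0, 1)] += 1                        # add the monomial y
    return prod

def coefficient_distance(p, q):
    """Hypothetical distance: sum of absolute coefficient differences."""
    keys = set(p) | set(q)
    return sum(abs(p.get(k, 0) - q.get(k, 0)) for k in keys)

# Two 4-node trees rooted at node 0: a chain and a star.
chain = {0: [1], 1: [2], 2: [3]}
star = {0: [1, 2, 3]}

p_chain = tree_polynomial(chain, 0)   # x + 3y
p_star = tree_polynomial(star, 0)     # x^3 + y
print(dict(p_chain), dict(p_star), coefficient_distance(p_chain, p_star))
```

The two trees have the same number of nodes but different shapes, and they receive different polynomials, which is the property a similarity measure of this kind relies on. Extending the sketch to labeled dependency relations would require additional variables, which is left out here.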
Taxonomy
Topics: Natural Language Processing Techniques · Topic Modeling · Authorship Attribution and Profiling
