TL;DR
FastKASSIM is a fast, robust syntactic similarity metric based on tree kernels. It improves on the efficiency and consistency of previous methods for analyzing document- and utterance-level syntax.
Contribution
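The paper presents a novel tree-kernel-based metric for measuring syntactic similarity at the utterance and document levels. The metric is faster and more robust than the existing standard, which is computationally expensive and inconsistent on syntactically dissimilar documents.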
Findings
Syntactically similar arguments are more persuasive in online debates.
Syntax can predict authorship in legal documents.
FastKASSIM runs up to 5.32 times faster than its predecessor on documents from the r/ChangeMyView corpus.
Abstract
Syntax is a fundamental component of language, yet few metrics have been employed to capture syntactic similarity or coherence at the utterance- and document-level. The existing standard document-level syntactic similarity metric is computationally expensive and performs inconsistently when faced with syntactically dissimilar documents. To address these challenges, we present FastKASSIM, a metric for utterance- and document-level syntactic similarity which pairs and averages the most similar constituency parse trees between a pair of documents based on tree kernels. FastKASSIM is more robust to syntactic dissimilarities and runs up to 5.32 times faster than its predecessor over documents in the r/ChangeMyView corpus. FastKASSIM's improvements allow us to examine hypotheses in two settings with large documents. We find that syntactically similar arguments on r/ChangeMyView tend to be…
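The "pair and average the most similar parse trees" idea in the abstract can be illustrated with a small Python sketch. This is not the FastKASSIM implementation: the kernel here is a simple stand-in that counts shared production rules, and the nested-tuple tree format and the names `productions`, `tree_similarity`, and `document_similarity` are hypothetical choices made only for this example. Averaging over the first document's trees is likewise an assumption about the pairing direction.

```python
from collections import Counter
from math import sqrt

# Assumed tree format: (label, child, child, ...); leaf words are plain strings.

def productions(tree):
    """Count production rules (e.g. 'S -> NP VP') in a nested-tuple tree."""
    counts = Counter()
    stack = [tree]
    while stack:
        node = stack.pop()
        if isinstance(node, str):
            continue  # words carry no syntactic structure here
        label, children = node[0], node[1:]
        # Replace words with a placeholder so only syntax is compared.
        rhs = " ".join("<w>" if isinstance(c, str) else c[0] for c in children)
        counts[f"{label} -> {rhs}"] += 1
        stack.extend(children)
    return counts

def kernel(a, b):
    """Stand-in kernel: dot product of production-rule counts."""
    return sum(n * b[rule] for rule, n in a.items())

def tree_similarity(t1, t2):
    """Normalized kernel in [0, 1]; identical rule counts give 1.0."""
    a, b = productions(t1), productions(t2)
    denom = sqrt(kernel(a, a) * kernel(b, b))
    return kernel(a, b) / denom if denom else 0.0

def document_similarity(doc_a, doc_b):
    """Pair each tree in doc_a with its best match in doc_b, then average."""
    if not doc_a or not doc_b:
        return 0.0
    best = [max(tree_similarity(t, u) for u in doc_b) for t in doc_a]
    return sum(best) / len(best)

t1 = ("S", ("NP", ("DT", "the"), ("NN", "cat")), ("VP", ("VBD", "sat")))
t2 = ("S", ("NP", ("DT", "a"), ("NN", "dog")), ("VP", ("VBD", "ran")))
t3 = ("S", ("VP", ("VB", "run")))
score = document_similarity([t1, t3], [t2])
```

In this toy input, `t1` and `t2` share every production rule while `t3` shares none with `t2`, so the document score is the mean of one perfect match and one non-match. A real tree kernel would also credit partially matching subtrees, which this stand-in does not.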
