The risks of mixing dependency lengths from sequences of different length
Ramon Ferrer-i-Cancho, Haitao Liu

TL;DR
Pooling dependency lengths from sentences of different lengths can lead to misleading conclusions: the distribution of dependency lengths varies with sentence length, so pooled statistics, including cross-language comparisons, partly reflect the sentence length distribution.
Contribution
This paper demonstrates that the distribution of dependency lengths depends on sentence length, and that pooling dependencies from sentences of different lengths can produce misleading results.
Findings
Dependency length distributions differ by sentence length.
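Pooling dependency lengths from sentences of different lengths can distort the resulting distribution.
Differences in average dependency length across languages may reflect differences in their sentence length distributions rather than differences in how well each language optimizes dependency lengths.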
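The last finding can be illustrated with a small sketch. The two "languages" below are hypothetical and the counts are invented for illustration only. Both are assumed to have exactly the same mean dependency length at every sentence length n, taken here to be (n + 1) / 3, the expected distance between two distinct positions chosen uniformly at random in a sequence of n words. Each sentence is assumed to be a tree with n - 1 dependencies. The names mean_length_given_n and global_mean are not from the paper.

```python
from fractions import Fraction


def mean_length_given_n(n):
    # Assumption for illustration: both languages share this per-length mean,
    # the expected distance between two distinct random positions in 1..n.
    return Fraction(n + 1, 3)


def global_mean(sentence_length_counts):
    """Mean dependency length after pooling all dependencies.

    sentence_length_counts maps sentence length n to the number of
    sentences of that length (hypothetical counts).
    """
    total_length = Fraction(0)
    total_dependencies = 0
    for n, count in sentence_length_counts.items():
        dependencies = count * (n - 1)  # a tree over n words has n - 1 edges
        total_length += dependencies * mean_length_given_n(n)
        total_dependencies += dependencies
    return total_length / total_dependencies


# Hypothetical corpora: language A has mostly short sentences,
# language B has mostly long ones.
lang_a = {5: 80, 20: 20}
lang_b = {5: 20, 20: 80}

print(float(global_mean(lang_a)))
print(float(global_mean(lang_b)))
```

Although the per-length means are identical by construction, the pooled average is lower for the corpus with more short sentences. A gap in global averages therefore says nothing by itself about which language shortens its dependencies more.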
Abstract
Mixing dependency lengths from sequences of different length is a common practice in language research. However, the empirical distribution of dependency lengths of sentences of the same length differs from that of sentences of varying length and the distribution of dependency lengths depends on sentence length for real sentences and also under the null hypothesis that dependencies connect vertices located in random positions of the sequence. This suggests that certain results, such as the distribution of syntactic dependency lengths mixing dependencies from sentences of varying length, could be a mere consequence of that mixing. Furthermore, differences in the global averages of dependency length (mixing lengths from sentences of varying length) for two different languages do not simply imply a priori that one language optimizes dependency lengths better than the other because those…
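The null hypothesis mentioned in the abstract can be made concrete with a short sketch. It assumes that "random positions" means the two endpoints of a dependency are distinct positions drawn uniformly from 1..n; under that assumption a distance d has probability 2(n - d) / (n(n - 1)) and the mean is (n + 1) / 3. The function names and the equal-weight mixture of two lengths are illustrative choices, not the paper's procedure.

```python
from fractions import Fraction
from itertools import combinations


def null_pmf(n):
    """P(d), d = 1..n-1, for two distinct uniformly random positions in 1..n."""
    pairs = n * (n - 1) // 2
    # Exactly n - d of the unordered pairs are at distance d.
    return {d: Fraction(n - d, pairs) for d in range(1, n)}


def brute_force_pmf(n):
    """Same distribution, obtained by enumerating every pair of positions."""
    counts = {}
    for i, j in combinations(range(1, n + 1), 2):
        counts[j - i] = counts.get(j - i, 0) + 1
    total = sum(counts.values())
    return {d: Fraction(c, total) for d, c in counts.items()}


def mean(pmf):
    return sum(d * p for d, p in pmf.items())


def mix(pmf_a, pmf_b):
    """Equal-weight mixture of two distributions (illustrative weighting)."""
    support = set(pmf_a) | set(pmf_b)
    half = Fraction(1, 2)
    return {d: half * pmf_a.get(d, 0) + half * pmf_b.get(d, 0) for d in sorted(support)}


short, long_ = null_pmf(3), null_pmf(10)
mixed = mix(short, long_)

# Within one sentence length the null probabilities fall linearly in d;
# the mixture matches neither component.
for d in (1, 2, 3):
    print(d, float(short.get(d, 0)), float(long_[d]), float(mixed[d]))
```

Even with no word order preference at all, the distribution changes with n, so a pooled distribution inherits its shape partly from how sentence lengths are distributed in the sample.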
