On the Computation of Distances for Probabilistic Context-Free Grammars
Colin de la Higuera, James Scicluna, Mark-Jan Nederhof

TL;DR
This paper investigates the computational complexity of comparing probabilistic context-free grammars (PCFGs). It shows that computing many standard distances between them is undecidable, while some related computations, such as finding the most probable string, remain feasible.
Contribution
It proves that computing the L1, L2, variation, and KL distances between PCFGs is undecidable. It also identifies two less negative cases: the most probable string can be computed, and the Chebyshev distance is interreducible with language equivalence.
Findings
Computing the L1, L2, variation, and KL distances between PCFGs is undecidable.
The most probable string for a PCFG can be computed.
Computing the Chebyshev distance is interreducible with the language equivalence problem.
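For reference, the distances named in these findings have the following standard definitions over the strings of an alphabet. The notation is an assumption of this sketch and may differ from the paper's: $\Pr_{G}(w)$ denotes the probability that grammar $G$ assigns to string $w$, and $\Pr_{G}(X)$ the total probability of a set of strings $X$.

```latex
\begin{align*}
d_{1}(G_1, G_2) &= \sum_{w \in \Sigma^*} \bigl|\Pr_{G_1}(w) - \Pr_{G_2}(w)\bigr| \\
d_{2}(G_1, G_2) &= \sqrt{\sum_{w \in \Sigma^*} \bigl(\Pr_{G_1}(w) - \Pr_{G_2}(w)\bigr)^2} \\
d_{V}(G_1, G_2) &= \max_{X \subseteq \Sigma^*} \bigl|\Pr_{G_1}(X) - \Pr_{G_2}(X)\bigr| \\
d_{KL}(G_1, G_2) &= \sum_{w \in \Sigma^*} \Pr_{G_1}(w) \log \frac{\Pr_{G_1}(w)}{\Pr_{G_2}(w)} \\
d_{\infty}(G_1, G_2) &= \max_{w \in \Sigma^*} \bigl|\Pr_{G_1}(w) - \Pr_{G_2}(w)\bigr|
\end{align*}
```

Here $d_{\infty}$ is the Chebyshev distance. Each of these distances is zero when the two grammars are language equivalent, since every term vanishes.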
Abstract
Probabilistic context-free grammars (PCFGs) are used to define distributions over strings, and are powerful modelling tools in a number of areas, including natural language processing, software engineering, model checking, bio-informatics, and pattern recognition. A common important question is that of comparing the distributions generated or modelled by these grammars: this is done through checking language equivalence and computing distances. Two PCFGs are language equivalent if every string has identical probability with both grammars. This also means that the distance (whichever norm is used) is null. It is known that the language equivalence problem is interreducible with that of multiple ambiguity for context-free grammars, a long-standing open question. In this work, we prove that computing distances corresponds to solving undecidable questions: this is the case for the L1, L2…
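To make the notions of string probability and distance concrete, here is a minimal Python sketch. It is not the paper's method. It assumes two hypothetical toy grammars in Chomsky normal form (`G1`, `G2`), computes string probabilities with the standard inside algorithm, and sums or maximises probability differences over strings up to a length bound. The names `string_prob` and `truncated_distances` are illustrative. Because only finitely many strings are inspected, the results are lower bounds on the true L1 and Chebyshev distances, which is consistent with the undecidability results above.

```python
from collections import defaultdict
from itertools import product

# Hypothetical toy PCFGs in Chomsky normal form (assumptions for illustration).
# "binary" maps (A, B, C) to the probability of rule A -> B C;
# "lexical" maps (A, a) to the probability of rule A -> a.
G1 = {"start": "S",
      "binary": {("S", "S", "S"): 0.3},
      "lexical": {("S", "a"): 0.7}}
G2 = {"start": "S",
      "binary": {("S", "S", "S"): 0.2},
      "lexical": {("S", "a"): 0.5, ("S", "b"): 0.3}}


def string_prob(g, w):
    """Inside algorithm: probability that grammar g generates the sequence w."""
    n = len(w)
    if n == 0:
        return 0.0  # these CNF grammars have no empty-string rule
    # inside[(i, j, A)] = probability that nonterminal A derives w[i:j]
    inside = defaultdict(float)
    for i, sym in enumerate(w):
        for (A, a), p in g["lexical"].items():
            if a == sym:
                inside[(i, i + 1, A)] += p
    for span in range(2, n + 1):
        for i in range(n - span + 1):
            j = i + span
            for k in range(i + 1, j):
                for (A, B, C), p in g["binary"].items():
                    inside[(i, j, A)] += p * inside[(i, k, B)] * inside[(k, j, C)]
    return inside[(0, n, g["start"])]


def truncated_distances(g1, g2, alphabet, max_len):
    """Lower bounds on the L1 and Chebyshev distances, using only
    strings of length 1..max_len over the given alphabet."""
    l1, cheb = 0.0, 0.0
    for n in range(1, max_len + 1):
        for w in product(alphabet, repeat=n):
            diff = abs(string_prob(g1, w) - string_prob(g2, w))
            l1 += diff
            cheb = max(cheb, diff)
    return l1, cheb


if __name__ == "__main__":
    print(truncated_distances(G1, G2, "ab", 4))
```

Raising `max_len` can only increase both values, but no finite bound certifies the exact distance; the sketch illustrates the definitions rather than a decision procedure.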
Taxonomy
Topics: Natural Language Processing Techniques · Machine Learning and Algorithms · Algorithms and Data Compression
