Identity Testing for Stochastic Languages
Smayan Agarwal, Shobhit Singh, and Aalok Thakkar

TL;DR
This paper introduces the first framework for identity testing of stochastic languages over infinite structures, combining formal language theory with distribution property testing and providing efficient algorithms with theoretical guarantees.
Contribution
It develops a polynomial-time algorithm for verifying whether a finite state machine represents a stochastic language, and a truncation-based identity testing method for distributions over infinite domains.
Findings
A polynomial-time algorithm for verifying that a finite state machine represents a stochastic language
A truncation-based identity testing algorithm with near-optimal sample complexity
The first identity testing framework for stochastic languages, that is, distributions over infinite discrete structures such as strings
Abstract
Determining whether an unknown distribution matches a known reference is a cornerstone problem in distributional analysis. While classical results establish a rigorous framework in the case of distributions over finite domains, real-world applications in computational linguistics, bioinformatics, and program analysis demand testing over infinite combinatorial structures, particularly strings. In this paper, we initiate the theoretical study of identity testing for stochastic languages, bridging formal language theory with modern distribution property testing. We first propose a polynomial-time algorithm to verify if a finite state machine represents a stochastic language, and then prove that rational stochastic languages can approximate an arbitrary probability distribution. Building on these representations, we develop a truncation-based identity testing algorithm that distinguishes…
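As a rough illustration of the verification step mentioned in the abstract, the sketch below checks whether a weighted finite automaton with nonnegative weights assigns total probability 1 to the set of all strings. It is not the paper's algorithm. It relies on the standard identity that the total mass equals alpha^T (I - M)^{-1} beta, where M is the sum of the per-symbol transition matrices, whenever the series of powers of M converges. The names `alpha`, `transitions`, `beta`, `total_mass`, and `is_stochastic_language` are hypothetical, and the automaton is assumed to be trimmed (every state is reachable and can reach acceptance), so the check is conservative otherwise.

```python
from fractions import Fraction


def invert(matrix):
    """Gauss-Jordan inverse over exact rationals; returns None if singular."""
    n = len(matrix)
    # Augment the matrix with the identity on the right.
    aug = [list(row) + [Fraction(int(i == j)) for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = next((r for r in range(col, n) if aug[r][col] != 0), None)
        if pivot is None:
            return None
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]
        aug[col] = [v / scale for v in aug[col]]
        for r in range(n):
            if r != col and aug[r][col] != 0:
                factor = aug[r][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def total_mass(alpha, transitions, beta):
    """Sum of the weights of all strings, or None if the series is not
    certified to converge. Assumes all weights are nonnegative."""
    n = len(alpha)
    # M[i][j] = total one-step weight from state i to state j.
    M = [[sum(T[i][j] for T in transitions.values()) for j in range(n)]
         for i in range(n)]
    A = [[Fraction(int(i == j)) - M[i][j] for j in range(n)]
         for i in range(n)]
    inv = invert(A)
    # For nonnegative M, (I - M)^{-1} exists and is entrywise nonnegative
    # exactly when the series I + M + M^2 + ... converges.
    if inv is None or any(v < 0 for row in inv for v in row):
        return None
    return sum(alpha[i] * inv[i][j] * beta[j]
               for i in range(n) for j in range(n))


def is_stochastic_language(alpha, transitions, beta):
    """True if all weights are nonnegative and the total mass is exactly 1."""
    weights = list(alpha) + list(beta) + [
        v for T in transitions.values() for row in T for v in row]
    if any(w < 0 for w in weights):
        return False  # assumption of this sketch: nonnegative weights only
    return total_mass(alpha, transitions, beta) == 1


# One-state automaton over {a}: p(a^k) = (1/2)^k * 1/2, a geometric law.
half = Fraction(1, 2)
geometric = ([Fraction(1)], {"a": [[half]]}, [half])
```

Here `is_stochastic_language(*geometric)` holds because the weights (1/2)^k * 1/2 sum to 1 over k = 0, 1, 2, and so on. Exact rational arithmetic is used so that the comparison with 1 is not affected by floating-point error, and the cost is dominated by one matrix inversion, which is polynomial in the number of states.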
