Identity Testing for Stochastic Languages

Smayan Agarwal, Shobhit Singh, and Aalok Thakkar

arXiv:2508.03826 · cs.FL · August 7, 2025

TL;DR

This paper initiates the theoretical study of identity testing for stochastic languages, that is, probability distributions over the infinite set of strings. It connects formal language theory with distribution property testing and gives algorithms with provable guarantees.

Contribution

It develops a polynomial-time algorithm for checking whether a finite state machine represents a stochastic language, and a truncation-based identity testing method for distributions over infinite domains.
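The paper's own verification algorithm is not reproduced on this page. As a rough illustration of what such a check can look like, the sketch below handles only the simplest case: a weighted automaton with nonnegative weights whose states are all reachable and co-reachable (both are assumptions, not claims about the paper). In that case the total weight of all strings is the standard closed form `initial · (I - M)^(-1) · final`, where `M` is the sum of the per-symbol transition matrices, and the series converges exactly when `(I - M)` is invertible with a nonnegative inverse. The names `invert`, `total_mass`, and `is_stochastic_language` are hypothetical.

```python
from fractions import Fraction


def invert(matrix):
    """Gauss-Jordan inverse over exact rationals; returns None if singular."""
    n = len(matrix)
    # Augment each row with the matching row of the identity matrix.
    aug = [[Fraction(x) for x in row] + [Fraction(int(i == j)) for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = next((r for r in range(col, n) if aug[r][col] != 0), None)
        if pivot is None:
            return None
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]
        aug[col] = [x / scale for x in aug[col]]
        for r in range(n):
            if r != col and aug[r][col] != 0:
                factor = aug[r][col]
                aug[r] = [x - factor * y for x, y in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def total_mass(initial, transitions, final):
    """Total weight of all strings, or None if a weight is negative
    or the sum over all strings does not converge.

    Assumption: every state is reachable and co-reachable.
    """
    n = len(initial)
    weights = list(initial) + list(final)
    weights += [x for m in transitions.values() for row in m for x in row]
    if any(w < 0 for w in weights):
        return None
    # M[i][j] = total one-step weight from state i to state j.
    summed = [[sum(Fraction(m[i][j]) for m in transitions.values())
               for j in range(n)] for i in range(n)]
    inverse = invert([[int(i == j) - summed[i][j] for j in range(n)]
                      for i in range(n)])
    # For nonnegative M, the series sum of M^k converges exactly when
    # (I - M) is invertible and its inverse has no negative entry.
    if inverse is None or any(x < 0 for row in inverse for x in row):
        return None
    return sum(Fraction(initial[i]) * inverse[i][j] * Fraction(final[j])
               for i in range(n) for j in range(n))


def is_stochastic_language(initial, transitions, final):
    """True when the automaton's string weights sum to exactly 1."""
    return total_mass(initial, transitions, final) == 1
```

For example, a one-state automaton over {a, b} that emits each symbol with weight 1/4 and stops with weight 1/2 has total mass 1 and passes the check. The linear algebra is polynomial in the number of states, but this sketch does not cover machines with negative weights.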

Findings

01. Polynomial-time algorithm for stochastic language verification

02. Truncation-based identity testing with near-optimal sample complexity

03. First identity testing framework for infinite discrete distributions

Abstract

Determining whether an unknown distribution matches a known reference is a cornerstone problem in distributional analysis. While classical results establish a rigorous framework in the case of distributions over finite domains, real-world applications in computational linguistics, bioinformatics, and program analysis demand testing over infinite combinatorial structures, particularly strings. In this paper, we initiate the theoretical study of identity testing for stochastic languages, bridging formal language theory with modern distribution property testing. We first propose a polynomial-time algorithm to verify if a finite state machine represents a stochastic language, and then prove that rational stochastic languages can approximate an arbitrary probability distribution. Building on these representations, we develop a truncation-based identity testing algorithm that distinguishes…
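The abstract is cut off before the truncation-based tester is described, so the following is only a minimal sketch of the general idea, not the paper's algorithm. It assumes the reference distribution is available as a function from strings to probabilities. It keeps all strings up to a chosen length, lumps the remaining mass into a single tail bucket, and then applies a simple plug-in total variation test on the resulting finite domain. The plug-in test is a stand-in and makes no attempt at near-optimal sample complexity. All names (`geometric_reference`, `truncate`, `empirical_tv`, `identity_test`) and the example reference distribution are hypothetical.

```python
from collections import Counter
from itertools import product

TAIL = None  # sentinel key for every string outside the truncated support


def geometric_reference(alphabet, stop):
    """Hypothetical reference: stop with probability `stop`,
    otherwise emit a uniformly random symbol and repeat."""
    k = len(alphabet)

    def prob(word):
        return stop * ((1 - stop) / k) ** len(word)

    return prob


def truncate(prob, alphabet, max_len):
    """Return the reference restricted to strings of length <= max_len,
    together with the leftover tail mass."""
    support = {}
    for n in range(max_len + 1):
        for letters in product(alphabet, repeat=n):
            word = "".join(letters)
            support[word] = prob(word)
    tail = max(0.0, 1.0 - sum(support.values()))
    return support, tail


def empirical_tv(samples, support, tail):
    """Total variation distance between the empirical distribution of
    `samples` and the truncated reference (tail treated as one outcome)."""
    counts = Counter(w if w in support else TAIL for w in samples)
    n = len(samples)
    dist = abs(counts[TAIL] / n - tail)
    for word, p in support.items():
        dist += abs(counts[word] / n - p)
    return dist / 2


def identity_test(samples, support, tail, eps):
    """Accept when the empirical distance is at most eps / 2
    (an illustrative threshold, not a tuned one)."""
    return empirical_tv(samples, support, tail) <= eps / 2
```

In practice `max_len` would be chosen so that the tail mass is small relative to the distance parameter `eps`; how the paper makes that choice is not visible in this excerpt.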
